How to limit CPUs used on each worker [Autoscaler]

When I run locally on a single machine I can specify num_cpus via ray.init(num_cpus=2), but I can't specify this on a cluster.

I want to limit the number of CPUs being used on each node, something like this:

worker_node:
    max_cpus: 3

head_node:
    max_cpus: 4

Do you know how I would edit my YAML to do this? What I have tried hasn't worked.

The reason is that I run out of RAM on my EC2 instance when all cores are in use.


cluster_name: autoscale

initial_workers: 5
min_workers: 5
max_workers: 5

initialization_commands:
    - aws configure set aws_access_key_id --------------------
    - aws configure set aws_secret_access_key -------------------
    - aws configure set default.region eu-west-2
    # access to docker login
    - eval $(aws ecr get-login --no-include-email --region eu-west-2)
    - sudo aws s3 cp s3:/ pipeline/ --recursive

docker:
    image: "/pipeline:ray"
    container_name: "hello_ray_container"
    pull_before_run: True
    run_options:
        - "-v /home/ubuntu/pipeline/data:/opt/pipeline/data"

provider:
    type: aws
    region: eu-west-2

auth:
    ssh_user: ubuntu

head_node:
    InstanceType: c5.12xlarge
    ImageId: latest_dlami  # Default Ubuntu 16.04 AMI
    BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
              VolumeSize: 200

worker_nodes:
    InstanceType: c5.12xlarge
    ImageId: latest_dlami  # Default Ubuntu 16.04 AMI.
    BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
              VolumeSize: 200

Two possibilities:

  1. In theory, this is what the memory resource is for, so if you can estimate how much memory your tasks use, that’s ideal.

  2. It’s sometimes difficult to estimate the memory usage of a task, so you can specify

head_node:
    resources: {"CPU": N}

to override Ray's CPU detection and manually set the number of CPUs that Ray will use on the machine.

Thanks for your response.

Unfortunately, I can't get that to work. Is there a specific way to include it?

This is the error I get:

raise ParamValidationError(report=report.generate_report())

botocore.exceptions.ParamValidationError: Parameter validation failed:
Unknown parameter in input: “resources”, must be one of: BlockDeviceMappings, ImageId, InstanceType, Ipv6AddressCount, Ipv6Addresses, KernelId, KeyName, MaxCount, MinCount, Monitoring, Placement, RamdiskId, SecurityGroupIds, SecurityGroups, SubnetId, UserData, AdditionalInfo, ClientToken, DisableApiTermination, DryRun, EbsOptimized, IamInstanceProfile, InstanceInitiatedShutdownBehavior, NetworkInterfaces, PrivateIpAddress, ElasticGpuSpecification, ElasticInferenceAccelerators, TagSpecifications, LaunchTemplate, InstanceMarketOptions, CreditSpecification, CpuOptions, CapacityReservationSpecification, HibernationOptions, LicenseSpecifications, MetadataOptions, EnclaveOptions

head_node:
    resources: {"CPU": 4}
    InstanceType: c5.12xlarge
    ImageId: latest_dlami  # Default Ubuntu 16.04 AMI
    BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
              VolumeSize: 200

Solved it: I just added "--cpus=6" to the docker run_options.

Perhaps not the most elegant, but it is simple.
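For reference, a minimal sketch of how that could look in the docker section of the cluster YAML above (the value 6 is just the number mentioned in this post; adjust it to your instance):

docker:
    image: "/pipeline:ray"
    container_name: "hello_ray_container"
    pull_before_run: True
    run_options:
        - "-v /home/ubuntu/pipeline/data:/opt/pipeline/data"
        # Limit the container to 6 CPUs' worth of host CPU time.
        - "--cpus=6"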

You should be able to add it under the available_node_types field, using the resources field for each node type. See this example:
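(A rough sketch of what that could look like; the node type names below are placeholders, the instance types just mirror the config in this thread, and the CPU numbers are the ones from the original question.)

available_node_types:
    ray.head.default:
        node_config:
            InstanceType: c5.12xlarge
            ImageId: latest_dlami
        # Ray will schedule at most 4 CPUs' worth of tasks on the head node,
        # regardless of how many cores the instance actually has.
        resources: {"CPU": 4}
    ray.worker.default:
        node_config:
            InstanceType: c5.12xlarge
            ImageId: latest_dlami
        resources: {"CPU": 3}
        min_workers: 5
        max_workers: 5

head_node_type: ray.head.default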