[Autoscaler] Sharded Autoscaler Ray cluster

Hi,
My Ray application requires that certain tasks be assigned to a specific group of nodes. For example:
task1 assigned to group1: node1,2,3
task2 assigned to group2: node4,5,6,7,8
task3 assigned to group3: node9
In case of a node failure (e.g., node 1 fails), I’m thinking of relying on the Ray autoscaler to add a new node to the cluster and then inject this node’s resources into group1.

Is there any easy way to achieve this?

Best,
Jialin


Yep, you can do this with custom resources:

available_node_types:
  group1:
    node_config: ...
    resources: {"group1": 1}
  group2:
    node_config: ...
    resources: {"group2": 1}

then in your code, you can ensure that there are 3 nodes in group 1, 4 in group 2, etc. by doing

from ray.autoscaler.sdk import request_resources
request_resources(bundles=[{"group1": 1}] * 3 + [{"group2": 1}] * 4)  # one single-node bundle per desired node

or just set min_workers on those node groups to ensure that you always have x nodes of each group.

To schedule tasks on those nodes, have each task request a tiny amount of the group's custom resource to force it to run on a node from that group:

@ray.remote(num_cpus=1, resources={"group1": 0.001})
def foo():
    pass
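
For example, submitting a batch of these tasks will only ever use group1 nodes (a quick sketch; the batch size is arbitrary):

import ray

ray.init(address="auto")

# Each task also claims 0.001 of the "group1" resource, so every call lands on a group1 node.
ray.get([foo.remote() for _ in range(10)])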

Can node groups be created dynamically at runtime?


To be more concrete about the underlying problem here, we would ideally like to:

  1. Launch Ray cluster A to run tasks that collect stats for N table partitions.
  2. Based on these stats, create or modify node groups in Ray cluster A to assign min/max node group sizes for upcoming per-partition transforms that will be assigned to each node group.
  3. Run a transform for each partition on Ray cluster A, where each transform uses nodes from only 1 group.

I see, I’m assuming the nodes in your cluster are all homogeneous (or at least you don’t care which node group a particular node belongs to). This doesn’t give you a strong guarantee, but you may consider using placement groups as your node groups with a PACK policy, though there are caveats to this (you may want all tasks to be num_cpus=1, etc.).
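
A rough sketch of that idea, assuming 4-CPU nodes and treating one PACK placement group as a 3-node "group" (the bundle shapes and task counts here are illustrative):

import ray
from ray.util.placement_group import placement_group
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

ray.init(address="auto")

# Reserve one whole-node bundle per node in the "group" (assumes 4 CPUs per node).
pg = placement_group([{"CPU": 4}] * 3, strategy="PACK")
ray.get(pg.ready())

@ray.remote(num_cpus=1)
def work():
    pass

# Tasks scheduled into the placement group only run on the nodes it reserved.
refs = [
    work.options(
        scheduling_strategy=PlacementGroupSchedulingStrategy(placement_group=pg)
    ).remote()
    for _ in range(12)
]
ray.get(refs)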

You can also dynamically modify the autoscaler config, but that’s probably not what you want to be doing.

I think what you probably really want is to un-deprecate dynamic resources. ray/dynamic_resources.py at master · ray-project/ray · GitHub
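
For reference, the deprecated API in that file let you attach a custom resource to a live node at runtime, roughly like this (the exact signature, and whether it is still importable, depends on the Ray version, so treat this as an assumption):

import ray

ray.init(address="auto")

# Tag an existing node as a group1 node on the fly.
node_id = ray.nodes()[0]["NodeID"]
ray.experimental.set_resource("group1", 1, node_id)  # deprecated dynamic-resources API; may differ by version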


Right - a dynamic-resource-based solution is actually where we started our discussion, and resurrecting it may be the best way forward here.

Placement groups could work, but seem like they make us concede to (1) having a static-sized node group per partition (we’d prefer autoscaling node groups) and (2) losing dynamic specification of the memory requirements of each task at execution time. This is based on the assumptions that (1) placement group creation always triggers on-demand autoscaling, and (2) dynamic task resource requirements cannot be specified together with a placement group (e.g. some_task.options(scheduling_strategy=PlacementGroupSchedulingStrategy(...), resources={...}).remote()).

Specifying custom resources during ray start also seems theoretically possible, but the process to determine the resource to attach to a node being started may be complex, since we’d need to introspect the current state of the cluster first to decide which shard the node should be placed in.

(1) having a static-sized node group per partition (we’d prefer autoscaling node groups)

Agreed. There has been some talk of autoscaling placement groups (cc @sangcho).

(2) losing dynamic specification of the memory requirements of each task at execution time.

You should still be able to include memory requirements/other resources that you don’t pre-allocate in the placement group.
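
Concretely, the pattern from the earlier post would look something like this (a sketch continuing the placement-group example above; the memory value is illustrative):

import ray
from ray.util.placement_group import placement_group
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

ray.init(address="auto")

# The bundles only reserve CPU; memory is deliberately left out.
pg = placement_group([{"CPU": 4}] * 3, strategy="PACK")
ray.get(pg.ready())

@ray.remote
def transform(partition):
    ...

# Memory is not pre-allocated in the bundles; it is specified per task at
# execution time, alongside the placement group scheduling strategy.
ref = transform.options(
    num_cpus=1,
    memory=2 * 1024 ** 3,  # 2 GiB for this particular partition
    scheduling_strategy=PlacementGroupSchedulingStrategy(placement_group=pg),
).remote("partition-0")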


@Alex @sangcho After launching a cluster with multiple node groups, is there any API to query the node groups? E.g.:

  1. how many node groups are available
  2. status of a specific node group, e.g., number of nodes
  3. node group config, e.g., min and max
  4. whether a node group has been used / assigned tasks or not

Given you are using Alex’s implementation:

  1. You can infer it using ray status or ray.available_resources() (see the sketch below).
  2. Same as 1.
  3. This is a static config from the YAML file.
  4. Same as 1.
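
For example, with the custom-resource setup above (one unit of groupN per node), you could infer all of this from the Python API with something like:

import ray

ray.init(address="auto")

total = ray.cluster_resources()        # everything registered with the cluster
available = ray.available_resources()  # whatever is not currently claimed by tasks

# One unit of "groupN" per node, so the totals double as node counts per group.
group_sizes = {k: int(v) for k, v in total.items() if k.startswith("group")}

# A nonzero difference means at least one task is currently holding part of
# that group's resource, i.e. the group has been assigned work.
in_use = {k: total[k] - available.get(k, 0.0) for k in group_sizes}

print(group_sizes)  # e.g. {"group1": 3, "group2": 4}
print(in_use)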

Cool, is there any API for ray status?

Also, can I know the node IDs of the nodes in each node group?