Access "portion" of resource assigned to task

How severe does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Given a custom resource in Ray (or even a non-custom one, such as GPU), it is useful to know which “portion” of that resource on a node has been allocated to a given task. Without it, it is impossible to determine which part of a shared (custom) resource a task (worker?) should use. For example, suppose 4 tasks are running on a given node, each requiring CUSTOM_RESOURCE: 1, and the node has CUSTOM_RESOURCE: 4. Say the custom resource is a “canvas” with 4 “panels”: we want each task to write to one of the panels, and we never want two tasks to write to the same panel at the same time. So we either need some kind of within-node synchronization, or a way of checking what the other task workers on the node are doing (which would be a pain), or we could simply get e.g. an “ID_ON_NODE” (or even just “ASSIGNED_CUSTOM_RESOURCE”), which would be one of [0, 1, 2, 3] (if there are 4 tasks).
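A minimal sketch of the situation (resource and task names are made up for illustration):

```python
import ray

# Hypothetical setup: a node advertising 4 units of a custom resource,
# e.g. started with `ray start --head --resources='{"CUSTOM_RESOURCE": 4}'`.
ray.init(resources={"CUSTOM_RESOURCE": 4})

@ray.remote(resources={"CUSTOM_RESOURCE": 1})
def paint_panel():
    # Each of the 4 concurrent tasks holds 1 unit of CUSTOM_RESOURCE, but
    # there is no obvious way to learn WHICH unit it holds, i.e. which
    # panel index in [0, 1, 2, 3] it may safely write to.
    panel_index = ...  # the missing piece this question is about
    return panel_index

print(ray.get([paint_panel.remote() for _ in range(4)]))
```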

Does anything exist? When I do ray.runtime_context().get_assigned_resources(), it only tells me HOW MUCH of each resource I have been assigned, not THE IDENTITY of the resource. What I need is something like ray.runtime_context().get_accelerator_ids(), which properly returns the ID of the assigned GPU if you are using GPUs, but it only works for GPUs, not for custom resources!
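For reference, roughly what those two calls give me inside a task (resource names and example values are illustrative):

```python
import ray

ray.init(resources={"CUSTOM_RESOURCE": 4})

@ray.remote(resources={"CUSTOM_RESOURCE": 1})
def inspect():
    ctx = ray.get_runtime_context()
    # Amounts only, e.g. {"CUSTOM_RESOURCE": 1.0, "CPU": 1.0}
    print(ctx.get_assigned_resources())
    # Identities, but only for accelerators, e.g. {"GPU": ["0"]} for a task
    # that was assigned GPU 0; custom resources never show up here.
    print(ctx.get_accelerator_ids())

ray.get(inspect.remote())
```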

Am I missing some function that will tell me the ID of the currently executing worker (on the node), or the ID of the resource “chunk” (portion) that has been assigned to me? For example, if the node has 40 CPUs and I am a task using 2 CPUs, I would want to know that I am currently on CPU #4 and CPU #18. CPUs are a bad example, but you get the idea. It does not even matter whether those IDs map onto any physical thing; they would just be logical/abstract resources, i.e. I may not actually be executing on those CPUs on the physical processor, similar to how getting the 5th 20 GB chunk of a memory resource does not mean I occupy the RAM addresses from 100 to 120 GB.

You can declare your custom resource as a unit instance resource (GPU is a unit instance resource, which is why you can get the ID of the GPU assigned to your task). You can do so by setting the env var RAY_custom_unit_instance_resources=my_gpu_like_resource before starting Ray.

After that, you can get the ID of your custom resource using get_runtime_context().worker.core_worker.resource_ids()["my_gpu_like_resource"].
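Putting the two steps together, something like the following should work (a rough sketch; it goes through private APIs, so the exact return format may differ across Ray versions, and the resource name is just an example):

```python
import os

# Per the steps above: declare the custom resource as a "unit instance"
# resource before Ray starts.
os.environ["RAY_custom_unit_instance_resources"] = "my_gpu_like_resource"

import ray

ray.init(resources={"my_gpu_like_resource": 4})

@ray.remote(resources={"my_gpu_like_resource": 1})
def which_unit():
    # Private/internal API: identifies the unit(s) of the resource this
    # task holds (the exact structure may vary between Ray versions).
    return ray.get_runtime_context().worker.core_worker.resource_ids()[
        "my_gpu_like_resource"
    ]

print(ray.get([which_unit.remote() for _ in range(4)]))
```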

These are private APIs you need to use to achieve this. Ideally we should support it better with public APIs; can you file a GH feature request for it and reference this question?

Oh wow, thank you jjyao. That is exactly what I need, I think. So there is no public API for this?

I will test if it works, then file a GH feature request. It seems like an “obvious” thing (and indeed it would allow people to define their own GPU-like resources, for example when you have heterogeneous GPUs on the system).


I’m working on testing this and making a feature request. One related question: is it possible to specify DISJUNCTIONS of resources (i.e. X or Y or Z)?

For example, if I have custom resources gpu16 and gpu12 (GPUs with 16 GB or 12 GB of memory), and I know a task requires 8 GB of memory, then I could specify that each task requires [gpu12: 1 OR gpu16: 0.5].

Thank you for your help jjyao.

For posterity, I have created a GitHub repo that creates Ray clusters with appropriately defined custom resources per my needs (for heterogeneous clusters, i.e. clusters with different types of GPUs with different amounts of memory, even on the same node).

This at least lets people use the current Ray implementation to get around the limitation. I will also submit a feature request so that it is not necessary to hack custom resources and use hidden/internal/private structures to get worker/task indices of assigned resource portions.
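The basic idea is simply to advertise each GPU flavor as its own custom resource when the node starts. A toy sketch (the names gpu16/gpu12 are made up; on a real cluster you would pass --resources to ray start on each node instead of bundling them into one ray.init call):

```python
import ray

# Pretend this single local "cluster" has two 16 GB GPUs and one 12 GB GPU,
# each flavor exposed as its own custom resource.
ray.init(resources={"gpu16": 2, "gpu12": 1})

@ray.remote(resources={"gpu16": 0.5})  # e.g. an 8 GB task sharing a 16 GB GPU
def task_on_big_gpu():
    return "ran on a gpu16 slot"

@ray.remote(resources={"gpu12": 1})    # or a task taking a whole 12 GB GPU
def task_on_small_gpu():
    return "ran on a gpu12 slot"

print(ray.get([task_on_big_gpu.remote(), task_on_small_gpu.remote()]))
```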

This is awesome, thank you! Can you link the GH feature request? I’ll make sure we work this into our plans at Anyscale eng.

@flyingfalling Unfortunately this is not possible, but we are considering something like [Core][REP] GPU Memory awareness scheduling by jonathan-anyscale · Pull Request #47 · ray-project/enhancements · GitHub, which should address your need.