Access worker node environment variables in head node

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

I am using a cluster.yaml file to create a cluster. One of my requirements is to access environment variables of the worker nodes from the head node.

Could you give an example of what you want to do?

Can you set the same environment variables before starting head and worker nodes?

So, I am using a cluster.yaml file for cluster creation, and all the servers I am using as nodes are on-premise.

Now, some of the servers have CUDA GPUs, some have i5 processors, and some have i7 processors. So I am defining an environment variable for job capacity based on the kind of CPU or GPU a node has.

Now, to define the environment variables at the node level, I have written a shell script on each worker node that starts the worker with the defined variables, and I call this script in the worker node setup section of the cluster.yaml file.

Now, one of my requirements is that I want to read, on the head node, those environment variables that are defined on the worker nodes.

So each worker is started with different env vars and you want to know the env vars for each worker from the head node? Like you want to get a map from worker node id to env vars? Is my understanding correct?

Exactly, I want the same.

Could you elaborate more on how these env vars will be used? I’m trying to see if you need env vars or Ray custom resources.

Actually, we started with defining custom resources, and it was working fine until now. But for some new requirements, custom resources are restricting us from building a generalised solution. It is possible with custom resources, but it increases the complexity of our solution, which we don't want. That's why we came up with the idea of using environment variables.

So, there are two main things that define the number of jobs a node can run: 1) memory (RAM), 2) type of CPU/GPU.

Now, based on our async actor implementation, a single detection actor can handle multiple jobs. So, for any node, we will manually test how many instances of the detection actor it can hold in memory and how many jobs per instance of the detection actor it can handle based on its computation power. Based on the test results, we will finally have four environment variables, which define #instances & #jobs_per_instance for CPU & GPU.

@jjyao I hope I answered your question properly. Please let me know if I can provide any further info.

Sorry for the late reply. Let's say you now know how many instances of the detection actor you can run per node; how are you going to launch different numbers of actors on different nodes?

I can find the number of active instances of an actor via the list_actors() state API, grouping actors by node IP. Then, based on capacity, I can check whether I can run a new instance or not. When creating an actor, I include the info in the actor name itself, e.g. detactor_ip_address_GPU, which helps me count actor instances for CPU and GPU. To assign it to a specific node, I am using the custom resource "node:ip_address".
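The counting logic described above can be sketched as a small helper, assuming actor names follow the `detactor_<ip>_<CPU|GPU>` pattern mentioned in the thread (the capacity numbers and the `can_start` check are illustrative; in a live cluster, the names would come from `ray.util.state.list_actors()` filtered to `ALIVE` actors):

```python
# Minimal sketch of the per-node capacity bookkeeping, not the exact
# implementation. Assumes actor names look like "detactor_<ip>_<CPU|GPU>".
from collections import Counter

def count_instances(actor_names):
    """Count live detection-actor instances grouped by (node_ip, device)."""
    counts = Counter()
    for name in actor_names:
        parts = name.split("_")
        if len(parts) >= 3 and parts[0] == "detactor":
            # The ip may itself contain no underscores, but join defensively.
            ip, device = "_".join(parts[1:-1]), parts[-1]
            counts[(ip, device)] += 1
    return counts

def can_start(counts, node_ip, device, capacity):
    """True if the node is below its (manually tested) capacity for that device."""
    return counts[(node_ip, device)] < capacity
```

With a running cluster you would feed it names such as `[a["name"] for a in list_actors(filters=[("state", "=", "ALIVE")])]`.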

I may not have the full picture but I still think we should use custom resources at least for how many instances of detection actor each node can run.

Also, instead of using the node IP to pin a task to a node, you can use NodeAffinitySchedulingStrategy (Scheduling — Ray 2.3.0).

If you want to get the workers' env vars, I think you can launch a task to each node that returns the env vars.

@shyampatel I hope @jjyao has provided you with sufficient guidance and answers. As pointed out, will the NodeAffinitySchedulingStrategy work for you?

@Jules_Damji I have never used NodeAffinitySchedulingStrategy, so I will have to go through its functionality and see how I can integrate it into our pipeline flow. But I will still need to access the environment variables of the worker nodes.

As @jjyao mentioned, one method is that I can create a task, assign it to the respective node, and have it return the values of the environment variables. Still, I am looking for an easier way to do this, if possible.

@Jules_Damji I need one quick piece of help here. Can I assign a specific actor to a specific worker IP using NodeAffinitySchedulingStrategy?

Yea, you can assign a specific actor to a specific worker node (by node id, not IP) using NodeAffinitySchedulingStrategy.

@shyampatel Does that answer your question?

Yes, that's what I wanted to know. I have started going through the functionality of NodeAffinitySchedulingStrategy; it will take me some time to update the complete flow of my pipeline.

But even with this, it doesn't solve my complete requirement, which I mentioned in the answer below:

Here I have mentioned the use case of my environment variables:

Ray doesn't support getting environment variables of worker nodes, so you need to do your own thing. One possibility is launching a task to each worker node to collect them.
