How to force Tuner workers to use only the worker node

andrwang · July 5, 2023, 1:39am

How severely does this issue affect your experience of using Ray?

Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Currently, I’m spinning up a Ray Tune job that trains and tunes a very large model with concurrency set to 3. A single training run of the model takes a huge amount of memory, so the worker nodes are allocated 350 GB of memory each. The head node is only allocated 8 GB of memory and 8 CPUs.

For some reason, Ray is scheduling tasks or actors that use a lot of the head node’s memory, while the memory on a number of the worker nodes remains mostly unused.

I tried to force them onto the worker nodes by setting cpu to 0, but that didn’t seem to work: memory usage on the head node would still explode.

I saw Resources — Ray 2.5.1, which could definitely work for me, but I am not sure how to apply it to Ray Tuning. Where should I pass those specifications?
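For anyone landing here with the same question, one possible way to apply the custom-resource idea from that page to Tune is to request the resource per trial through `tune.with_resources`. This is only a sketch under assumptions, not a confirmed fix: `worker_node` is a hypothetical custom resource name that would have to be declared on each worker node (for example with `ray start --resources='{"worker_node": 1}'`) and not on the head node, and the CPU and memory figures, `trial_bundle`, and `train_fn` are placeholders.

```python
# Sketch only. "worker_node" is a hypothetical custom resource that must be
# declared on the worker nodes (and not on the head node) when they start.

def trial_bundle(cpus, memory_gb, tag="worker_node"):
    """Build the resource bundle requested by a single Tune trial."""
    return {
        "CPU": cpus,
        "memory": memory_gb * 1024**3,  # Ray expects memory in bytes
        tag: 0.01,  # small slice of the custom resource; only tagged nodes offer it
    }


def train_fn(config):
    # Placeholder for the real (memory-hungry) training code.
    return {"loss": config["lr"]}


def main():
    # Imported here so the helper above can be used without a Ray install.
    import ray
    from ray import tune

    ray.init(address="auto")  # attach to the already running cluster

    # Each trial asks for the bundle, so it can only be placed on a node
    # that advertises the custom resource.
    trainable = tune.with_resources(
        train_fn,
        tune.PlacementGroupFactory([trial_bundle(cpus=8, memory_gb=100)]),
    )
    tuner = tune.Tuner(
        trainable,
        param_space={"lr": tune.uniform(1e-4, 1e-1)},
        tune_config=tune.TuneConfig(num_samples=3, max_concurrent_trials=3),
    )
    tuner.fit()
```

Calling `main()` from a script submitted to the cluster would launch the trials. Note that this only constrains where the trials themselves are placed; the driver script still runs wherever it is launched, so memory used by the driver is not affected by this setting.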
