Tune: Passing cluster configuration to Ray


I'm using Tune with PyTorch for hyperparameter tuning on a single node. I'm trying to set a few Ray cluster parameters, such as "address" to "localhost", before calling `tune.Tuner()`, which I suppose will set the `--node_ip_address=` option when the cluster management processes run.

I'm new to Ray and unsure whether `tune.Tuner()` calls `ray.init()` under the hood (I suppose it does). If so, how can I pass cluster parameters down to `ray.init()`? If not, how can I set which host address is used when running `tune.Tuner()`?

I’m using the ray/python/ray/tune/examples/mnist_pytorch.py as my starting point.

Thanks in advance


Tune does call ray.init() under the hood if you don’t initialize it yourself. You can add a ray.init() before calling Tuner.fit() to set some custom cluster parameters!
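For example, a minimal sketch of that ordering, assuming the single-node setup from the question. The trivial `trainable` function here is just a stand-in for the MNIST training function, and `_node_ip_address` is an underscore-prefixed (private) argument of `ray.init()`, so treat it as an assumption that may change between Ray versions rather than a stable API:

```python
import ray
from ray import tune
from ray.air import session


def trainable(config):
    # Minimal stand-in for the real MNIST training function.
    session.report({"loss": config["lr"]})


# Initialize Ray yourself *before* creating the Tuner; Tune will then
# reuse this instance instead of calling ray.init() internally.
# NOTE: `_node_ip_address` is a private argument (an assumption on my
# part that it applies here), not a documented public parameter.
ray.init(_node_ip_address="127.0.0.1")

tuner = tune.Tuner(trainable, param_space={"lr": tune.loguniform(1e-4, 1e-1)})
results = tuner.fit()
```

The key point is simply that any `ray.init()` call placed before `Tuner.fit()` wins; Tune only initializes Ray itself when no instance exists yet.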

By the way, if you're on Ray 2.4+, you should see this in your logs: [log message screenshot omitted]

Does this log message answer your question?


Hello @justinvyu

It does; however, I don't see that message in the logs. By the way, I'm using Ray 2.4.0.

Anyway, I added a `ray.init()` call before `Tuner.fit()` in the mnist_pytorch.py script, trying to make the Ray workers use 'localhost' or IP '', without success. The workers still use the IP from the workstation's network card.

Logs are like:
[2023-04-28 13:38:26,178 I 62180 62180] core_worker.cc:215: Initializing worker at address: AA.BB.CC.DD:45357, worker ID e88f0d5b81566120ca6de321c89b289d5ef19743db7be3fdd9f74b8a, raylet 8817e95f545781e4d2079ff9017de876b28c90b709bd5c827a4bc86a

Where AA.BB.CC.DD is the workstation’s NIC IP.

So, is there a way to force Ray Tune worker initialization to use localhost or address?

Thanks again,

What are you trying to achieve? Do you want other machines to be able to connect to this machine? If so, you should start Ray with `ray start --head` on the command line and then connect using e.g. `ray.init("")`
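Concretely, the flow could look something like the sketch below. The `--node-ip-address` flag is the CLI way to pin which IP the node advertises when starting the head node; the exact port and address values here are illustrative, not prescribed:

```python
# On the machine, start Ray from the command line first, e.g.:
#
#   ray start --head --node-ip-address=127.0.0.1 --port=6379
#
# Then attach to that already-running instance from Python:
import ray

ray.init(address="auto")  # connects to the existing local cluster

# Each node's dict includes the address it actually bound to
# (under the "NodeManagerAddress" key), which is a quick sanity check.
print(ray.nodes())
```

Starting the head via the CLI is also the only place the full set of networking flags is exposed, which is why this tends to be more reliable than passing addresses through `ray.init()` alone.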

Hello @kai
Let me describe the problem I'm facing in more detail. I'm trying to use Ray Tune on a single workstation (from NVIDIA), but the hyperparameter tuning does not start. As I said, I'm using the ray/python/ray/tune/examples/mnist_pytorch.py script as a starting point. Here is how I'm running it:

$ python mnist_pytorch.py
2023-05-01 11:12:47,096	INFO worker.py:1625 -- Started a local Ray instance.

And it does not progress from there. There are no other messages in the terminal. I see a lot of `ray::IDLE` processes running, along with `gcs_server`, `monitor.py`, `dashboard.py`, `raylet`, `log_monitor.py`, and `agent.py`.
My environment is:

$ python -V
Python 3.8.14
$ python -c "import ray; print(f'Ray version {ray.__version__}')"
Ray version 2.4.0
$ python -c "import torch; print(f'PyTorch version {torch.__version__}')"
PyTorch version 1.12.1+cu113
$ lsb_release -a
LSB Version:	:core-4.1-amd64:core-4.1-noarch
Distributor ID:	RedHatEnterpriseServer
Description:	Red Hat Enterprise Linux Server release 7.9 (Maipo)
Release:	7.9
Codename:	Maipo

I suspect firewall rules are preventing the processes from communicating. I'm not a super-user on this workstation, nor can I add/remove rules to verify whether that's the case. So I'm trying to make all processes bind to the localhost/ and check whether the tuning starts (I'm positive there are no rules on the loopback/ interface). All processes other than dashboard.py bind to the workstation's real IP (from a network interface).

What would be the best approach to debug this problem?
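For reference, one quick check is to see which IP the OS would auto-detect for outbound traffic, which (to my understanding, as an assumption) is similar to how Ray picks a node IP. The UDP-socket trick below sends no actual packets; `connect()` on a datagram socket only selects a route:

```python
import socket


def detected_ip() -> str:
    """Return the IP of the interface the OS would route external traffic through.

    connect() on a UDP socket sends no packets; it only picks a route,
    so this works even without reachable internet.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect(("8.8.8.8", 80))  # any external address works; nothing is sent
        return s.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # no route at all -> fall back to loopback
    finally:
        s.close()


print(detected_ip())
```

If this prints the NIC's IP rather than 127.0.0.1, that would explain why the workers bind to the workstation's address by default.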

The mnist_pytorch.py example worked on two other systems I use (a macOS laptop and another Linux workstation, where I am a super-user and there are no firewall rules).

Thanks in advance,

Hello @kai and @justinvyu

After deeper troubleshooting, the problem turned out to be related to the forum topic "System will be halted when tasks number is large" (Ray Core).

However, I am still curious about how to make Ray bind to a specific workstation IP. Any help is very much appreciated.