How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
Hi, I rely on Ray Tune to tune hyperparameters on GCP, and I am on the latest version (2.37.0). After I launch my training pipeline, only the first few worker nodes sync the files I listed under the file_mounts key in the configuration file. The others do not, so the process cannot complete. As you would expect, all worker nodes use the very same environment, which baffles me…
I also noticed in the logs, under the “[2/7] Processing file mounts” stage, that those first workers do indeed sync files, but for the later workers nothing syncs and I get something like “No worker file mounts to sync”.
I tried sending my files via worker_setup_commands, copying them from GCP storage with the gsutil tool, but that didn’t work; the commands under this key seem to be ignored. In fact, I just noticed that the setup_commands are also being skipped, as the log says “No setup_commands to run”.
What might be going wrong? How can I investigate further?
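One way to narrow this down might be to count how often each skip message shows up in the captured launcher output. Below is a minimal sketch, assuming the output has been saved as plain text; the function name `count_skip_messages` is hypothetical, and the two strings are simply the messages quoted above, not an exhaustive list of what Ray may log.

```python
from collections import Counter

# Messages quoted from the launcher output in this post.
# Assumption: they appear verbatim somewhere in each relevant log line.
SKIP_MESSAGES = (
    "No worker file mounts to sync",
    "No setup_commands to run",
)


def count_skip_messages(log_text: str) -> Counter:
    """Count the log lines that contain each of the skip messages."""
    # Start every message at zero so missing ones are still reported.
    counts = Counter({msg: 0 for msg in SKIP_MESSAGES})
    for line in log_text.splitlines():
        for msg in SKIP_MESSAGES:
            if msg in line:
                counts[msg] += 1
    return counts
```

Reading the saved output into a string and passing it to `count_skip_messages` gives a per-message count, which can be compared against the number of worker nodes that were launched.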