ray.init() hangs with a conda (pip) installation

  • High: It blocks me from completing my task.

Hi all,

I’m trying to set up Ray in my workflow. I started with a clean install using conda. However, I’m hitting exactly the problem described in this issue: Ray workers unable to register when used with "venv"-created virtual environment on Windows with Python 3.7.3+ · Issue #13794 · ray-project/ray · GitHub.

What I’ve done:

conda create -n ray python=3.7
conda activate ray
pip install ray[tune]

Then, within Python:

import ray
ray.init()

This hangs and gives no output. If I look at the logs (e.g. raylet.err), I see the same error mentioned in the GitHub issue above:

(raylet) worker_pool.cc:481: Some workers of the worker process(2841883) have not registered within the timeout. The process is still alive, probably it's hanging during start.

However, if I run:

import ray
ray.init(num_cpus=1)

Then Ray initializes correctly.

I’m using a shared HPC infrastructure running CentOS Stream 8.

Versions:

  • Ray 1.11.0
  • Python 3.7.8
  • CentOS Stream 8

I think that Ray will start a worker process per detected CPU core; if you’re using shared HPC infra, could Ray possibly be trying to start a ton of worker processes?
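
One way to check this theory is to compare how many CPUs the node reports with how many your process is actually allowed to use, and then pass that number to ray.init() explicitly. The snippet below is only a rough sketch: usable_cpu_count and start_ray are hypothetical helper names, not part of Ray, and it assumes that the Linux CPU affinity mask reflects what your scheduler grants you, which may not hold on every cluster.

```python
import os


def usable_cpu_count(cap=None):
    """Return how many CPUs this process may run on (hypothetical helper)."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: size of this process's CPU affinity mask, which a job
        # scheduler may have restricted to a subset of the node.
        count = len(os.sched_getaffinity(0))
    else:
        # Other platforms: fall back to every CPU the OS reports.
        count = os.cpu_count() or 1
    if cap is not None:
        # Optional manual upper bound, e.g. the cores you requested.
        count = min(count, cap)
    return max(1, count)


def start_ray(cap=None):
    """Start Ray with an explicit CPU count instead of auto-detection."""
    import ray  # imported lazily so the helper above runs without Ray

    return ray.init(num_cpus=usable_cpu_count(cap))


print("CPUs reported by the OS:", os.cpu_count())
print("CPUs usable by this process:", usable_cpu_count())
```

If the first number is much larger than the second (or than what you requested from the scheduler), calling start_ray(), or start_ray(cap=N) with your requested core count, would be a reasonable next thing to try, given that ray.init(num_cpus=1) already works for you.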