singularity exec --bind /tmp/:/tmp lumi_rasmus_ray.sif python raytest1.py
Traceback (most recent call last):
  File "/opt/conda/envs/conda_container_env/lib/python3.9/site-packages/ray/_private/node.py", line 292, in __init__
  File "/opt/conda/envs/conda_container_env/lib/python3.9/site-packages/ray/_private/services.py", line 460, in wait_for_node
TimeoutError: Timed out after 30 seconds while waiting for node to startup. Did not find socket name /tmp/ray/session_2023-05-04_23-52-37_148997_237020/sockets/plasma_store in the list of object store socket names.
How do I need to change the Singularity call so that Ray can run inside it?
Actually, I have good news, sort of. It turned out that my first problems were caused by testing on the frontend rather than on a compute node. When I run simple scripts with Ray on the compute nodes, including inside a Singularity container, things seem to work. Unfortunately, I have not had time to run the full Lightning example yet.
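Since the fix was simply to run on a compute node instead of the frontend, a small guard can catch the mistake early. The sketch below is illustrative only: the helper names `on_compute_node` and `build_singularity_cmd` are hypothetical, it assumes Slurm exports `SLURM_JOB_ID` inside jobs started with `sbatch`, `salloc` or `srun` (and not in a plain login shell), and it only prints the `srun singularity exec ...` command rather than running it. Any account or partition options your site requires would still have to be added.

```python
import os
import shlex


def on_compute_node(env=None):
    """Return True if the environment looks like a Slurm job.

    Assumption: Slurm sets SLURM_JOB_ID for processes launched by
    sbatch/salloc/srun, but not for ordinary shells on the frontend.
    """
    env = os.environ if env is None else env
    return "SLURM_JOB_ID" in env


def build_singularity_cmd(image, script, bind="/tmp:/tmp", use_srun=True):
    """Build the argv list for running a Python script inside an image.

    The bind mount mirrors the original call. With use_srun=True the
    command is prefixed with `srun` so it is launched on a compute node.
    """
    cmd = ["singularity", "exec", "--bind", bind, image, "python", script]
    if use_srun:
        cmd = ["srun"] + cmd
    return cmd


if __name__ == "__main__":
    if not on_compute_node():
        print("Warning: SLURM_JOB_ID is not set; this looks like the frontend.")
    # Print the command instead of executing it (hypothetical file names
    # taken from the original post).
    print(shlex.join(build_singularity_cmd("lumi_rasmus_ray.sif", "raytest1.py")))
```

Printing the command rather than calling `subprocess.run` keeps the sketch safe to try on a login node; inside a batch script you would run the printed line directly.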
Would you be willing to share your solution once you are comfortable with it? I have a similar need in my current project, where I use Slurm and Apptainer to run a Ray container and train a model. Any help would be greatly appreciated.