Ray cluster on PBS

Michael · March 3, 2021, 1:22pm

Hi, I am a new Ray user and still a novice with Slurm / PBS.
I would like to keep using my Dask workflows while replacing the Dask scheduler with Ray's. So far I have created the cluster with Dask distributed (only the cluster, without connecting a client).
Then I launch some basic computations using .compute(scheduler=ray_dask_get).
The computations run fine, but they are not distributed over the cluster (they only run in parallel on the head node).
I understand that the cluster should not be launched with Dask either? In that case, will all my Dask computations still run correctly? (There are a lot of them in my code.)

I have found the section of Ray's documentation about running Ray on a Slurm cluster. However, I can't get the script to work with PBS (see here)

Does anyone have any experience with this topic?

More specifically, I would be happy to simply manage to detect the head node and its IP by converting the code below to PBS:

# Getting the node names
nodes=$(scontrol show hostnames "$SLURM_JOB_NODELIST")
nodes_array=($nodes)

head_node=${nodes_array[0]}
head_node_ip=$(srun --nodes=1 --ntasks=1 -w "$head_node" hostname --ip-address)

# if we detect a space character in the head node IP, we'll
# convert it to an ipv4 address. This step is optional.
if [[ "$head_node_ip" == *" "* ]]; then
    IFS=' ' read -ra ADDR <<<"$head_node_ip"
    if [[ ${#ADDR[0]} -gt 16 ]]; then
        head_node_ip=${ADDR[1]}
    else
        head_node_ip=${ADDR[0]}
    fi
    echo "IPV6 address detected. We split the IPV4 address as $head_node_ip"
fi
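
For reference, a possible PBS counterpart of the snippet above might look like the sketch below. It is not taken from the Ray documentation and has not been verified on a real PBS installation. It assumes that PBS exports $PBS_NODEFILE (a file with one hostname per allocated slot, so hosts can repeat) and that the job script itself runs on the first allocated node, so a local `hostname` call describes the head node. The helper names `unique_nodes` and `pick_ipv4` are made up for this illustration.

```shell
#!/usr/bin/env bash
# Sketch only: PBS counterpart of the Slurm head-node detection.
# Assumption: $PBS_NODEFILE lists one hostname per allocated slot.

# Print the unique hostnames of a node file, keeping first-seen order.
unique_nodes() {
    awk '!seen[$0]++' "$1"
}

# Pick an IPv4 address out of the space-separated output of
# `hostname --ip-address`, using the same rule as the Slurm snippet:
# if the first entry is longer than 16 characters, treat it as IPv6
# and take the second entry instead.
pick_ipv4() {
    local -a addr
    IFS=' ' read -ra addr <<<"$1"
    if [[ ${#addr[@]} -gt 1 && ${#addr[0]} -gt 16 ]]; then
        echo "${addr[1]}"
    else
        echo "${addr[0]}"
    fi
}

# Only run the detection inside a PBS job, where PBS_NODEFILE is set.
if [[ -n "${PBS_NODEFILE:-}" ]]; then
    nodes_array=($(unique_nodes "$PBS_NODEFILE"))
    head_node=${nodes_array[0]}
    # Assumption: this script is executing on the head node.
    head_node_ip=$(pick_ipv4 "$(hostname --ip-address)")
    echo "Head node: $head_node ($head_node_ip)"
fi
```

If the job script does not run on the first node at your site, the local `hostname` call would have to be replaced by a remote lookup of "$head_node", which depends on how the cluster is configured.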
