Hi, I am a new Ray user and still a novice with Slurm/PBS.
I would like to keep using my Dask workflows while replacing the Dask scheduler with Ray’s. So far, I have created the cluster with Dask distributed (only the cluster, without connecting a client).
Then I launch some basic computations using .compute(scheduler=ray_dask_get).
The computations run fine, but they are not distributed over the cluster (they only run in parallel on the head node).
As I understand it, the cluster should not be launched with Dask either? In that case, will all my Dask computations still run correctly? (There are a lot of them in my code.)
I found a page in Ray’s documentation about running Ray on a PBS cluster. However, I can’t get the script to work with PBS (see here).
Does anyone have any experience with this?
More specifically, I would be happy just to detect the head node and its IP by converting the code below to PBS:
# Getting the node names
nodes=$(scontrol show hostnames "$SLURM_JOB_NODELIST")
nodes_array=($nodes)
head_node=${nodes_array[0]}
head_node_ip=$(srun --nodes=1 --ntasks=1 -w "$head_node" hostname --ip-address)
# if we detect a space character in the head node IP, we'll
# convert it to an ipv4 address. This step is optional.
if [[ "$head_node_ip" == *" "* ]]; then
    IFS=' ' read -ra ADDR <<<"$head_node_ip"
    if [[ ${#ADDR[0]} -gt 16 ]]; then
        head_node_ip=${ADDR[1]}
    else
        head_node_ip=${ADDR[0]}
    fi
    echo "IPV6 address detected. We split the IPV4 address as $head_node_ip"
fi
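
For reference, here is a rough sketch of what I imagine the PBS version could look like. It assumes that PBS exports $PBS_NODEFILE (a file with one hostname per line, possibly repeated once per core) and that the hostnames resolve through getent on the cluster. The helper names list_pbs_nodes and resolve_ipv4 are made up for illustration, and I have not been able to confirm this on a real PBS cluster.

```shell
#!/usr/bin/env bash
# Sketch only: list_pbs_nodes and resolve_ipv4 are hypothetical helper names.

# Print the unique hostnames of the job, keeping the order of the node file.
# PBS may list each host once per allocated core, hence the de-duplication.
list_pbs_nodes() {
    awk 'NF && !seen[$0]++' "$1"
}

# Resolve a hostname to its first IPv4 address.
# Assumes a getent that supports the "ahostsv4" database (glibc does).
resolve_ipv4() {
    getent ahostsv4 "$1" | awk 'NR == 1 { print $1 }'
}

# Only run inside a PBS job, where PBS_NODEFILE is set and readable.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    nodes=$(list_pbs_nodes "$PBS_NODEFILE")
    head_node=$(printf '%s\n' "$nodes" | head -n 1)
    head_node_ip=$(resolve_ipv4 "$head_node")
    echo "Head node: $head_node ($head_node_ip)"
fi
```

As far as I understand, the PBS job script itself runs on the first node listed in $PBS_NODEFILE, so calling hostname --ip-address directly in the job script might be enough to get the head node IP, but I am not sure this holds for every PBS setup.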