Hello,
I am running on a shared host in an existing cluster. On such a host, other users' processes may be running alongside my Ray head process.
I started the cluster with this command:
ray start --address $head_node:6379 --num-cpus 1 --num-gpus 0 --object-store-memory 4000000000
ray status gives the following output:
1 node(s) with resources: {'node:10.187.57.53': 1.0, 'CPU': 1.0, 'accelerator_type:V100': 1.0, 'object_store_memory': 4000000000.0, 'memory': 510375756800.0}
Ray sees 510 GB of memory ('memory': 510375756800.0). In a cluster computing environment such as SLURM or LSF, this figure can be wrong for my job, because I share the host and am allocated only part of its RAM.
Can you provide details on how to set the memory to the RAM allocation assigned to me by SLURM or LSF? Or is ray status reporting incorrect metrics?
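To make the question concrete, one approach that might work is to read the allocation from the scheduler's environment and pass it to ray start through its --memory option, which takes a value in bytes. The sketch below is only an illustration under several assumptions: the job runs under SLURM, SLURM exports SLURM_MEM_PER_NODE or SLURM_MEM_PER_CPU (with SLURM_CPUS_ON_NODE) as plain integers in megabytes, and a megabyte here means 1024*1024 bytes. LSF is not covered. The helper names slurm_memory_bytes and ray_start_command are hypothetical, not part of Ray.

```python
import os
import shlex

# Assumption: SLURM reports memory in megabytes of 1024*1024 bytes.
MIB = 1024 * 1024


def slurm_memory_bytes(env=os.environ):
    """Return the memory SLURM allocated on this node, in bytes, or None if unknown."""
    # Set when the job was submitted with --mem (assumed to be an integer in MB).
    per_node = env.get("SLURM_MEM_PER_NODE", "")
    if per_node.isdigit() and int(per_node) > 0:
        return int(per_node) * MIB
    # Set when the job was submitted with --mem-per-cpu; scale by CPUs on this node.
    per_cpu = env.get("SLURM_MEM_PER_CPU", "")
    cpus = env.get("SLURM_CPUS_ON_NODE", "")
    if per_cpu.isdigit() and cpus.isdigit():
        return int(per_cpu) * int(cpus) * MIB
    return None


def ray_start_command(head_address, env=os.environ):
    """Build the ray start command from the post, adding --memory when it is known."""
    cmd = [
        "ray", "start",
        "--address", head_address,
        "--num-cpus", "1",
        "--num-gpus", "0",
        "--object-store-memory", "4000000000",
    ]
    mem = slurm_memory_bytes(env)
    if mem is not None:
        cmd += ["--memory", str(mem)]
    return cmd


if __name__ == "__main__":
    # head_node is the same variable used in the command above; the fallback is a placeholder.
    address = os.environ.get("head_node", "127.0.0.1") + ":6379"
    print(shlex.join(ray_start_command(address)))
```

With SLURM_MEM_PER_NODE=4096, the printed command would end in --memory 4294967296. As far as I understand, the memory value is a logical resource that Ray uses for scheduling, so this would change what ray status reports but would not by itself enforce a limit on the processes.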