Set Ray memory to the amount assigned to the user by a shared cluster computing system


In an existing cluster, I get a shared host. On such a host, another user's processes could also be running on the same machine where my Ray head process is launched.

I started the cluster with the command:

ray start --address $head_node:6379 --num-cpus 1 --num-gpus 0 --object-store-memory 4000000000

ray status gives me the following output:

1 node(s) with resources: {'node:': 1.0, 'CPU': 1.0, 'accelerator_type:V100': 1.0, 'object_store_memory': 4000000000.0, 'memory': 510375756800.0}

Ray sees 510 GB of memory ('memory': 510375756800.0). In a cluster computing environment like SLURM or LSF this may not be accurate, because I am sharing the host and have only been allocated part of its RAM.
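As a quick sanity check on that figure, the byte count from the status line can be converted to decimal gigabytes and binary gibibytes. This is only an arithmetic sketch; the helper names are made up for illustration.

```python
# Convert the 'memory' value reported by ray status into readable units.
# The helper names below are hypothetical, not part of Ray.

def bytes_to_gb(n_bytes: float) -> float:
    """Decimal gigabytes (1 GB = 10**9 bytes)."""
    return n_bytes / 1e9


def bytes_to_gib(n_bytes: float) -> float:
    """Binary gibibytes (1 GiB = 2**30 bytes)."""
    return n_bytes / 2**30


reported = 510375756800.0  # value copied from the status output above
print(f"{bytes_to_gb(reported):.1f} GB, {bytes_to_gib(reported):.1f} GiB")
```

A value of this size looks like the RAM of the whole host rather than one job's share of it.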

Can you provide details on how to set the memory to the RAM allocation assigned to me by SLURM or LSF? Or is ray status reporting incorrect metrics?

cc @sangcho can you help with this?

cc @Dmitri @Ameer_Haj_Ali do you know how to achieve this?

cc @Alex I remember he knows

@asm582 Ray does not tightly manage the memory usage of your compute tasks, so ray status is reporting incorrect metrics, but in a harmless way.

@rliaw Thanks for confirming. I think idle node termination depends on the ray status command, so I hope ray status reports the other resources correctly.
cc @sangcho

In the latest master, the object store memory should be reported correctly, whereas memory probably is not. This might be supported in the future, but we have other, higher priorities.
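One possible workaround, not confirmed by anyone in this thread, is to pass the scheduler's allocation to ray start yourself through its --memory option, which takes a value in bytes. The sketch below assumes the job runs under SLURM and that SLURM_MEM_PER_NODE or SLURM_MEM_PER_CPU is set (both in megabytes, treated here as MiB), with SLURM_CPUS_ON_NODE giving the CPU count. The function names are hypothetical, and LSF is not covered. As noted above, Ray does not tightly manage memory usage, so this would only change the logical memory resource Ray schedules against; it would not enforce a limit.

```python
import os
import shlex

MIB = 1024 * 1024


def slurm_memory_bytes(env, cpus_fallback=1):
    """Return this node's SLURM memory allocation in bytes, or None if unknown.

    Assumes SLURM_MEM_PER_NODE / SLURM_MEM_PER_CPU are given in megabytes.
    """
    per_node = env.get("SLURM_MEM_PER_NODE")
    if per_node:
        return int(per_node) * MIB
    per_cpu = env.get("SLURM_MEM_PER_CPU")
    if per_cpu:
        cpus = int(env.get("SLURM_CPUS_ON_NODE", cpus_fallback))
        return int(per_cpu) * cpus * MIB
    return None


def ray_start_args(head_address, memory_bytes, object_store_bytes,
                   num_cpus=1, num_gpus=0):
    """Build the ray start argument list, adding --memory only when known."""
    args = [
        "ray", "start",
        "--address", head_address,
        "--num-cpus", str(num_cpus),
        "--num-gpus", str(num_gpus),
        "--object-store-memory", str(object_store_bytes),
    ]
    if memory_bytes is not None:
        args += ["--memory", str(memory_bytes)]
    return args


if __name__ == "__main__":
    # head_node is read from the environment here purely for illustration.
    head = os.environ.get("head_node", "localhost") + ":6379"
    mem = slurm_memory_bytes(os.environ)
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(ray_start_args(head, mem, 4_000_000_000)))
```

If neither variable is set, the command comes out without --memory and Ray falls back to its own default, which is the behaviour described in the question.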