Object store memory allocation on cluster

I tried to run a program that needs a large Ray object store on a cluster machine with about 260 GB of free memory, but Ray only allocated 90 GB for the object store. When I tried to pass the object_store_memory parameter to ray.init(), it told me that setting the object store size yourself is forbidden when connecting to an existing cluster. Why? And how can I use as much of the 260 GB as possible for storing objects?

WARNING:tensorflow:From /root/anaconda3/lib/python3.7/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
2020-11-26 08:31:17,726 INFO worker.py:651 -- Connecting to existing Ray cluster at address:
Traceback (most recent call last):
File "./buffer_test.py", line 105, in <module>
ray.init(address='auto', object_store_memory=150000000000)
File "/root/anaconda3/lib/python3.7/site-packages/ray/worker.py", line 728, in init
raise ValueError("When connecting to an existing cluster, "
ValueError: When connecting to an existing cluster, object_store_memory must not be provided.

Try setting this when you launch the cluster instead, via ray start --object-store-memory.
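For example, something like the following on the head node. The 200 GB figure is an assumption (leaving ~60 GB of the 260 GB free for worker heaps and the OS, since giving the object store everything causes problems); adjust it for your machine:

```shell
# Assumed sizing: reserve ~200 GB of the 260 GB free for the object store,
# keeping headroom for workers and the OS.
STORE_BYTES=$((200 * 1000 * 1000 * 1000))   # 200 GB expressed in bytes
echo "$STORE_BYTES"

# Restart the cluster head with the larger store (no-op if ray is not installed):
if command -v ray >/dev/null 2>&1; then
  ray stop                                  # stop any existing Ray processes first
  ray start --head --port=6379 --object-store-memory="$STORE_BYTES"
fi
```

Then connect from the driver with plain ray.init(address='auto'), without passing object_store_memory.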


I was having the same issue. I tried both ray.init's object_store_memory parameter and Ray's command-line interface (ray start --head --port=6379 --object-store-memory 2000000000), but the first did not allocate enough memory, and the second errors out with "RuntimeError: Couldn't start Redis. Check log files".

Note that Ray's object store uses shared memory, so you cannot allocate the whole machine's memory to it (that will cause issues). By default, Ray uses 30% of the machine's memory for the object store.
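Since the store is backed by shared memory (/dev/shm on Linux), a /dev/shm that is smaller than the size you request will also limit you; in that case Ray falls back to /tmp and warns about degraded performance. A quick check, with the remount size being an example figure:

```shell
# Inspect how much shared memory is actually available for the object store.
if [ -d /dev/shm ]; then
  df -h /dev/shm
else
  echo "/dev/shm not present on this system"
fi
# To grow it temporarily (root required; 210G here is only an example):
#   sudo mount -o remount,size=210G /dev/shm
```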

About the Redis issue: did you post it anywhere? The log files should say why Redis failed to start.
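For finding those logs, Ray writes per-process log files under its session directory; the path below is the default (it differs if you passed --temp-dir), so treat it as an assumption:

```shell
# Look for Redis-related logs in Ray's default session log directory.
LOG_DIR=/tmp/ray/session_latest/logs
if [ -d "$LOG_DIR" ]; then
  ls "$LOG_DIR" | grep -i redis || echo "no redis logs found"
else
  echo "no Ray session logs at $LOG_DIR"
fi
```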