Object store memory allocation on cluster

nuzant · November 26, 2020, 8:29am

I tried to run a program that needs a large Ray object store on a cluster machine with about 260GB of free memory, but Ray only allocated 90GB to the object store. When I passed the "object_store_memory" parameter to ray.init(), it told me that setting the object store memory yourself is not allowed when connecting to an existing cluster. Why? How can I use as much of the 260GB as possible for storing objects?

WARNING:tensorflow:From /root/anaconda3/lib/python3.7/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
2020-11-26 08:31:17,726 INFO worker.py:651 -- Connecting to existing Ray cluster at address: 192.168.137.151:6379
Traceback (most recent call last):
  File "./buffer_test.py", line 105, in <module>
    ray.init(address='auto', object_store_memory=150000000000)
  File "/root/anaconda3/lib/python3.7/site-packages/ray/worker.py", line 728, in init
    raise ValueError("When connecting to an existing cluster, "
ValueError: When connecting to an existing cluster, object_store_memory must not be provided.

rliaw · November 26, 2020, 8:34am

Try setting this when you start the cluster instead, with ray start --object-store-memory.
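A minimal sketch of that suggestion, assuming the node is started by hand with `ray start` and the driver then attaches with `address='auto'`. The helper names `ray_start_command` and `connect_to_running_cluster` are hypothetical, and the 150GB figure is just the value from the traceback above, not a recommendation.

```python
import shlex

GB = 10**9  # decimal gigabytes, matching the byte counts used in this thread


def ray_start_command(object_store_gb, head_address=None, port=6379):
    """Build a `ray start` command line with an explicit object store size.

    head_address=None starts a head node; otherwise the node joins the
    cluster whose head is at that address (hypothetical helper).
    """
    args = ["ray", "start"]
    if head_address is None:
        args += ["--head", f"--port={port}"]
    else:
        args.append(f"--address={head_address}")
    # The flag takes a size in bytes.
    args.append(f"--object-store-memory={int(object_store_gb * GB)}")
    return args


def connect_to_running_cluster():
    """Attach to the already running cluster without object_store_memory."""
    import ray  # imported lazily so the command builder works without Ray

    ray.init(address="auto")  # the size was fixed when the node was started
    return ray.cluster_resources()


if __name__ == "__main__":
    # Print the command to run in a shell on the head node.
    print(shlex.join(ray_start_command(150)))
```

The object store size is a property of each node, so it has to be set where the node is started. The driver script only connects, which is why ray.init() rejects the parameter in that case.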

RK900 · February 5, 2021, 3:14am

I was having the same issue. I tried both ray.init's object_store_memory parameter and Ray's command line interface (ray start --head --port=6379 --object-store-memory 2000000000). The first did not allocate enough memory, and the second fails with "RuntimeError: Couldn't start Redis. Check log files".

sangcho · February 5, 2021, 6:14am

Note that Ray's object store uses shared memory, so you cannot allocate the machine's entire memory to it (that will cause issues). By default, Ray uses 30% of your machine's memory for the object store.
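As a rough illustration of the 30% default, here is the arithmetic for the numbers in the original question. This is only a sketch: the function names are hypothetical, and the exact memory figure and any caps Ray applies may differ by version.

```python
GB = 10**9  # decimal gigabytes
DEFAULT_PERCENT = 30  # the "30% of machine memory" default mentioned above


def default_object_store_bytes(memory_bytes):
    """Object store size if 30% of the given memory is used (integer math)."""
    return memory_bytes * DEFAULT_PERCENT // 100


def memory_basis_for(store_bytes):
    """Memory figure that would yield the given store size at 30%."""
    return store_bytes * 100 // DEFAULT_PERCENT


print(default_object_store_bytes(260 * GB) // GB)  # 78
print(memory_basis_for(90 * GB) // GB)  # 300
```

Under this assumption, the 90GB allocation corresponds to a 30% share of roughly 300GB, which would be consistent with the default applying to the machine's total memory and not to the 260GB that was free.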

About the Redis issue: did you report it anywhere?
