[Core] Controlling object store size?

We get the following crash at startup because the configured object store size exceeds the size of /dev/shm.
However, we don't need anywhere near 200 GB of object store, and we certainly don't want a 200 GB /dev/shm, since we need the memory for the individual actors. How can we specify a maximum object store size? (Passing object_store_memory via ray.init() fails with an error saying it cannot be specified when running on a cluster.)

Traceback (most recent call last):
  File "/home/ray/anaconda3/bin/ray", line 8, in <module>
    sys.exit(main())
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/scripts/scripts.py", line 1706, in main
    return cli()
  File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/scripts/scripts.py", line 657, in start
    ray_params, head=False, shutdown_at_exit=block, spawn_reaper=block)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/node.py", line 234, in __init__
    self.start_ray_processes()
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/node.py", line 897, in start_ray_processes
    huge_pages=self._ray_params.huge_pages
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/_private/services.py", line 1713, in determine_plasma_store_config
    object_store_memory / 1e9, shm_avail / 1e9))
ValueError: The configured object store size (200.0 GB) exceeds /dev/shm size (200.0 GB). This will harm performance. Consider deleting files in /dev/shm or increasing its size with --shm-size in Docker. To ignore this warning, set RAY_OBJECT_STORE_ALLOW_SLOW_STORAGE=1.
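To see how much shared memory a node really offers before starting Ray, the comparison behind this error can be approximated outside Ray. This is a minimal sketch using only the Python standard library on a Unix-like system. The helper names `shm_available_bytes` and `fits_in_shm` and the `path` parameter are hypothetical, and the actual check in `determine_plasma_store_config` may differ in detail. Since both sizes in the message print as 200.0 GB, the configured size presumably exceeds the available space by less than the displayed precision.

```python
import os


def shm_available_bytes(path="/dev/shm"):
    """Return the bytes available to unprivileged users on the filesystem at `path`."""
    stats = os.statvfs(path)
    return stats.f_frsize * stats.f_bavail


def fits_in_shm(object_store_memory, path="/dev/shm"):
    """Return True if the requested object store size (in bytes) fits at `path`."""
    return object_store_memory <= shm_available_bytes(path)


if __name__ == "__main__":
    requested = 200 * 10**9  # 200 GB, the size reported in the error above
    if os.path.exists("/dev/shm"):
        available = shm_available_bytes()
        print(f"/dev/shm available: {available / 1e9:.1f} GB")
        print(f"200 GB object store fits: {fits_in_shm(requested)}")
```

Running this inside the same Docker container that runs `ray start` shows the value the container actually sees, which can differ from the host's /dev/shm.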

UPDATE:

I managed to reduce the object store size by editing the cluster config YAML file, specifically head_start_ray_commands and worker_start_ray_commands, adding the --object-store-memory parameter (a size in bytes) to ray start:

head_start_ray_commands:
    - ray stop
    - >-
      ulimit -n 65536;
      ray start
      --head
      --port=6379
      --object-manager-port=8076
      --autoscaling-config=~/ray_bootstrap_config.yaml
      --object-store-memory=1000000000
worker_start_ray_commands:
    - ray stop
    - >-
      ulimit -n 65536;
      GOOGLE_APPLICATION_CREDENTIALS=/home/ray/ray.json ray start
      --address=$RAY_HEAD_IP:6379
      --object-manager-port=8076
      --object-store-memory=1000000000
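The value passed to --object-store-memory above is a byte count, so 1000000000 corresponds to 1 GB in decimal units. As a small sketch for anyone generating these commands from a script, the hypothetical helper `object_store_flag` below (not part of Ray) builds the flag from a size in gigabytes:

```python
def object_store_flag(gigabytes):
    """Build the `ray start` flag for an object store of `gigabytes` decimal GB."""
    if gigabytes <= 0:
        raise ValueError("object store size must be positive")
    size_bytes = int(gigabytes * 10**9)  # the flag expects a size in bytes
    return f"--object-store-memory={size_bytes}"


if __name__ == "__main__":
    # Matches the flag used in the config above.
    print(object_store_flag(1))
```

The same byte count goes into both the head and worker commands; whether a given size is adequate depends on how much object data the workload actually keeps in the store.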

Just ran into this as well. Tracking it here: [Bug][Core][Autoscaler] Misconfigured object store size on GCP with Docker (Issue #20614, ray-project/ray on GitHub)