How severely does this issue affect your experience of using Ray?
- Normal: Trying to understand the Ray components.
Hi there, one of my tasks died. While looking into the raylet.err file, I saw the following:
[2023-03-16 11:19:12,486 I 27 27] (raylet) io_service_pool.cc:35: IOServicePool is running with 1 io_service.
[2023-03-16 11:19:12,487 I 27 27] (raylet) store_runner.cc:32: Allowing the Plasma store to use up to 50.4927GB of memory.
[2023-03-16 11:19:12,487 I 27 27] (raylet) store_runner.cc:48: Starting object store with directory /dev/shm, fallback /tmp/ray, and huge page support disabled
[2023-03-16 11:19:12,487 W 27 27] (raylet) store_runner.cc:65: System memory request exceeds memory available in /dev/shm. The request is for 50492709273 bytes, and the amount available is 48318382080 bytes. You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you may need to pass an argument with the flag '--shm-size' to 'docker run'.
[2023-03-16 11:19:12,487 I 27 92] (raylet) dlmalloc.cc:154: create_and_mmap_buffer(48318382088, /dev/shm/plasmaXXXXXX)
[2023-03-16 11:19:12,487 I 27 92] (raylet) store.cc:554: ========== Plasma store: =================
Current usage: 0 / 48.3184 GB
I ran df -kh inside the container and saw the following:
(base) ray@ed-car-raycluster-kuberay-worker-r-2xlarge-4-spot-w9rds:~/app$ df -kh
Filesystem      Size  Used Avail Use% Mounted on
overlay         500G   78G  423G  16% /
tmpfs            64M     0   64M   0% /dev
tmpfs            30G     0   30G   0% /sys/fs/cgroup
/dev/xvda1      500G   78G  423G  16% /tmp/ray
tmpfs            50G   32M   50G   1% /dev/shm
tmpfs            59G   12K   59G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs            59G  4.0K   59G   1% /run/secrets/eks.amazonaws.com/serviceaccount
tmpfs            30G     0   30G   0% /proc/acpi
tmpfs            30G     0   30G   0% /proc/scsi
tmpfs            30G     0   30G   0% /sys/firmware
The underlying instance type is r4.2xlarge, which has 61GB of RAM. EKS shows 59.86GB of RAM available.
My questions are:
1. How is the raylet assigned 50GB of RAM?
2. Can the warning lead to any issue that causes a task to fail?
3. If yes, what can one do about it?
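
For reference, here is a small sketch of the arithmetic behind the warning, using only the two byte counts printed in the log above. The constant and function names are my own and are not part of Ray. It does not explain where the 50GB request comes from; it only converts both figures to decimal GB (the unit the raylet log appears to use) and to GiB, and computes the gap between them.

```python
# Byte counts copied verbatim from the raylet warning in the log above.
REQUESTED_BYTES = 50_492_709_273   # "The request is for ... bytes"
AVAILABLE_BYTES = 48_318_382_080   # "the amount available is ... bytes"

GB = 10**9    # decimal gigabyte
GIB = 2**30   # binary gibibyte


def describe(num_bytes: int) -> str:
    """Format a byte count in both decimal GB and binary GiB."""
    return f"{num_bytes / GB:.4f} GB ({num_bytes / GIB:.4f} GiB)"


# How much more the object store asked for than /dev/shm could provide.
shortfall = REQUESTED_BYTES - AVAILABLE_BYTES

print("requested:", describe(REQUESTED_BYTES))
print("available:", describe(AVAILABLE_BYTES))
print("shortfall:", describe(shortfall))
```

The available figure works out to exactly 45 GiB, and its decimal value matches the 48.3184 GB capacity that the Plasma store reports in the last log line.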