I noticed that if I reserve 70% of node memory as object store, ray status doesn’t show 30% memory (I thought this was heap, but it seems to refer to total Ray node memory?) and 70% object store. Instead, it shows the same amount for memory and object store.
On the dashboard, I’m wondering whether the memory usage is the combined heap and object store usage on a node.
The object store column does show the expected size, i.e., 70% of node memory.
The memory column, however, shows the total memory.
How can I tell from the dashboard report whether the heap or the object store is out of memory?
@valiantljk Could you provide more details here? E.g., how you set up the nodes, what you see when you run ray status, and maybe screenshots of the Ray dashboard UI?
cc: @sangcho @rickyyx @aguo
Gentle reminder for more details. @valiantljk
@valiantljk I tested this a bit on Ray nightly, but I couldn’t reproduce the first issue you described. I’m using a single-node cluster with an m5.2xlarge as the head node.
I used 2 different configs:
- Default config provided by Ray.
  - total memory: 32 GiB
  - total memory available to Ray: 28.8 GiB
  - 30% of 28.8 GiB (8.52 GiB) allocated to object store and the rest (17.05 GiB) made available as logical memory resources.
- Change the object store to 15 GiB.
  - total memory: 32 GiB
  - total memory available to Ray: 28.8 GiB
  - 15 GiB is set for object store. The rest (10.57 GiB) is made available as logical memory resources.
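
As a quick sanity check on the figures quoted above, the object store and logical memory should add up to the same total in both configurations. A minimal sketch in plain Python, using only the numbers from this post (the names `CONFIGS`, `split_total`, and `TOTALS` are hypothetical, not part of Ray):

```python
# Figures in GiB, copied from the two configurations described above.
CONFIGS = {
    "default": {"object_store": 8.52, "logical_memory": 17.05},
    "object_store_15GiB": {"object_store": 15.0, "logical_memory": 10.57},
}


def split_total(cfg):
    """Hypothetical helper: object store plus logical memory, in GiB."""
    return round(cfg["object_store"] + cfg["logical_memory"], 2)


# Total memory Ray split between the two pools in each configuration.
TOTALS = {name: split_total(cfg) for name, cfg in CONFIGS.items()}

for name, total in TOTALS.items():
    print(f"{name}: object store + logical memory = {total} GiB")
```

Both configurations sum to 25.57 GiB, so changing the object store size only moves memory between the two pools rather than changing the overall amount Ray manages.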
Both ray status and the Ray dashboard’s node table reflect the two cases correctly.
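
To cross-check the same split outside the dashboard, one option is to read the cluster's resource totals from Python. This is a rough sketch, assuming `ray.cluster_resources()` reports the `memory` and `object_store_memory` resources in bytes (worth verifying on your Ray version); `summarize_memory` and `print_cluster_memory` are made-up helper names:

```python
GIB = 1024 ** 3


def summarize_memory(resources):
    """Hypothetical helper: convert Ray's memory resources to GiB.

    `resources` is a dict shaped like the return value of
    ray.cluster_resources(); missing keys are treated as 0.
    """
    memory = resources.get("memory", 0) / GIB
    object_store = resources.get("object_store_memory", 0) / GIB
    return {
        "memory_gib": round(memory, 2),
        "object_store_gib": round(object_store, 2),
        "total_gib": round(memory + object_store, 2),
    }


def print_cluster_memory():
    """Connect to a running cluster and print its memory split."""
    # Imported lazily so summarize_memory works without Ray installed.
    import ray

    # Assumption: a Ray cluster is already running on this machine.
    ray.init(address="auto")
    print(summarize_memory(ray.cluster_resources()))
```

Calling `print_cluster_memory()` on the head node should show whether the logical memory and object store values match what ray status and the node table report.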
Regarding the Ray dashboard question:
How can I tell from the dashboard report whether the heap or the object store is out of memory?
I created a GitHub issue here: [Dashboard] what the memory column refers to in the node table is not clear · Issue #32073 · ray-project/ray · GitHub