Although node memory usage is high, I don't want my actor to be killed

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity.
  • Low: It annoys or frustrates me for a moment.
  • Medium: It contributes significant difficulty to completing my task, but I can work around it.
  • High: It blocks me from completing my task.

When running the tuning code, node memory usage climbs up to 95%. Then the memory manager(?) kills the most recently scheduled actor. I don’t want any worker to die; I want training (tuning) to keep running while memory is managed.
Is there any option to switch off the OOM killer?

The actor is dead because its worker process has died. Worker exit type: NODE_OUT_OF_MEMORY Worker exit detail: Task was killed due to the node running low on memory.
Memory on the node (IP: 192.168.1.13, ID: f542e0a5bc5d03c64ebf4d976f98d5d33f936f4f89c18a9fbf411d44) where the task (actor ID: ed7e7019f3a37a654d07bccd01000000, name=ImplicitFunc.__init__, pid=64590, memory used=54.66GB) was running was 120.13GB / 125.84GB (0.954604), which exceeds the memory usage threshold of 0.95. Ray killed this worker (ID: 1ae240847d9f22e8b7bb94a248078020057da18394f8ab1f50eeb6ce) because it was the most recently scheduled task; to see more information about memory usage on this node, use `ray logs raylet.out -ip 192.168.1.13`. To see the logs of the worker, use `ray logs worker-1ae240847d9f22e8b7bb94a248078020057da18394f8ab1f50eeb6ce*out -ip 192.168.1.13`. Top 10 memory users:

I remember that there is a way to turn it off. @ClarenceNg works on it and knows the details.


It can be disabled by setting the environment variable RAY_memory_monitor_refresh_ms to zero when starting Ray.

More details on the module:

https://docs.ray.io/en/latest/ray-core/scheduling/ray-oom-prevention.html
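A minimal sketch of what this might look like when Ray is started from Python via `ray.init()` (assuming the variable is set in the driver's environment before Ray starts, so the launched Ray processes inherit it):

```python
import os

# Disable Ray's memory monitor (the OOM killer) by setting its refresh
# interval to 0. This must be in the environment *before* Ray starts.
os.environ["RAY_memory_monitor_refresh_ms"] = "0"

import ray

ray.init()  # the local Ray instance starts with the memory monitor disabled
```

For a cluster launched from the command line, the docs show passing the variable when starting each node, e.g. `RAY_memory_monitor_refresh_ms=0 ray start --head`.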


Thanks! I’m trying it now.