How severely does this issue affect your experience of using Ray?
- None: Just asking a question out of curiosity.
- Low: It annoys or frustrates me for a moment.
- Medium: It causes significant difficulty in completing my task, but I can work around it.
- High: It blocks me from completing my task.
When running my tuning code, node memory usage climbs up to 95%. Then the memory monitor(?) kills the most recently scheduled actor. I don't want any worker to die; I want training (tuning) to keep running while Ray manages memory itself.
Is there an option to switch off the OOM killer?
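For reference, a possible workaround based on Ray's out-of-memory prevention settings: the memory monitor that produces this kill can be disabled by setting its refresh interval to 0, or its kill threshold can be raised, via environment variables on each node before Ray starts. This is a sketch, assuming a recent Ray version where these variables are supported; disabling the monitor means the OS OOM killer may terminate processes instead.

```shell
# Sketch: tune or disable Ray's memory monitor (the component that killed the actor).
# These must be set in the environment of `ray start` on every node, not in the driver.

# Option A: disable the memory monitor entirely (refresh interval of 0 turns it off).
export RAY_memory_monitor_refresh_ms=0

# Option B: instead of disabling, raise the kill threshold above the default 0.95.
# export RAY_memory_usage_threshold=0.98

ray start --head
```

Note that with the monitor off, nothing in Ray prevents the node from exhausting physical memory, so reducing per-trial memory (e.g. fewer concurrent Tune trials) is usually the safer long-term fix.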
The actor is dead because its worker process has died. Worker exit type: NODE_OUT_OF_MEMORY Worker exit detail: Task was killed due to the node running low on memory. Memory on the node (IP: 192.168.1.13, ID: f542e0a5bc5d03c64ebf4d976f98d5d33f936f4f89c18a9fbf411d44) where the task (actor ID: ed7e7019f3a37a654d07bccd01000000, name=ImplicitFunc.__init__, pid=64590, memory used=54.66GB) was running was 120.13GB / 125.84GB (0.954604), which exceeds the memory usage threshold of 0.95. Ray killed this worker (ID: 1ae240847d9f22e8b7bb94a248078020057da18394f8ab1f50eeb6ce) because it was the most recently scheduled task; to see more information about memory usage on this node, use `ray logs raylet.out -ip 192.168.1.13`. To see the logs of the worker, use `ray logs worker-1ae240847d9f22e8b7bb94a248078020057da18394f8ab1f50eeb6ce*out -ip 192.168.1.13. Top 10 memory users: