I'm trying to execute two Python processes on a Ray cluster with 8 workers, and it returns this message:
ERROR MAIN - Unhandled exception: Task was killed due to the node running low on memory.
How can I resolve this problem?
@tonidep Your task was most likely consuming too much memory, and the OOM monitor killed it.
I'm not clear what you mean by "execute two Python processes on a Ray cluster with 8 workers."
Note that tasks are scheduled onto a particular node's workers, which are Python processes. For each core on each Ray node, you will have a Python worker process on which your tasks run.
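To make that concrete, here is a minimal sketch (the function name `report_worker_pid` is just illustrative) that prints the PID of the worker process each task lands on:

```python
import os
import ray

ray.init()  # connect to a running cluster, or start a local one

@ray.remote
def report_worker_pid():
    # Each invocation runs inside one of Ray's Python worker processes.
    return os.getpid()

# Fire off a handful of tasks; the distinct PIDs show they run in separate workers.
pids = ray.get([report_worker_pid.remote() for _ in range(8)])
print(sorted(set(pids)))
```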
You can look at the Ray Dashboard, or use the Ray state CLI, to see how much memory your tasks and nodes are using.
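For a quick programmatic check (a sketch; on recent Ray 2.x versions the CLI equivalent is a command like `ray summary tasks`), you can compare the cluster's total resources against what is currently free:

```python
import ray

ray.init(address="auto")  # attach to the running cluster

# Cluster-wide totals versus what is currently free; "memory" and
# "object_store_memory" are reported in bytes.
print("total:    ", ray.cluster_resources())
print("available:", ray.available_resources())
```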
How can I resolve this problem?
First, ascertain which task is eating up the memory, and why. Second, if your task needs more memory than the Ray node has, you need a node with larger capacity.
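One related knob, sketched here under the assumption that your workload is expressed as Ray tasks (the 2 GiB figure and the function name `heavy_step` are just examples): declare how much memory each task needs, so the scheduler does not pack more tasks onto a node than its RAM can hold.

```python
import ray

ray.init(address="auto")

# Reserving memory makes the scheduler treat it as a resource, so a node
# with ~8 GiB of memory will only run a few of these tasks at once.
@ray.remote(memory=2 * 1024 ** 3)  # reserve ~2 GiB per task invocation
def heavy_step(chunk):
    # ... process one chunk of work ...
    return len(chunk)

results = ray.get([heavy_step.remote(list(range(1000))) for _ in range(8)])
print(results)
```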
cc: @ClarenceNg
Thanks @Jules_Damji
I execute two Python modules in parallel on the cluster, launching them from two different PCs.
From the Ray dashboard and in my log, I see that the cluster suspends the execution and returns the low-on-memory message. My workers are images with 8 GB of RAM and 2 cores, and I have configured the cluster with 8 workers.
Should I use an image with more memory, or can I configure something to resolve this problem?
@tonidep I still don't follow what you mean by "two Python modules executed from two different PCs."
8 GB is a bit low. You probably want each node in the Ray cluster to have at least 16-32 GB and probably 8 cores. I run Ray on my M1 with 10 cores and 64 GB, and hardly ever run into OOM unless my jobs have a major memory leak.
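If larger worker images aren't an option right away, one workaround (a sketch, assuming your two modules submit Ray tasks; `memory_hungry_step` is just an example name) is to throttle how many tasks run on a node at once by requesting more logical CPUs per task; on a 2-core worker, `num_cpus=2` means at most one such task per node. You can also tune when the OOM monitor kills tasks with the `RAY_memory_usage_threshold` environment variable set when starting each node, but that only changes when Ray kills tasks, not the underlying memory pressure.

```python
import ray

ray.init(address="auto")

# Requesting both logical CPUs of a 2-core worker means at most one of these
# tasks runs per node at a time, which lowers peak memory use per node.
@ray.remote(num_cpus=2)
def memory_hungry_step(batch):
    # ... do the heavy work for one batch ...
    return sum(batch)

futures = [memory_hungry_step.remote(list(range(10_000))) for _ in range(16)]
print(ray.get(futures))
```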