Hi everyone,
I have a task that requires 6GB of RAM and 3 CPUs to execute. In my cluster, I have two worker nodes, each with 4GB of RAM and 3 CPUs.
Is it possible to run that task, given that I have 8GB of RAM combined (4GB + 4GB)?
Any help or guidance would be greatly appreciated!
THANK YOU!!!
The basic unit of scheduling for Ray jobs is a task, and a task runs on a single node, so its whole resource request has to fit on that one node. Ray does not pool RAM across nodes. If your task requires 6GB of RAM, it cannot be scheduled on a node with only 4GB available, even though the cluster has 8GB in total. If you break your task down into smaller tasks, each of them could be scheduled, and they could even execute concurrently. It's a bit like Kubernetes scheduling: you cannot schedule a pod requesting 6GB of RAM on a node that has only 4GB, and you cannot schedule half a pod.
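To make the "must fit on one node" rule concrete, here is a toy placement model in plain Python. It is a simplified first-fit sketch under my own assumptions, not Ray's actual scheduler. The node names and the split into three 1-CPU/2GB subtasks are hypothetical. In real Ray code you would express the request with options such as `@ray.remote(num_cpus=3, memory=6 * 1024**3)`. As far as I know, a request that no single node can satisfy just stays pending as infeasible rather than being split for you.

```python
from dataclasses import dataclass

GB = 1024 ** 3


@dataclass
class Node:
    name: str
    cpus: float
    memory: int  # bytes


@dataclass
class Request:
    cpus: float
    memory: int  # bytes


def place(requests, nodes):
    """Toy first-fit placement (NOT Ray's real algorithm).

    Each request must fit entirely on ONE node; resources are never
    combined across nodes. Returns the chosen node name for each
    request, or None if no single node has room for it.
    """
    free = {n.name: [n.cpus, n.memory] for n in nodes}
    placement = []
    for req in requests:
        target = None
        for name, (cpu, mem) in free.items():
            if req.cpus <= cpu and req.memory <= mem:
                target = name
                free[name][0] -= req.cpus    # reserve CPUs on that node
                free[name][1] -= req.memory  # reserve memory on that node
                break
        placement.append(target)
    return placement


# The cluster from the question: two nodes, 3 CPUs and 4GB each.
cluster = [Node("node-1", 3, 4 * GB), Node("node-2", 3, 4 * GB)]

# Total memory is 8GB, but that does not help a single 6GB task.
print(sum(n.memory for n in cluster) // GB)       # 8
print(place([Request(3, 6 * GB)], cluster))       # [None]

# Hypothetical split of the same work into three 1-CPU / 2GB subtasks.
print(place([Request(1, 2 * GB)] * 3, cluster))   # ['node-1', 'node-1', 'node-2']
```

The split only helps if the work really can be divided that way; the model just shows that smaller requests can be spread across both nodes while one big request cannot.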
Thanks, dude! Is there a way, or some other framework, that can do this kind of thing?
Well, Ray is one such framework, but it doesn't do the job for you; it only takes away some of the pain of running the infrastructure. You still need to decompose your monolithic task into smaller, independently running units.
If you are running ML workloads with PyTorch, you may want to check out PyTorch Lightning. Beyond that, general recommendations are tricky: there is no general solution, and it depends on what the task is doing. I suspect your task is already parallelized to some extent, since you said it runs on 3 CPUs, so I'd start by figuring out how independent the threads are from each other and whether they could run on different nodes.
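As a rough illustration of what "independent units" means, here is a sketch with a made-up workload (a sum of squares standing in for whatever your task really does). Each `process_chunk` call reads only its own slice and returns a small partial result, and the partials are merged at the end. That independence is the property that would let each chunk run on a different node. `ThreadPoolExecutor` is only a local stand-in: Python threads won't speed up CPU-bound work like this, and the point is the structure. In Ray, the same shape would roughly become a `@ray.remote` function called with `.remote(...)` per chunk and collected with `ray.get(...)`, but that mapping is my assumption about how you'd port it.

```python
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk):
    # Hypothetical per-chunk work: uses only its own input, no shared state,
    # so it could run in a separate process or on a separate node.
    return sum(x * x for x in chunk)


def split(data, n_chunks):
    # Split data into at most n_chunks contiguous slices.
    if not data:
        return []
    size = -(-len(data) // n_chunks)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def run_decomposed(data, n_chunks=3, max_workers=3):
    chunks = split(data, n_chunks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return sum(partials)  # cheap merge step on the driver


def run_monolithic(data):
    # The original "one big task" version, for comparison.
    return sum(x * x for x in data)


data = list(range(1000))
print(run_decomposed(data) == run_monolithic(data))  # True
```

If your threads constantly share or mutate the same in-memory state, this kind of split won't work directly, and that is the dependency you'd need to find first.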