I want to run multiple jobs on a single node. How can I limit the maximum resources available to one task, so that a single task cannot monopolize all of the cluster's resources?
Hello, I have the same problem with Ray Jobs: how can I limit the maximum resources that a job can use?
The two links you provided cover resource constraints for the job entrypoint and resource scheduling for Ray Core tasks and actors.
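For reference, here is a minimal sketch of those two mechanisms. It assumes a Ray 2.x release in which `JobSubmissionClient.submit_job` accepts `entrypoint_num_cpus`; the script name, the dashboard address, and the helper names are hypothetical. As far as I understand, the entrypoint setting only reserves resources for the entrypoint script itself, and `num_cpus` on a task is a logical scheduling request rather than a hard cap on physical CPU usage.

```python
from typing import Any, Dict, List


def build_submit_kwargs(script: str, entrypoint_cpus: float) -> Dict[str, Any]:
    """Build keyword arguments for JobSubmissionClient.submit_job.

    entrypoint_num_cpus reserves CPUs for the entrypoint script only;
    tasks and actors started by the script are scheduled separately.
    """
    return {
        "entrypoint": f"python {script}",
        "entrypoint_num_cpus": entrypoint_cpus,
    }


def submit(address: str, script: str, entrypoint_cpus: float = 1) -> str:
    """Submit the script as a Ray job and return its submission ID."""
    # Imported lazily so build_submit_kwargs can be used without Ray installed.
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(address)
    return client.submit_job(**build_submit_kwargs(script, entrypoint_cpus))


def run_limited_tasks(n: int) -> List[int]:
    """Inside the submitted script: each task requests one logical CPU."""
    import ray

    @ray.remote(num_cpus=1)
    def square(i: int) -> int:
        return i * i

    return ray.get([square.remote(i) for i in range(n)])
```

A call such as `submit("http://127.0.0.1:8265", "train_script.py")` would use the first mechanism, while `run_limited_tasks` shows the second. Neither one caps the total resources of the job.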
But I want to use the Ray AIR libraries, such as Ray Train, in the script submitted by the job. I have not found a global way to limit the resources that Ray AIR uses inside a submitted script.
I even tried one approach: running Ray Train inside a task that specifies a placement group. However, Ray Train was not restricted to that placement group.
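A rough sketch of what that attempt looks like is below. The helper names and the placeholder task body are hypothetical, and the actual Ray Train call is left out. One possible explanation, which is an assumption on my part and not confirmed, is that Ray Train requests its own placement group for its workers, so they are not captured by the outer one.

```python
from typing import Dict, List


def make_bundles(num_bundles: int, cpus_per_bundle: int) -> List[Dict[str, int]]:
    """Describe the resources reserved by the placement group."""
    return [{"CPU": cpus_per_bundle} for _ in range(num_bundles)]


def run_in_placement_group(num_bundles: int, cpus_per_bundle: int) -> str:
    """Reserve a placement group and run a driver task inside it."""
    # Imported lazily so make_bundles can be used without Ray installed.
    import ray
    from ray.util.placement_group import placement_group
    from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

    pg = placement_group(make_bundles(num_bundles, cpus_per_bundle), strategy="PACK")
    ray.get(pg.ready())  # Block until the bundles are reserved.

    @ray.remote(num_cpus=1)
    def driver_task() -> str:
        # Placeholder: the Ray Train trainer would be built and fitted here.
        return "trainer.fit() would run here"

    strategy = PlacementGroupSchedulingStrategy(
        placement_group=pg,
        # Child tasks and actors inherit the group by default, but a library
        # that creates its own placement group is not bound by this setting.
        placement_group_capture_child_tasks=True,
    )
    return ray.get(driver_task.options(scheduling_strategy=strategy).remote())
```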
So do you have any good suggestions for limiting the resources used by a job in a cluster?
I know that Ray Train and Ray Tune have resource settings in their configurations, but I would like to set the limit at the job level.
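To illustrate the per-library settings I mean, here is a minimal sketch that derives a Ray Train `ScalingConfig` from a CPU budget. The budget helper is hypothetical, and the import path assumes a recent Ray 2.x release (older AIR-era releases expose the same class as `ray.air.config.ScalingConfig`). The limit only applies to this one trainer, not to the job as a whole, and the trainer's own coordinator may request extra resources depending on the version.

```python
from typing import Any, Dict


def scaling_kwargs(job_cpu_budget: int, cpus_per_worker: int) -> Dict[str, Any]:
    """Pick a worker count that fits inside a self-imposed CPU budget."""
    if cpus_per_worker < 1 or job_cpu_budget < cpus_per_worker:
        raise ValueError("budget must cover at least one worker")
    return {
        "num_workers": job_cpu_budget // cpus_per_worker,
        "use_gpu": False,
        "resources_per_worker": {"CPU": cpus_per_worker},
    }


def make_scaling_config(job_cpu_budget: int, cpus_per_worker: int = 1):
    """Build the ScalingConfig that is passed to a Ray Train trainer."""
    # Imported lazily so scaling_kwargs can be used without Ray installed.
    from ray.train import ScalingConfig

    return ScalingConfig(**scaling_kwargs(job_cpu_budget, cpus_per_worker))
```

Every trainer in the script needs its own config like this, which is why a single job-level setting would be more convenient.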
I would appreciate it if you could reply to me.
I found that using a namespace may help a job achieve resource isolation, but it cannot limit the resources the job uses. Is a namespace currently the only way to ensure resource isolation among multiple jobs in a Ray cluster?