[Dask-on-Ray] Calls within a Ray Job

elyall · October 5, 2023, 5:09pm

Low: It annoys or frustrates me for a moment.

If I call dask.compute(scheduler=ray_dask_get) inside a submitted Ray Job, will the dask.compute call be able to use the resources of the entire Ray cluster, or just the resources assigned to that job?

I have a data pipeline that regularly generates datasets, which are then converted to another data format. Each dataset has multiple chunks, and the conversion function uses dask.compute() to convert the chunks individually in parallel. It would be easiest for me to wrap each dataset conversion call in a submitted Ray Job, but that wouldn’t make sense if the job doesn’t have access to the entire cluster’s resources. Alternatively, I can avoid the Ray Job API altogether.

yic · October 12, 2023, 4:57pm

Hi @elyall, right now there is no resource isolation between jobs, so IMO it’ll be able to use the full resources of the cluster.
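
For anyone wanting to check this on their own cluster, here is a minimal sketch of a job entrypoint script. It is an illustration only, not code from the thread: the file name convert_job.py, the convert_chunk and convert_dataset helpers, and the toy chunk data are all hypothetical stand-ins for the real conversion pipeline. It prints the cluster-wide resource totals seen from inside the job and then runs the per-chunk work through dask.compute with the Dask-on-Ray scheduler.

```python
"""Hypothetical entrypoint script for a Ray Job (e.g. convert_job.py)."""


def convert_chunk(chunk):
    # Placeholder for the real per-chunk conversion; here it just doubles values.
    return [2 * x for x in chunk]


def convert_dataset(chunks):
    # Imports are deferred so convert_chunk can be imported without Ray or Dask.
    import dask
    import ray
    from ray.util.dask import ray_dask_get

    # Inside a submitted job, ray.init() attaches to the already running cluster.
    ray.init()
    # Reports totals for the whole cluster as seen by this job's driver.
    print("cluster resources:", ray.cluster_resources())

    # One delayed task per chunk, mirroring the pipeline described above.
    tasks = [dask.delayed(convert_chunk)(c) for c in chunks]
    # With ray_dask_get, each Dask task is executed as a Ray task.
    return dask.compute(*tasks, scheduler=ray_dask_get)


if __name__ == "__main__":
    print(convert_dataset([[1, 2], [3, 4], [5, 6]]))
```

Assuming the script is saved as convert_job.py in the current directory, it could be submitted with something like `ray job submit --working-dir . -- python convert_job.py`. If the printed totals match the whole cluster rather than a subset, that is consistent with the answer above.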
