[Dask-on-Ray] Calls within a Ray Job

  • Low: It annoys or frustrates me for a moment.

If I call dask.compute(schedular=ray_dask_get) inside a submitted Ray Job will the dask.compute call be able to utilize the entire resources of the Ray Cluster or just the resources assigned to that job?

I have a data pipeline that generates datasets regularly that are then converted to another data format. Each dataset has multiple chunks and the conversion function uses dask.compute() to convert the chunks individually in parallel. It would be easiest for me to wrap each dataset conversion call in a submitted Ray Job, but that wouldn’t make sense if the job doesn’t have access to the entire cluster’s resources. Alternatively I can avoid the Ray Job api altogether.

Hi @elyall right now there is no isolation of the jobs. So it’ll use the full resources IMO.

1 Like