Since Ray Data doesn’t support Ray Client ([Data] Ray client error "Global node is not initialized." · Issue #41333 · ray-project/ray · GitHub), I have to run `dataset.map_batches` inside a Ray task.
But I'm a little confused about concurrency. I saw the `concurrency` parameter of `dataset.map_batches`; it sets the number of Ray workers to use concurrently, which is great. But when I run it inside a Ray task declared with `@ray.remote(num_cpus=1)` and pass `concurrency=4`, do the two settings conflict? How should I understand, and guarantee, the resources shared between the task and the `dataset.map_batches` call running inside that task?
Thanks!