if i’d like to use a ray cluster for a purely embarrassingly parallel workload, i.e. which doesn’t need the global object storage, etc. , is there a way to not include / drastically minimize such components in a particular ray deployment?
You can use object_store_memory=0 in this case?
@mbehrendt How large is the data in this embarrassingly parallel workload? There’s a good chance that you’ll still want to use Ray’s dataplane (distributed in-memory object store) for an embarrassingly parallel workload unless the data is very small, in which case Ray will bypass the object store automatically. If you observe very low object store usage, you can always change the object store allocation as @sangcho suggested to allocate less RAM to the object store and more RAM to the worker heap.
@Clark_Zinzow I don’t have a single specific case – my question is rather from the perspective of a certain class of workloads. The data might be as little as one URL per task invocation – so for cases like these, it felt like there might be benefit in taking the object storage out of the loop. I also asked since I wasn’t sure whether adding and removing capacity might carry some additional perf penalty with the object storage enabled, from the perspective of adding and removing nodes might cause some syncing/registering/deregistering/… to happen.
The data might be as little as one URL per task invocation – so for cases like these, it felt like there might be benefit in taking the object storage out of the loop.
In this case, the URLs won’t be stored in the object store, they’ll be automatically inlined into the task specification that’s sent to a Ray worker for execution, so the object store is completely bypassed. We do this for all task arguments that are under some configurable threshold (the default is 100KiB).
I also asked since I wasn’t sure whether adding and removing capacity might carry some additional perf penalty with the object storage enabled, from the perspective of adding and removing nodes might cause some syncing/registering/deregistering/… to happen.
There will be a small amount of time spent initializing the object store at node startup, but it should be very negligible in relation to the overall node startup time. In the critical path of executing tasks, if the task arguments and return values are under 100KiB, the object store will be completely bypassed and should therefore not add any overhead.