Hi,
I have gotten significant speedups using Ray for my task, but I am running into difficulties with a new approach. To summarize: I have a task A which has many subtasks B. Initially I parallelized the subtasks in B while keeping A sequential. I then found a way to perform B as a single matrix operation, so I am now trying to parallelize A instead. Run sequentially, a single task in A takes less than 10 seconds, sometimes only milliseconds, but after parallelizing it takes 30 seconds on average.

I think the slowdown comes from the fact that A requires reading from a large database (>10GB). I already store the data using `ray.put()` and pass the references to my function, where I recreate the arrays using `ray.get()`. I tried instead storing the references in a JSON file and loading it in each function, but the `ObjectRef` is not serializable, and `ray.get()` requires the `ObjectRef` object itself and cannot work with a string form of the reference. Is there any way to do this, or are there any other ideas I could try? Please let me know if I have been too vague in describing the problem.