How to pass files (PDFs/images) to Ray actors

How do I pass files (e.g. PDFs/JPGs) to Ray actors? I understand that since Ray is distributed, a local file path on one machine may not be valid on another, right?

If so, are these my alternatives?
a) Upload the file to S3 and pass the S3 path to the Ray actor so it can download the file? This adds an external dependency (on S3 or MinIO), which I'd use only as a last resort if there are no better alternatives.
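A minimal sketch of option a), assuming the object is exposed through a URL the worker can open with the standard library, for example a presigned HTTPS URL for S3 or MinIO (creating that URL is not shown). `FileWorker` and its methods are hypothetical names; the class is kept plain so it can be tested locally and turned into an actor with `ray.remote(FileWorker)`.

```python
import os
import shutil
import tempfile
import urllib.request


class FileWorker:
    """Hypothetical worker: downloads a file by URL, then processes the local copy.

    With Ray, create the actor via ray.remote(FileWorker).remote().
    """

    def __init__(self) -> None:
        # Scratch directory on whichever node the worker ends up running on.
        self.workdir = tempfile.mkdtemp(prefix="file_worker_")

    def fetch(self, url: str, name: str) -> str:
        """Stream `url` into the scratch directory and return the local path."""
        # basename() keeps the download inside workdir even if `name` contains slashes.
        local_path = os.path.join(self.workdir, os.path.basename(name))
        with urllib.request.urlopen(url) as resp, open(local_path, "wb") as out:
            shutil.copyfileobj(resp, out)
        return local_path

    def process(self, url: str, name: str) -> int:
        path = self.fetch(url, name)
        # Placeholder for the real work (e.g. running a model on `path`);
        # here it just reports the downloaded size in bytes.
        return os.path.getsize(path)
```

On the driver this would look like `worker = ray.remote(FileWorker).remote()` followed by `ray.get(worker.process.remote(url, "doc.pdf"))`, so only the short URL string travels through Ray.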

b) Load the file contents from the path and pass the value instead? For example, if I have an image path, I do:

```python
from PIL import Image

img_path = "/path/to/img.jpg"
im = Image.open(img_path)  # pass `im` to the actor instead of `img_path`
```

I can do this, but in one scenario, with an ML library that accepts either raw image bytes or file paths as input, I noticed fewer predictions when I passed the image bytes than when I passed the path.
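If a library behaves differently for in-memory bytes than for a path, one possible workaround (an assumption, not something Ray or the library documents) is to send the exact file bytes, e.g. `Path(img_path).read_bytes()`, rather than a decoded image, and have the actor write them to a local temporary file so the library still receives a path. `BytesWorker` and `predict_fn` are hypothetical; `predict_fn` stands in for the library's path-based call.

```python
import os
import tempfile
from typing import Any, Callable


class BytesWorker:
    """Hypothetical worker that receives raw file bytes and hands a local path to a library.

    With Ray, create the actor via ray.remote(BytesWorker).remote(predict_fn).
    """

    def __init__(self, predict_fn: Callable[[str], Any]) -> None:
        # predict_fn stands in for a path-based library call, e.g. model.predict(path).
        self.predict_fn = predict_fn

    def process(self, data: bytes, suffix: str = ".jpg") -> Any:
        # Recreate the original file on this worker's local disk. Keeping the
        # suffix may matter for libraries that infer the format from the extension.
        fd, path = tempfile.mkstemp(suffix=suffix)
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
            return self.predict_fn(path)
        finally:
            # Clean up the temporary copy whether or not prediction succeeded.
            os.remove(path)
```

The caller would then send `worker.process.remote(Path(img_path).read_bytes(), suffix=".jpg")`, so the library sees byte-for-byte the same file it would have read from the original path.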

I'm wondering whether Ray has helper methods that cover common scenarios like these, or I'm just curious how others handle it.

Hi @rabraham, I can think of several ways to do this:

  1. Use the approach you mentioned.
  2. Use S3 together with a runtime env (see "Environment Dependencies" in the Ray 2.2.0 docs). The difference between 1) and 2) is that with 2), Ray downloads the package to local disk before the worker starts, and it only downloads it once per job.
  3. Use NFS, so every node sees the same file paths.
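For option 2), a `runtime_env` sketch, assuming the files are packed into a zip archive at a hypothetical S3 URI. Ray's `working_dir` field accepts a URI to a remote zip file, which is downloaded and unpacked on each node before workers start, and workers are started in that directory. S3 URIs also need S3 client libraries (the Ray docs mention `boto3`) and credentials on every node.

```python
# Hypothetical bucket and key; the URI must point to a .zip archive.
runtime_env = {
    "working_dir": "s3://my-bucket/jobs/input_files.zip",
}
```

It would be passed as `ray.init(runtime_env=runtime_env)`; actors can then open files by paths relative to the unpacked archive. Because the archive is fixed when the job starts, this fits static inputs better than data that arrives while the job is running.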

If the files are large, I think 1) or 3) may be the better choice.
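For option 3), a minimal sketch assuming a shared file system (e.g. NFS) is mounted at the same path on every node; `SHARED_ROOT`, `stage_to_shared`, and `SharedFileWorker` are hypothetical names. The driver copies the file into the shared mount under a unique name and passes only the path, and the actor opens it directly.

```python
import os
import shutil
import uuid

# Hypothetical mount point; it must resolve to the same shared directory on every node.
SHARED_ROOT = "/mnt/shared/ray_inputs"


def stage_to_shared(src_path: str, shared_root: str = SHARED_ROOT) -> str:
    """Copy a local file into the shared mount under a unique name; return the new path."""
    os.makedirs(shared_root, exist_ok=True)
    ext = os.path.splitext(src_path)[1]  # keep .pdf/.jpg so format detection still works
    dest = os.path.join(shared_root, f"{uuid.uuid4().hex}{ext}")
    shutil.copyfile(src_path, dest)
    return dest


class SharedFileWorker:
    """Hypothetical worker; with Ray, create it via ray.remote(SharedFileWorker).remote()."""

    def process(self, shared_path: str) -> int:
        # The path is only valid here if the shared file system is mounted on this node.
        if not os.path.exists(shared_path):
            raise FileNotFoundError(f"{shared_path} is not visible on this node; is the share mounted?")
        # Placeholder for real work; here we just return the file size in bytes.
        return os.path.getsize(shared_path)
```

Staging under a random name avoids collisions when several drivers write into the same share; deleting staged files once processing is done is left out of the sketch.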


Hi @yic,
Thanks for the quick reply.
Yeah, option 2 (runtime env) may not work for this scenario, since the data is dynamic and comes from external sources, but I hadn't thought of the NFS idea.
I guess I'll try the S3 route then. Thanks!
