Launch jobs via runtime_env with large dependencies

What is the best practice for launching jobs with very large dependencies, e.g. large binaries with many dynamic libraries that may exceed 500 MB? How would you compare the three options below?

  1. Submit via `runtime_env`.
  2. Let Ray actors/workers individually retrieve those binaries/`.so` files from shared storage, e.g. S3.
  3. Build those binaries/`.so` files together with Ray into the image that runs on each pod.
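For context, option 1 usually means packaging the dependencies and pointing `runtime_env` at them when submitting the job. A minimal sketch, assuming a pre-uploaded archive at a hypothetical S3 URI (bucket and file names are made up):

```yaml
# runtime_env.yaml -- hypothetical example
# working_dir can reference a remote URI such as a zip on S3,
# so the archive is not pushed through the local-upload path.
working_dir: "s3://my-bucket/job_pkg.zip"
```

This file would then be passed to the job submission, e.g. `ray job submit --runtime-env runtime_env.yaml -- python my_script.py`.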

Thanks


I don’t think runtime env currently supports data bigger than 100 MB (cc @architkulkarni for confirmation), so option 1 isn’t viable. Options 2 and 3 should both work similarly well. The third one is more heavyweight, but you avoid the additional overhead of waiting to download and load the shared objects.

That’s right: for uploading directly from your local machine to the Ray cluster, the limit is 100 MB. I agree that 2 and 3 are good options, and the way Sang compared them makes sense. Here’s some documentation for 2: Handling Dependencies — Ray v1.10.0
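For option 2, the main thing to get right is caching: each node should fetch the large binaries once, not once per worker invocation. Here is a minimal, storage-agnostic sketch (the function name, cache layout, and injected `download` callable are all assumptions, not a Ray API); in real code `download` would be something like boto3's `s3.download_file`, and a production version should download to a temp file and rename it to avoid races between concurrent workers:

```python
import hashlib
import os


def fetch_native_deps(uris, cache_dir, download):
    """Fetch large binaries/.so files once per node, caching by URI.

    `download(uri, dest_path)` is a placeholder for the real transfer
    (e.g. an S3 client call); injecting it keeps the caching logic
    independent of the storage backend.
    """
    os.makedirs(cache_dir, exist_ok=True)
    paths = []
    for uri in uris:
        # Deterministic local name so all workers on a node agree on the path.
        digest = hashlib.sha256(uri.encode()).hexdigest()[:16]
        dest = os.path.join(cache_dir, digest + "-" + os.path.basename(uri))
        if not os.path.exists(dest):  # skip re-download on warm nodes
            download(uri, dest)
        paths.append(dest)
    return paths


# In a Ray actor, each worker would call this once in __init__ and then
# load the libraries, e.g. with ctypes.CDLL(path).
```

The download cost is paid on the first task per node; afterwards the cached copy is reused, which is the overhead Sang mentioned when comparing option 2 against baking everything into the image.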