Launch jobs via runtime_env with large dependencies

What is the best practice for launching jobs with really large dependencies, e.g. large binaries with many dynamic libs that may be larger than 500 MB? How would you compare the three options below?

  1. Submit via runtime_env.
  2. Let Ray actors/workers individually retrieve those binaries/.so files from shared storage, e.g. S3.
  3. Build those binaries/.so files together with Ray into an image that runs on each pod.

Thanks


I think runtime_env doesn't support data bigger than 100 MB right now (cc @architkulkarni for confirmation), so I don't think 1 is a viable option. Options 2 and 3 should both work well. The third one is more heavyweight, but you won't have the extra overhead of waiting to download and load the shared objects.
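
To make the download-and-load overhead concrete, here is a minimal sketch of option 2. The bucket name, key, and exported function are hypothetical, and it assumes boto3 is installed on the worker nodes; each actor pays the download cost once at startup.

import ctypes
import os

import boto3
import ray


@ray.remote
class BinaryUser:
    def __init__(self, bucket="example-bucket", key="libs/libexample.so"):
        # Pull the shared object from S3 once per actor process.
        local_path = os.path.join("/tmp", os.path.basename(key))
        boto3.client("s3").download_file(bucket, key, local_path)
        # Load it so its symbols can be called from this worker.
        self._lib = ctypes.CDLL(local_path)

    def call(self):
        # Hypothetical entry point exported by the shared object.
        return self._lib.do_work()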

That’s right: for uploading directly from your local machine to the Ray cluster, the limit is 100 MB. I agree that 2 and 3 are good options, and the way Sang compared them makes sense. Here’s some documentation for 2: Handling Dependencies — Ray v1.10.0
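
For contrast, here is a minimal sketch of the local-upload path the 100 MB limit applies to; the cluster address and project directory are hypothetical. Ray zips the local directory and uploads it to the cluster, which is where the size cap is enforced.

import ray

ray.init(
    address="ray://<head-node>:10001",           # hypothetical Ray Client address
    runtime_env={"working_dir": "./my_job"},     # local dir, zipped and uploaded (<= 100 MB)
)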

Hello there! Just checking to confirm our understanding: if an S3 URI is provided, there is no limit on the zip file size, i.e. example.zip can be arbitrarily large:

runtime_env = {…, "working_dir": "s3://example_bucket/example.zip", …}

Thank you.

@ec777 yes, Ray does not enforce a size limit on remote URIs. This is because the file contents are not stored in GCS.
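
For anyone who finds this later, here is a minimal sketch of the remote-URI path via the Jobs API; the head node address, bucket, and entrypoint script are hypothetical. Since the workers download and unpack the archive themselves, the 100 MB upload limit does not apply (I believe packages such as smart_open and boto3 need to be installed on the cluster for S3 URIs to work).

from ray.job_submission import JobSubmissionClient

client = JobSubmissionClient("http://<head-node>:8265")  # hypothetical dashboard address
client.submit_job(
    entrypoint="python my_script.py",  # hypothetical script inside example.zip
    runtime_env={"working_dir": "s3://example_bucket/example.zip"},
)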

Thanks for the confirmation, @cade.