What is the best practice for launching jobs with really large dependencies, e.g. large binaries with many dynamic libs that may exceed 500 MB? How would you compare the three options below?
1. Submit via runtime_env.
2. Let Ray actors/workers individually retrieve those binaries/.so files from shared storage, e.g. S3.
3. Build those binaries/.so files together with Ray into the image that runs on each pod.
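For reference, option 1 usually boils down to a runtime_env dict passed at job submission. This is a minimal sketch; the directory path, pip package, and commented-out Jobs API calls are hypothetical placeholders, not taken from the thread.

```python
# Option 1 sketch: shipping dependencies via runtime_env.
# The paths and packages below are hypothetical placeholders.
runtime_env = {
    # Directory uploaded from the local machine to the cluster;
    # this upload path is the one subject to a size limit.
    "working_dir": "./my_job",
    # Pure-Python deps can also be declared here.
    "pip": ["numpy"],
}

# With Ray installed, this dict would be passed at submission, e.g.:
#   from ray.job_submission import JobSubmissionClient
#   client = JobSubmissionClient("http://<head-node>:8265")
#   client.submit_job(entrypoint="python main.py", runtime_env=runtime_env)
print(sorted(runtime_env))
```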
I think runtime_env doesn’t support data bigger than 100 MB right now (cc @architkulkarni for confirmation), so option 1 doesn’t seem viable. I think 2 and 3 are both good options. The third is more heavyweight, but you avoid the extra overhead of downloading and loading shared objects at startup.
That’s right: for uploading directly from your local machine to the Ray cluster, the limit is 100 MB. I agree that 2 and 3 are good options, and the way Sang compared them makes sense. Here’s some documentation for 2: Handling Dependencies — Ray v1.10.0
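A minimal sketch of option 2, assuming each worker pulls the binary from shared storage on first use and caches it on local disk. The bucket name is hypothetical, and the actual S3 call is left as a pluggable `download` function (in practice it would wrap an S3 client such as boto3); the cache-on-disk pattern is the point.

```python
import os
import tempfile
from urllib.parse import urlparse

def cached_fetch(uri, cache_dir, download=None):
    """Fetch a large artifact (e.g. a .so) once per node, caching it locally.

    `download` is a callable (uri, dest) -> None; in a real deployment it
    would wrap an S3 client such as boto3 (not shown here).
    """
    parsed = urlparse(uri)
    local_path = os.path.join(cache_dir, os.path.basename(parsed.path))
    if not os.path.exists(local_path):
        if download is None:
            raise ValueError("no downloader provided and artifact not cached")
        download(uri, local_path)
    return local_path

# Each Ray actor/task would call cached_fetch() in its constructor and then
# load the returned path (e.g. via ctypes.CDLL); repeated calls on the same
# node hit the local cache instead of S3.
cache = tempfile.mkdtemp()
path = cached_fetch(
    "s3://my-bucket/libbig.so",  # hypothetical bucket/object
    cache,
    download=lambda uri, dest: open(dest, "wb").close(),  # stub downloader
)
print(os.path.basename(path))
```

A second call with the same URI returns the cached path without touching the downloader, which is what keeps the per-task overhead down.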
Hello there! Just checking to confirm our understanding: if an S3 URI is provided, there is no limit on the zip file size, i.e. example.zip can be of arbitrary size:
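For completeness, the remote-URI form being asked about looks like the sketch below (the bucket is hypothetical; `example.zip` is the name from the question). Since the cluster nodes download the zip directly from S3 rather than receiving it from the local client, the 100 MB local-upload limit shouldn't be involved, though whether any other cap exists is exactly what's being confirmed above.

```python
# Sketch: pointing runtime_env at a zip hosted on S3 (hypothetical bucket).
# Cluster nodes fetch this URI directly, so the 100 MB local-upload limit
# does not apply; any other size cap is the open question in this thread.
runtime_env = {"working_dir": "s3://my-bucket/example.zip"}

# With Ray installed, this would be used as, e.g.:
#   import ray
#   ray.init(runtime_env=runtime_env)
print(runtime_env["working_dir"])
```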