Yes, your approach is correct: building a wheel of your project and installing it in your Ray container is the recommended way to make your code and Bazel-managed dependencies available on every node. Ray expects all dependencies to be pre-installed on all nodes and has no built-in mechanism to collect Bazel-managed dependencies for you, so you will need to specify and install any extra dependencies in the container yourself. The same pattern is used in comparable Spark setups and is considered best practice for Ray clusters running in Docker on Kubernetes or GKE, since it keeps the environment consistent across nodes and avoids dependency serialization issues at runtime (discuss.ray.io: Ray runtime env from docker image or bazel target, discuss.ray.io: Python code with large dependencies).
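For reference, here is a minimal sketch of the wheel-building side, assuming your workspace uses rules_python's `py_wheel` rule; the target names, version, and library label are hypothetical placeholders for your own project:

```python
# BUILD.bazel: minimal sketch, assuming rules_python is configured in your workspace.
load("@rules_python//python:packaging.bzl", "py_wheel")

# Hypothetical py_library target containing your project's code, e.g.:
# py_library(name = "my_project_lib", srcs = glob(["my_project/**/*.py"]))

py_wheel(
    name = "my_project_wheel",
    distribution = "my_project",   # becomes the installed package name
    version = "0.1.0",
    deps = [":my_project_lib"],    # your py_library target(s)
)
```

Building the target (`bazel build //:my_project_wheel`) should produce a `.whl` under `bazel-bin/`, which your Dockerfile can then `COPY` in and `pip install` so that every Ray node started from that image has your code available.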
If you want to automate collecting extra dependencies, you can script the steps yourself: build the wheel with Bazel, copy it into your Docker build context, and rebuild the image; a rough sketch of such a script is shown below, but this automation is outside Ray's scope. Using a custom Docker image with all dependencies pre-installed remains the most robust and maintainable approach (discuss.ray.io: Python code with large dependencies, discuss.ray.io: Ray runtime env from docker image or bazel target). Would you like more detail on automating dependency collection or Docker image setup?
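As a rough illustration of what such a script could look like (this is not a Ray feature; the target label, paths, and image tag are hypothetical and assume the `py_wheel` target from the earlier sketch, with the wheel landing directly under `bazel-bin/`):

```python
#!/usr/bin/env python3
"""Sketch: build the Bazel wheel and bake it into a Docker image.

Assumes a hypothetical py_wheel target //:my_project_wheel and a
Dockerfile in docker/ that pip-installs any *.whl in its build context.
"""
import shutil
import subprocess
from pathlib import Path

WHEEL_TARGET = "//:my_project_wheel"           # hypothetical Bazel target
DOCKER_CONTEXT = Path("docker")                # directory containing your Dockerfile
IMAGE_TAG = "my-registry/my-ray-image:latest"  # hypothetical image tag

def main() -> None:
    # 1. Build the wheel with Bazel.
    subprocess.run(["bazel", "build", WHEEL_TARGET], check=True)

    # 2. Locate the built wheel under bazel-bin/ and copy it into the
    #    Docker build context.
    wheels = list(Path("bazel-bin").glob("*.whl"))
    if not wheels:
        raise FileNotFoundError("no wheel found under bazel-bin/")
    for wheel in wheels:
        shutil.copy(wheel, DOCKER_CONTEXT / wheel.name)

    # 3. Build the image; the Dockerfile is expected to pip-install the wheel.
    subprocess.run(["docker", "build", "-t", IMAGE_TAG, str(DOCKER_CONTEXT)], check=True)

if __name__ == "__main__":
    main()
```

From there you would push the image to your registry and reference it in your Ray cluster's pod spec so that all nodes run the same environment.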
Sources:
- discuss.ray.io: Ray runtime env from docker image or bazel target
- discuss.ray.io: Python code with large dependencies
Hint: Mention @RunLLM in the post for followups.