After reading the Ray design paper, it seems the security boundary is at the EC2 machine level. How could I share the same Ray cluster among many users? One solution I can think of is to launch each user's workers in containers while sharing the raylet across users.
I think the current solution for supporting multiple users is to create a cluster per user request, but then overall utilization might be low.
Could you share a little bit about your use case?
Do these users tend to have the same dependencies?
Do you care about scheduling fairness between users?
Do you need strong resource isolation between users?
In general, you can run multiple jobs in the same Ray cluster, but they aren't isolated from each other.
It's more of a thought experiment about these situations after watching the Anyscale serverless video. For a general shared public serverless solution, you would care about scheduling fairness and resource isolation for individual users. For dependencies, I'm not sure how Ray currently handles them; I think it would also need to solve this, unless all users run similar programs with similar dependencies (e.g., they're all using TensorFlow/PyTorch). Otherwise, containers are a good way to solve the dependency problem. I think the Ray cluster model is currently geared more toward internal usage (users share the same security context and similar dependencies).
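On the dependency point specifically: Ray's runtime environments can give per-job dependency isolation without going all the way to containers. As a sketch (package names, versions, and the `env_vars`/`working_dir` values here are just placeholders for illustration), you could pass something like this to `ray job submit --runtime-env-json`:

```json
{
  "pip": ["torch", "transformers"],
  "env_vars": {"MY_JOB_MODE": "inference"},
  "working_dir": "./my_job"
}
```

Each job then gets its own pip environment on the workers, though this only isolates Python dependencies, not security contexts or resources.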
Ah I see. Yeah, to be clear, Ray as an open source project does not provide strong guarantees of resource, performance, or execution isolation.