Running multiple projects with different Python/Ray versions and Docker images

Context:
We need to deploy numerous models to production. Right now we use Docker containers and Amazon ECR. Each container talks to our internal pipeline orchestrator, picks the next task to work on, does some work, uploads data to S3, puts the S3 keys in a new task object, and hands that back to the orchestrator to give to the next worker. Each task is a JSON file of plain values, so serialization/deserialization is not an issue.
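
To make the hand-off concrete, here is a minimal sketch of such a task object. The field names (`task_id`, `stage`, `s3_keys`) and the helper functions are hypothetical; the post does not show the real schema, only that a task is JSON carrying S3 keys rather than the data itself.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Task:
    """Hypothetical task record; the actual schema is not given in the post."""
    task_id: str
    stage: str  # which kind of worker should pick this task up next
    s3_keys: list = field(default_factory=list)  # pointers to data in S3


def next_task(done: Task, next_stage: str, uploaded_keys: list) -> Task:
    """Build the follow-up task that carries the S3 keys of the finished work."""
    return Task(task_id=done.task_id, stage=next_stage, s3_keys=list(uploaded_keys))


def to_json(task: Task) -> str:
    """Serialize a task to the JSON text handed to the orchestrator."""
    return json.dumps(asdict(task), sort_keys=True)


def from_json(raw: str) -> Task:
    """Rebuild a task from the JSON text received from the orchestrator."""
    return Task(**json.loads(raw))
```

Because only keys travel between workers, the same payload could in principle be passed as a plain dict argument to a remote call instead of going through an orchestrator.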

Wish:
I was wondering if I could move from our internal pipeline orchestrator to Ray, but I’m struggling to figure out whether I can get true process isolation. Maybe I’m missing some info and someone could enlighten me.

  • The ideal scenario is for me to make a project repo for a model, create a Python model, use the Ray decorator, and deploy that as a named actor in my cluster to be accessed by anyone. I can currently do this.
  • I can minimize the need for Docker containers by using Runtime Environments for pip/conda libraries, and I can package binaries in the project repo so that they can be used without having to be ‘installed’ on any machine.
  • But I’m concerned that there may be some use case I haven’t figured out and that, ultimately, I may need a Docker container approach.
  • There is a new container feature that allows one to deploy containers to a Ray cluster. But the requirement there seems to be that every container has the same Python version and Ray version as the cluster. (Ref: How to use container in Runtime Environments? - #3 by GuyangSong)
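
For readers following along, the named-actor plus Runtime Environments setup from the first two bullets might look roughly like the sketch below. The class `SentimentModel`, the actor name, the namespace, and the pinned package are all hypothetical. The runtime environment is set at job level in `ray.init` here, which is an assumption about how the project is laid out, and `ray` is imported lazily so the plain helpers can be used without a cluster.

```python
def build_runtime_env(pip_packages, working_dir="."):
    """Return a runtime_env dict: pip packages plus the repo (binaries included)."""
    return {"pip": list(pip_packages), "working_dir": working_dir}


class SentimentModel:
    """Hypothetical model wrapper; a real one would load weights in __init__."""

    def predict(self, text: str) -> str:
        return "positive" if "good" in text.lower() else "negative"


def deploy(name="sentiment", namespace="models"):
    """Register the model as a detached named actor on the running cluster."""
    import ray  # lazy import: only needed when talking to a cluster

    ray.init(
        address="auto",
        namespace=namespace,
        runtime_env=build_runtime_env(["scikit-learn==1.1.3"]),
        ignore_reinit_error=True,
    )
    actor_cls = ray.remote(SentimentModel)  # same effect as the @ray.remote decorator
    # lifetime="detached" keeps the actor alive after this driver exits.
    return actor_cls.options(name=name, lifetime="detached").remote()


def call(text, name="sentiment", namespace="models"):
    """Look the actor up by name from another driver on the same cluster."""
    import ray

    ray.init(address="auto", namespace=namespace, ignore_reinit_error=True)
    handle = ray.get_actor(name, namespace=namespace)
    return ray.get(handle.predict.remote(text))
```

Note that the runtime environment only changes which libraries and files are available; the actor still runs on the cluster's Python interpreter and Ray version.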

For practical purposes, bringing in existing projects that use different Python versions means it’s going to take real effort to get every project onto the same Python version.

I was reading about the excellent Merlin system (The Magic of Merlin: Shopify's New Machine Learning Platform — Data Science & Engineering (2022)), and it seems that every Merlin Workspace (which uses a base image) is its own Ray cluster? So if I want multiple Docker images with different Python versions to co-exist in a Ray system, they’ll have to be on different Ray clusters?

  • If so, can actors on different clusters talk to each other directly? Or do I need to run a Ray Serve instance on each cluster as a wrapper around its actors, so that the clusters talk to each other through those endpoints?

Maybe I have a naive design pattern? Maybe process/container isolation can be done in a better way in Ray than what I’m thinking of?

Best Regards,
Rajiv

Hey @rabraham, to clarify, currently all tasks/actors running in the same cluster need to have the same Python and Ray versions even if they’re using different runtime_envs. We have had some other folks ask about this and I will make sure to update the docs so they’re more clear.
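
As an illustration of that constraint, a project could run a small pre-flight check before deploying to a shared cluster. This is only a sketch: the helper names are hypothetical, and comparing Python at major.minor granularity is an assumption, since the statement above only says the versions need to be the same.

```python
import sys


def python_minor(version_info=None) -> str:
    """Return 'major.minor' for the given (or the current) interpreter."""
    vi = version_info if version_info is not None else sys.version_info
    return f"{vi[0]}.{vi[1]}"


def mismatches(project: dict, cluster: dict) -> list:
    """List the version fields where a project differs from the cluster."""
    return [
        f"{key}: project={project.get(key)} cluster={cluster.get(key)}"
        for key in ("python", "ray")
        if project.get(key) != cluster.get(key)
    ]
```

For example, `mismatches({"python": python_minor(), "ray": "2.0.0"}, cluster_versions)` returns an empty list when the project can share the cluster, where `cluster_versions` is whatever the cluster operator publishes.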

For your use case, if it’s a requirement that different models can use different Python and Ray versions, you would need to make them each run on different Ray clusters as you suggested. You could use Ray Serve to expose them over HTTP to talk to each other.
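
A minimal sketch of the calling side, assuming the other cluster exposes its model through Ray Serve as a JSON-over-HTTP endpoint. The URL and payload shape are hypothetical, and only the Python standard library is used, so the caller does not need a matching Ray or Python version.

```python
import json
import urllib.request


def call_remote_model(url, payload, timeout=10.0, urlopen=urllib.request.urlopen):
    """POST a JSON payload to a model served on another cluster and parse the reply.

    `urlopen` is injectable so the function can be exercised without a network.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

An actor on one cluster could then call something like `call_remote_model("http://cluster-b.internal:8000/sentiment", {"text": "..."})`, where the host and route are placeholders. The trade-off is that arguments and results must be JSON-serializable instead of arbitrary Python objects.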

Out of curiosity, what is the reason that you might have many different Python versions for different models? Is this just because they are legacy projects that were started independently? I haven’t heard this requirement in the past so just want to make sure I understand the context.

Thank you very much for answering my vague question.

  • Yes, my current use case is primarily to bring together legacy projects that were started independently with incompatible versions. For example, some projects have a library with a dependency on an older Redis version that is incompatible with later versions of Ray, so I have to refactor that code and maintain two different requirements files if I want to bring that project onto a Ray cluster.
  • I’m also wondering about Ray/Python upgrades. Let’s say we have multiple projects, maintained by more than one person, all on the same Ray/Python version. One of the projects wants to upgrade its Ray version to try out a cool new feature. The others have no need for that feature and it’s not on their priority list. But we all use the same cluster and enjoy the benefits of calling actors directly, so I guess we all have to upgrade?
  • Also, let’s say we all decide to upgrade to a new Ray version. What would the process be? We create a new cluster and slowly deploy the downstream projects onto it first, working our way up the Directed Acyclic Graph. If it’s a DAG of actors, I guess this is easy to do, but if there is a cycle of actor dependencies, this may be difficult?
  • Re: multiple Python versions. I haven’t seen this happen in practice … yet, but let’s say we want to try a cool new ML library that relies on some feature in Python 3.9 while we are all on Python 3.8. Then we face a similar problem: wait, or ask everyone to upgrade, if we want to use that library?
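
The "downstream first" migration order in the third bullet can be sketched as a topological sort over who calls whom. The project names and the `calls` mapping are hypothetical, and `graphlib` requires Python 3.9 or later. This is one way to compute an order and detect cycles, not a description of how Ray itself handles upgrades.

```python
from graphlib import CycleError, TopologicalSorter


def migration_order(calls: dict) -> list:
    """Order projects so each one moves only after every project it calls.

    `calls` maps a project name to the projects whose actors it invokes.
    Raises ValueError naming the projects in a cycle, since those have no
    valid one-at-a-time order and would have to move together.
    """
    try:
        # TopologicalSorter treats the mapped values as predecessors,
        # so callees are emitted before their callers.
        return list(TopologicalSorter(calls).static_order())
    except CycleError as err:
        cycle = err.args[1]  # list of nodes forming the cycle
        raise ValueError(f"cyclic actor dependencies: {' -> '.join(cycle)}") from err
```

With `{"ingest": ["features"], "features": ["model"], "model": []}` the model project moves first and the ingest project last. A cycle such as `{"a": ["b"], "b": ["a"]}` raises, which matches the concern that cyclic actor dependencies make a gradual move difficult.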

Perhaps you could share your experience of how your big clients, which have multiple teams deploying to Ray clusters, upgrade multiple projects. That’d be great.