Proper pattern to use from Django

Severity: pretty hard

Hi all,

As I learn more about Ray, I would love to integrate it with my Django application, as it appears to be a much more powerful and flexible option than Celery, especially for my needs.

However, I am noticing that the way the Python “client” works seems to be conceptually incompatible. It appears that you should call ray.init(), which connects the current driver to the cluster and starts a job. That job then keeps running until it is stopped, potentially months later, since the driver is a long-running web server.

Conversely, if the job finishes early, all of its remaining tasks appear to be terminated, which is not what I want. The goal is to start tasks on my Ray cluster in a fire-and-forget mode, with the tasks themselves usually in charge of signaling their own completion, by updating a row in the database and/or broadcasting a message to websocket consumers.
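To make the long-lived-driver option concrete, here is a minimal sketch of what fire-and-forget could look like from inside a web worker process. It assumes the process connects to an already running cluster with `ray.init(address="auto")` and stays connected for its whole lifetime. The names `notify_done`, `ensure_ray_connected` and `fire_and_forget` are hypothetical, and the task body is a placeholder for the real database or websocket notification.

```python
import functools
import threading

_init_lock = threading.Lock()


def notify_done(record_id: int) -> dict:
    """Task body (hypothetical placeholder).

    A real task would update a database row or broadcast a websocket
    message; this one only returns a payload describing the outcome.
    """
    return {"record_id": record_id, "status": "done"}


def ensure_ray_connected(address: str = "auto") -> None:
    """Connect this process to the cluster once, on first use."""
    import ray  # imported lazily so this module loads without Ray installed

    with _init_lock:
        if not ray.is_initialized():
            ray.init(address=address)


@functools.lru_cache(maxsize=None)
def _remote_notify():
    """Wrap the plain function as a Ray remote function, once per process."""
    import ray

    return ray.remote(notify_done)


def fire_and_forget(record_id: int) -> None:
    """Submit the task and return immediately, without waiting for a result."""
    ensure_ray_connected()
    # The returned ObjectRef is deliberately dropped: nothing calls ray.get().
    _remote_notify().remote(record_id)
```

A view would call `fire_and_forget(record.pk)` and return its response right away. The tasks still belong to the web process's job, so this only works as long as that process stays up, and the worker nodes presumably need to be able to import the module that defines `notify_done`.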

In short, for use from a web server, the behavior I want is the one Celery offers.

Now, after digging into the code, I found a global_worker and a _global_node, which are hardwired into the decorators in a way that makes it impossible to scope them to the current context (whichever that might be).

The closest thing I have found is JobSubmissionClient, but that in turn looks excessively tedious.

As a result, I’m kind of stuck, with no good pattern or way to manage this issue. I’ll be sure to report my findings as I look into it, but if anyone else has interesting results, I’m all ears.

Thanks
Rémy

Actually JobSubmissionClient goes through the dashboard API?!

I’m gonna keep digging :smiling_face_with_tear:

You want to submit Jobs; that’ll give you the fire and forget mode you’re looking for. See here to get started: Python SDK Overview — Ray 2.31.0
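For reference, a minimal sketch of what submitting a job through the Python SDK could look like. It assumes the dashboard is reachable at `http://127.0.0.1:8265` and that the code to run lives in a standalone script. The helper names `build_entrypoint` and `submit_script`, and the script name in the usage note, are hypothetical.

```python
import shlex


def build_entrypoint(script: str, *args: str) -> str:
    """Build the shell command the job runs, quoting each argument."""
    return " ".join(["python", shlex.quote(script), *map(shlex.quote, args)])


def submit_script(
    script: str,
    *args: str,
    dashboard_url: str = "http://127.0.0.1:8265",  # assumed dashboard address
) -> str:
    """Submit a script as a Ray job and return its ID without waiting for it."""
    # Imported lazily so this module loads without Ray installed.
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(dashboard_url)
    return client.submit_job(
        entrypoint=build_entrypoint(script, *args),
        # Upload the current directory so the script exists on the cluster.
        runtime_env={"working_dir": "./"},
    )
```

Calling `submit_script("notify.py", "--record-id", "42")` returns as soon as the job is accepted. The job then runs independently of the process that submitted it, and its status can be polled later with `client.get_job_status(job_id)` if needed.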

TPM @ Anyscale here; welcome to the Ray Community!

Thanks for this!

That is what I mentioned. I understand that the “canon” way to do this is through a JobSubmissionClient, but it bothers me on two points:

  • It’s extremely inconvenient to use. Why can you call func.remote() on one hand, yet have to wrap everything yourself on the other? In my use case this is the main way to invoke code, so it’s very disappointing if it boils down to this.
  • It seems to go through the dashboard API (based on the port number), which instinctively raises questions about authentication, and the dashboard is after all an optional component.

On the other hand, I dug into what serve is doing, and it does keep a long-lived job. So at least that point is not a problem, I guess.

My next target is to see what could be done with actors, given that they can be detached.
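As a starting point for that exploration, here is a rough sketch of a named, detached actor that any driver can create or look up. It assumes a running cluster and a shared namespace. `Dispatcher`, `DISPATCHER_NAME`, `NAMESPACE` and `get_dispatcher` are hypothetical names, and the `run` method is a placeholder for real work.

```python
DISPATCHER_NAME = "django_dispatcher"  # hypothetical actor name
NAMESPACE = "django"  # hypothetical namespace shared by all drivers


class Dispatcher:
    """Plain class; get_dispatcher() wraps it as a Ray actor."""

    def __init__(self) -> None:
        self.handled = 0

    def run(self, record_id: int) -> int:
        # Placeholder: real work (and the completion notification) goes here.
        self.handled += 1
        return self.handled


def get_dispatcher():
    """Return a handle to the detached actor, creating it on first use."""
    import ray  # imported lazily so this module loads without Ray installed

    if not ray.is_initialized():
        ray.init(address="auto", namespace=NAMESPACE)
    return ray.remote(Dispatcher).options(
        name=DISPATCHER_NAME,
        lifetime="detached",  # the actor outlives the driver that created it
        get_if_exists=True,  # reuse the actor if another driver already made it
    ).remote()
```

A caller would then do `get_dispatcher().run.remote(record_id)` and ignore the returned reference. A detached actor is not cleaned up automatically, so it has to be removed explicitly (for example with `ray.kill`) once it is no longer needed.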

Would you be open to chatting live? A couple of us here @ Anyscale are looking at this and we’re not quite understanding what you are trying to do.

I’m “Sam (Ray Team)” on Ray Slack.

Sure! Thanks :slight_smile:

Let me know how you want to proceed.

Slack me on Ray Slack!