Proper pattern to use from Django

Xowap · July 2, 2024, 9:33pm

Severity: pretty hard

Hi all,

As I learn about Ray, I would love to integrate it with my Django application, as it appears to be a much more powerful and flexible option than Celery, especially for my needs.

However, I am noticing that the way the Python “client” works seems to be conceptually incompatible. Indeed, it appears that you should call ray.init(), which will connect the current driver to the cluster and start a job. This job will then continue until it is stopped, potentially months later, since the driver is a long-running web server.

Conversely, if the job finishes early, all of its subsequent tasks appear to be terminated, which is not what I want: the goal is to start tasks on my Ray cluster in a fire-and-forget mode, with the tasks themselves usually in charge of notifying of their own completion, by changing a row in the database and/or by broadcasting a message to websocket consumers.

In short, for use from a web server, the behavior of Celery is what I am after.

Now, after digging into the code, I found a global_worker and a _global_node, which are hardwired into the decorators in a way that makes it impossible to scope them to the current context (whatever that might be).

The closest thing I have found is JobSubmissionClient, but this in turn looks excessively tedious.

As a result, I’m kind of stuck without a good pattern or way to handle this issue. As I look into it I’ll be sure to report my findings, but if anyone else has interesting results, I’m all ears.

Thanks
Rémy

Xowap · July 2, 2024, 9:42pm

Actually JobSubmissionClient goes through the dashboard API?!

I’m gonna keep digging

Sam_Chan · July 2, 2024, 10:25pm

You want to submit Jobs; that’ll give you the fire and forget mode you’re looking for. See here to get started: Python SDK Overview — Ray 2.31.0

TPM @ Anyscale here; welcome to the Ray Community!

Xowap · July 3, 2024, 8:28am

Thanks for this!

That is what I mentioned. I understand that the canonical way to do this is through a JobSubmissionClient, but it bothers me on two points:

It’s extremely inconvenient to use. Why can you call func.remote() on the one hand, yet have to wrap everything yourself on the other? In my use case, that’s the main way to invoke code, so it’s very disappointing if it boils down to this.
It seems to go through the dashboard API (based on the port number), which instinctively raises questions about authentication, and about the fact that the dashboard is, after all, an optional component.
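
To make the “wrap it all yourself” point concrete, here is a minimal sketch of what such a wrapper might look like. The helper names (build_entrypoint, fire_and_forget), the module name tasks.send_email and the payload are hypothetical; the address assumes a local cluster with the dashboard on its default port 8265. Only JobSubmissionClient and submit_job come from Ray itself.

```python
import json
import shlex


def build_entrypoint(module: str, payload: dict) -> str:
    """Build the shell command that the Ray job will run.

    `module` is a hypothetical importable module in the working directory
    that reads a JSON payload from its first command-line argument.
    """
    return f"python -m {shlex.quote(module)} {shlex.quote(json.dumps(payload))}"


def fire_and_forget(module: str, payload: dict,
                    address: str = "http://127.0.0.1:8265") -> str:
    """Submit a job and return its ID without waiting for it to finish."""
    # Imported lazily so that importing this helper does not require Ray.
    from ray.job_submission import JobSubmissionClient

    # Assumption: the address points at the dashboard of a running cluster.
    client = JobSubmissionClient(address)
    return client.submit_job(
        entrypoint=build_entrypoint(module, payload),
        # Ship the current directory so the job can import `module`.
        runtime_env={"working_dir": "."},
    )
```

From a Django view this would be called as fire_and_forget("tasks.send_email", {"user_id": 1}). Every call starts a separate job with its own entrypoint process, which is heavier than func.remote() and is exactly the inconvenience described above.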

On the other hand, I dug into what Serve is doing, and it does indeed keep a long-lived job. So at least that point is not problematic, I guess.

My next target is to see what could be done with actors, given that they can be detached.
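
As a starting point for that experiment, here is a rough sketch of the detached-actor idea. TaskRunner, get_runner, the actor name "task_runner" and the namespace "django" are all hypothetical; it assumes a cluster is already running and reachable with address="auto". Whether work that is still in flight survives the web process exiting is something to verify rather than take from this sketch.

```python
class TaskRunner:
    """Plain class; wrapped with ray.remote() below to become an actor."""

    def __init__(self):
        self.handled = 0

    def run(self, name: str, payload: dict) -> str:
        # Placeholder for the real work (database update, websocket broadcast).
        self.handled += 1
        return f"{name} handled with {sorted(payload)}"

    def count(self) -> int:
        return self.handled


def get_runner(name: str = "task_runner", namespace: str = "django"):
    """Return a handle to a detached actor, creating it on first use."""
    # Imported lazily so that importing this module does not require Ray.
    import ray

    if not ray.is_initialized():
        # Assumption: a cluster is already running and discoverable.
        ray.init(address="auto", namespace=namespace)
    # lifetime="detached" decouples the actor from this driver's job;
    # get_if_exists=True reuses the actor if it was already created.
    return ray.remote(TaskRunner).options(
        name=name, lifetime="detached", get_if_exists=True
    ).remote()
```

A view would then call get_runner().run.remote("send_email", {"user_id": 1}) and discard the returned reference. The web process still has to call ray.init(), so this does not remove the long-lived driver; it only moves ownership of the work to an actor that outlives it.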

Sam_Chan · July 3, 2024, 9:42pm

Would you be open to chatting live? A couple of us here at Anyscale are looking at this, and we’re not quite understanding what you are trying to do.

I’m “Sam (Ray Team)” on Ray Slack.

Xowap · July 4, 2024, 8:05am

Sure! Thanks

Let me know how you want to proceed.

Sam_Chan · July 4, 2024, 8:30pm

Slack me on Ray Slack!
