Set Job ID (or Submission ID?) to group tasks under separate Jobs

Soichi_Hayashi · January 12, 2023, 9:25pm

I have a web server that calls ray.init() during initialization. The web server then waits for users to submit requests, and launches a Ray workflow for each request.

When I look at the ray dashboard, I see only a single job ID created when the web server first runs ray.init(). All of the jobs from different requests (from different users) are placed under the same Job ID.

Instead, I’d like to organize each request (workflow) under its own Job ID. I see that there is an API for RuntimeContext.get_job_id, but no set_job_id. Is there a way to instruct Ray to place tasks under a different Job ID for each request that our web server receives?

The only way I can think of is to create a new process to handle each request and call ray.init() from each subprocess. I’d like to avoid this approach, as spawning a new process and initializing our application slows down our processing.

sangcho · January 13, 2023, 1:48am

Unfortunately, this is not possible at the core level, because a job is equivalent to the script that runs ray.init() (that is what the dashboard shows).

cc @yic do we have any request similar to this and have a solution for it?

zoe_tsekas · January 13, 2023, 4:22pm

Would Ray namespaces help?
Using Namespaces — Ray 2.2.0

yic · January 17, 2023, 9:21pm

I don’t think so. I think namespaces are probably the right thing to work with here.
Basically, you can have actors/jobs in different namespaces. Later, when you send traffic, you pick the right namespace to send it to.

Soichi_Hayashi · January 17, 2023, 10:22pm

@zoe_tsekas @yic Thank you for suggesting Ray namespaces. I tried to use them, but it turned out that I can only use them for Ray actors, not for Ray remote functions.

When I tried to set a namespace for a Ray remote function, I got the following error message.

ValueError: Invalid option keyword namespace for remote functions. Valid ones are ['accelerator_type', 'memory', 'name', 'num_cpus', 'num_gpus', 'object_store_memory', 'placement_group', 'placement_group_bundle_index', 'placement_group_capture_child_tasks', 'resources', 'runtime_env', 'scheduling_strategy', '_metadata', 'max_calls', 'max_retries', 'num_returns', 'retry_exceptions'].

I could also set namespace in ray.init(), but I can only call ray.init() once when our server starts up, so it doesn’t help either.

As far as I can tell, Actor namespace is just to help with making sure that actor names won’t collide between different workspaces. I wonder why Ray assumes that there is only 1 job submitted per ray.init()?

yic · February 21, 2023, 6:20pm

@Soichi_Hayashi A job is a Ray-layer concept: whenever you call ray.init(), that is a job.

The job you mean in your post is an application-layer job, and you should manage it yourself. For example, you can have multiple nodes, each running a job, and do the load balancing and routing on your side. What you are trying to distribute is tasks.

Namespaces are for isolation. Actors can have names, and those names might conflict between two jobs; this is what namespaces are for.
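For anyone landing here later, a rough sketch of what "manage the app-layer job yourself" could look like: keep your own mapping from a per-request ID to the task handles launched for that request. Everything below is an assumption for illustration and not a Ray API: `AppJobRegistry`, `new_job`, `submit`, and `handles` are hypothetical names, and a `ThreadPoolExecutor` stands in for Ray so the snippet runs on its own. With Ray you would presumably pass `my_task.remote` as the `launch` callable and store the returned ObjectRefs instead of futures. The dashboard would still show a single Ray job; the grouping exists only in your application.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class AppJobRegistry:
    """Hypothetical helper: groups task handles under app-level job IDs."""

    def __init__(self):
        self._handles = {}  # app-level job ID -> list of task handles

    def new_job(self, prefix="request"):
        # One app-level job ID per incoming request (not a Ray job ID).
        job_id = f"{prefix}-{uuid.uuid4().hex[:8]}"
        self._handles[job_id] = []
        return job_id

    def submit(self, job_id, launch, *args, **kwargs):
        # `launch` is any callable that starts work and returns a handle.
        # Assumption: with Ray this would be `some_remote_function.remote`.
        handle = launch(*args, **kwargs)
        self._handles[job_id].append(handle)
        return handle

    def handles(self, job_id):
        # Return a copy so callers cannot mutate the registry by accident.
        return list(self._handles.get(job_id, []))


def square(x):
    return x * x


registry = AppJobRegistry()
with ThreadPoolExecutor(max_workers=2) as pool:  # stand-in for a Ray cluster
    job_a = registry.new_job()  # first user request
    job_b = registry.new_job()  # second user request
    for n in (1, 2, 3):
        registry.submit(job_a, pool.submit, square, n)
    registry.submit(job_b, pool.submit, square, 10)
    results_a = [f.result() for f in registry.handles(job_a)]
    results_b = [f.result() for f in registry.handles(job_b)]

print(job_a, results_a)
print(job_b, results_b)
```

The registry only takes a callable, so it does not care what actually runs the work; in a real server you would also want to drop finished jobs from the mapping so it does not grow without bound.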
