Monitor state of submitted Tasks

jafermarq · May 14, 2021, 2:22pm

Hi,

I was wondering if there’s a built-in feature to monitor the state of tasks submitted to a @ray.remote function. By “state” I mean whether the task is “pending”, “running” or “finished” (similar to what is already reported when using ray.tune).

My use case is quite simple: (1) a function submits N ray tasks (these will be scheduled and run in parallel just like ray’s core functionality); (2) however, I don’t have/want a ray.get() blocking call that waits for these tasks to complete; (3) in other bits of my code I need to check at any given point how many of the submitted runs are “running”.

I cannot find a built-in functionality for this so this is how I have implemented it:

I have an auxiliary Ray Actor that’s shared via an ActorHandle with each of the submitted tasks. The Actor class essentially behaves as a counter: when one of the submitted tasks starts running, it increments the counter in the Actor by one. Before the task finishes, i.e. before the @ray.remote function exits, the counter is decremented by one. Below is the simplified code:

import ray
from ray.actor import ActorHandle

@ray.remote
def task_launcher(task_id: int, monitor: ActorHandle):

    # increment counter and set task as "running"
    monitor.task_running.remote(task_id)

    # do stuff

    # decrement counter and set task as "finished"
    monitor.task_finished.remote(task_id)

Having this Actor allows me to monitor how many tasks are running at a given point. This bit of information is passed to other functions in my code. (it can easily be extended to measure how many tasks have been completed and how many are still pending).
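For readers following along, here is a minimal sketch of what the monitor side of this pattern might look like. The original post does not show the Actor, so everything beyond the `task_running` and `task_finished` method names is an assumption: the class name `TaskMonitor`, the `task_submitted` and `counts` methods, and the `demo` function are all hypothetical. The bookkeeping lives in a plain Python class that is wrapped with `ray.remote(...)` only inside `demo`, so the class can be checked without a cluster.

```python
from typing import Dict


class TaskMonitor:
    """Hypothetical bookkeeping class; wrap it with ray.remote() to share it as an actor."""

    def __init__(self) -> None:
        # Maps task_id -> "pending" | "running" | "finished".
        self._state: Dict[int, str] = {}

    def task_submitted(self, task_id: int) -> None:
        self._state[task_id] = "pending"

    def task_running(self, task_id: int) -> None:
        self._state[task_id] = "running"

    def task_finished(self, task_id: int) -> None:
        self._state[task_id] = "finished"

    def counts(self) -> Dict[str, int]:
        # Tally how many tasks are currently in each state.
        out = {"pending": 0, "running": 0, "finished": 0}
        for state in self._state.values():
            out[state] += 1
        return out


def demo(num_tasks: int = 4) -> None:
    """Wire the monitor into Ray; requires Ray to be installed."""
    import time

    import ray

    ray.init()
    monitor = ray.remote(TaskMonitor).remote()  # turn the plain class into an actor

    @ray.remote
    def task_launcher(task_id, monitor):
        monitor.task_running.remote(task_id)
        try:
            time.sleep(1)  # stand-in for "do stuff"
        finally:
            # Report completion even if the work above raises.
            monitor.task_finished.remote(task_id)

    for i in range(num_tasks):
        # Block on the "pending" update so it is recorded before the task
        # itself can report "running".
        ray.get(monitor.task_submitted.remote(i))
        task_launcher.remote(i, monitor)

    time.sleep(0.5)
    print(ray.get(monitor.counts.remote()))  # non-blocking snapshot of task states
    ray.shutdown()
```

Calling `demo()` on a machine with Ray installed prints a snapshot of the counts while the tasks are still in flight. The `try`/`finally` is a design choice rather than something from the original post: it keeps a failing task from being counted as “running” forever.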

Am I re-inventing the wheel here? Is this functionality already present?

Mingwei · May 14, 2021, 4:26pm

Can ray.wait([...]) meet the use case?

jafermarq · May 14, 2021, 4:39pm

Good point. I was using ray.wait() before but, since it only returns the lists of “completed” and “uncompleted” objects, it requires extra logic to determine how many of the “uncompleted” are currently running. I found the Actor approach more convenient, but it still requires a few lines of code just to get this simple functionality working. This is why I was wondering if something simpler already exists in the Ray framework.
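To make the limitation concrete, a non-blocking poll with `ray.wait` could be sketched as follows. The helper name `count_finished` and its `wait` parameter are hypothetical; the parameter exists only so the helper can be exercised with a stub, and by default it falls back to `ray.wait`. Passing `timeout=0` makes the call return immediately instead of waiting for tasks to complete.

```python
from typing import Callable, Optional, Sequence, Tuple


def count_finished(refs: Sequence, wait: Optional[Callable] = None) -> Tuple[int, int]:
    """Return (finished, unfinished) for a list of object refs without blocking."""
    if wait is None:
        import ray  # imported lazily so the helper can be tried with a stub

        wait = ray.wait
    refs = list(refs)
    if not refs:
        # ray.wait rejects num_returns <= 0, so handle the empty case here.
        return 0, 0
    # timeout=0 returns immediately with whatever has completed so far.
    ready, not_ready = wait(refs, num_returns=len(refs), timeout=0)
    return len(ready), len(not_ready)
```

With real tasks this would be called as `count_finished(refs)`, where `refs` is the list returned by the `.remote()` calls. The second number lumps “pending” and “running” together, which is exactly the gap described above.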

sangcho · May 14, 2021, 5:32pm

Hey @jafermarq. Thanks for asking this. Your approach definitely works, and I would do the same thing if I wanted more detailed information about each task. You can even improve the observability by tracking other metrics like running time or memory usage.

Unfortunately, there’s no built-in feature for task stats. We tried implementing it in the past, but it was somewhat deprioritized. If you are interested in a built-in feature, it would be great if you opened a feature request so the team can see the user demand (which means we will be more likely to spend resources on it)!

asm582 · May 14, 2021, 6:32pm

plus 1 on this request, I think it will help me with the issue raised here:

github.com/ray-project/ray

Set time-out on individual ray task

opened 08:03PM - 06 May 21 UTC

asm582

enhancement

Hello @rkooo567, consider a scenario in which a large number of tasks, with some outliers, are present. We want Ray to provide the ability, in the form of Python decorators, to time out individual tasks. Such functionality is provided in Spark here: https://stackoverflow.com/questions/63389263/how-to-set-timeout-to-a-spark-task-or-map-operation-or-skip-long-running-task Here is an example use case. The list user_task below consists of normal_task and long_task: user_task = [normal_task, normal_task, normal_task, long_task, normal_task, normal_task], where normal_task is a task that runs within the time-out and long_task is a task that runs for a very long time, termed an outlier. We looked at the ray.wait and ray.cancel APIs; they do not have the capability to set a time-out at the individual task level. In other words, we want the opportunity to run all the tasks in the above user_task array and time out the outlier tasks. Please let us know if you need any further details.
