Ray Task count?

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Not sure if this is my lack of understanding or a bug in Ray.

The issue is that I am running multiple jobs, and when I look at the details of each job, the task count is cumulative. There are also unaccounted tasks, and I don’t understand where they are coming from. There cannot be that many unsubmitted tasks.

Attaching screenshots of two different jobs. Each job should have around 2850 tasks.

@ckapoor Can you provide the minimal reproducible code for this issue? It’s a bit hard to investigate it without reproducing it.

cc: @rickyyx

@Huaiwei_Sun @rickyyx

I cannot provide the code, but it seems to be reproducible when submitting many jobs. It is a bug in the dashboard UI. Note that in the attached picture the job succeeded, but the task count still shows tasks as running. In reality, this job did finish successfully, and there are other jobs still running whose cumulative count is likely around 52K.

Here is the current snapshot for the Jobs UI

I tried to reproduce this but could not. Here are my scripts:

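ray job submit --working-dir . -- python job.py

# job.py
import ray


# The decorator turns f into a remote task; without it, f.remote() does not exist.
@ray.remote
def f(x):
    return x * x


futures = [f.remote(i) for i in range(4)]
print(ray.get(futures))  # [0, 1, 4, 9]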

Can you try using the Jobs API to run this several times and see if it reproduces the issue?

Or can you find other examples that could reproduce this issue? (don’t have to be your own code)
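
For running the script above several times through the Jobs CLI, a small driver along these lines might help. It is only a sketch: it assumes the `ray` CLI is on PATH and the cluster address is already configured, and the helper names `build_submit_command` and `submit_many` are made up for illustration. `--no-wait` is used so that the submissions overlap instead of running one after another.

```python
import subprocess
from typing import Callable, List, Optional, Sequence


def build_submit_command(script: str = "job.py", working_dir: str = ".") -> List[str]:
    """Build the `ray job submit` command line for a single submission."""
    return [
        "ray", "job", "submit",
        "--no-wait",                   # return immediately so jobs run concurrently
        "--working-dir", working_dir,
        "--", "python", script,
    ]


def submit_many(
    n: int,
    runner: Optional[Callable[[Sequence[str]], object]] = None,
) -> List[List[str]]:
    """Submit the same script n times and return the commands that were issued.

    `runner` is injectable (hypothetical hook) so the loop can be dry-run
    without a cluster; by default it shells out to the real CLI.
    """
    if runner is None:
        runner = lambda cmd: subprocess.run(cmd, check=True)
    commands = []
    for _ in range(n):
        cmd = build_submit_command()
        runner(cmd)
        commands.append(cmd)
    return commands
```

Calling `submit_many(10)` against a running cluster would queue ten copies of job.py; passing `runner=print` only prints the commands.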

cc: @sangcho have you ever seen this issue?

Can you tell us:

  1. How many jobs did you run in total?
  2. How many jobs do you run concurrently?
  3. How many tasks do you submit per job?

Also, @rickyyx may know the issue. This feature still needs a bit more improvement right now.

@sangcho @rickyyx

I am seeing this on every run I have. Here are answers to your questions. I am using the new dashboard UI.

We use ray job submit.

I see this issue when there are more than 5-10 concurrent jobs, where each job could have anywhere between 2K and 12K tasks. The task count looks good for the first few hours and then gets jumbled up.

Also the little hover popup showing task status on the Jobs UI does not show beyond 10000 tasks.

10K is actually the limit on the number of tasks we can display in the dashboard. We have a 100K limit on the total number of tasks and a 10K limit per job. Supporting more entries would require us to build pagination, which hasn’t been prioritized.

For the unaccounted tasks problem, @rickyyx will tackle this in the next release. It’d be great if you guys can communicate and get some feedback.

@sangcho @rickyyx I think having the correct count beyond 10K/100K should be supported.

I am less interested in seeing the details of each task unless the task failed. Do you have any research on how often end users actually look at individual task details?

One alternative could be to allow filtering on task status, with the 10K limit applied to the filtered total. That way one can filter on the Failed and Running tasks.
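
Outside the dashboard, something close to this filter can be sketched with Ray's State API. This is an assumption-heavy sketch, not a dashboard feature: it assumes a Ray version that exposes `list_tasks` under `ray.util.state` (older releases had it under an experimental module), and `tasks_in_states` plus the injectable `lister` argument are hypothetical names.

```python
from typing import Callable, Dict, List, Sequence

MAX_ROWS = 10_000  # mirrors the per-job display limit mentioned in this thread


def _ray_list_tasks(filters, limit):
    # Imported lazily so the helper below can be tried without a cluster.
    # Assumption: the State API lives in ray.util.state in the installed version.
    from ray.util.state import list_tasks
    return list_tasks(filters=filters, limit=limit)


def tasks_in_states(
    job_id: str,
    states: Sequence[str] = ("FAILED", "RUNNING"),
    lister: Callable = _ray_list_tasks,
    limit: int = MAX_ROWS,
) -> Dict[str, List]:
    """Return up to `limit` tasks per requested state for one job."""
    result = {}
    for state in states:
        # Each filter is a (field, operator, value) tuple.
        filters = [("job_id", "=", job_id), ("state", "=", state)]
        result[state] = list(lister(filters, limit))
    return result
```

Because each state is queried separately, the limit applies to the failed and running tasks themselves rather than being consumed by finished ones.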

Yeah actually we are also planning to polish the GC policy to prioritize cleaning up finished tasks.

cc @rickyyx for more feedback. He will be leading this effort.
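
To make the "clean up finished tasks first" idea concrete, here is a toy model of a bounded task store. It is purely illustrative and not how Ray implements its GC: the class name, the state strings, and the eviction rule are all assumptions. It also keeps a running eviction counter, which is one way a total count could stay correct past the storage limit.

```python
from collections import Counter, OrderedDict


class BoundedTaskStore:
    """Keeps at most max_tasks entries, evicting FINISHED tasks before others."""

    def __init__(self, max_tasks: int):
        self.max_tasks = max_tasks
        self._tasks = OrderedDict()  # task_id -> state, oldest insertion first
        self.evicted = 0             # tasks dropped from storage, still counted

    def update(self, task_id: str, state: str) -> None:
        self._tasks[task_id] = state
        while len(self._tasks) > self.max_tasks:
            self._evict_one()

    def _evict_one(self) -> None:
        # Prefer the oldest FINISHED task; fall back to the oldest task overall.
        victim = next(
            (tid for tid, s in self._tasks.items() if s == "FINISHED"), None
        )
        if victim is None:
            victim = next(iter(self._tasks))
        del self._tasks[victim]
        self.evicted += 1

    def states(self) -> dict:
        return dict(self._tasks)

    def counts(self) -> Counter:
        """Per-state counts of the tasks still held in storage."""
        return Counter(self._tasks.values())

    def total_seen(self) -> int:
        """Stored plus evicted, so the total survives eviction."""
        return len(self._tasks) + self.evicted
```

With `max_tasks=2`, adding a RUNNING, a FINISHED, and another RUNNING task leaves both running tasks in the store while `total_seen()` still reports 3.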