How to collect resource usage at the job level?

Hello,

When I use “ray job submit” to submit my job to a Ray cluster, I use Prometheus to collect the default metrics.

The cluster collects the CPU and memory usage of nodes and components at run time.

However, I cannot select the CPU and memory usage of a specific task or job.
Can you tell me whether Ray supports collecting this usage information while a job is running?

In other words, I want to collect this usage information for a job. How can I achieve this?

If not, can I calculate approximate CPU/memory usage information from other collected metrics (e.g., actor or component metrics)?

Thank you so much!

Currently, this is not supported.

If not, can I calculate approximate CPU/memory usage information from other collected metrics (e.g., actor or component metrics)?

I’m not sure. Will defer to @sangcho or @rickyyx on this.

One idea for achieving this:
create and collect application-level metrics.
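
A minimal sketch of that idea, assuming each task measures its own process and publishes the numbers through Ray's application metrics (`ray.util.metrics.Gauge`), tagged with the job ID so Prometheus can filter by job. The helper names `measure_usage` and `report_to_ray`, the `app_task_*` metric names, and the `app_job_id` tag are hypothetical. The `resource` module is Unix-only, and the numbers describe the worker process running the task, which is only an approximation of the task's own usage.

```python
import resource  # Unix-only standard library module
import time

_GAUGES = {}  # cache so each metric object is created once per worker process


def measure_usage(fn, *args, **kwargs):
    """Run fn and return (result, stats) describing the current process."""
    start_cpu = time.process_time()
    start_wall = time.perf_counter()
    result = fn(*args, **kwargs)
    stats = {
        "cpu_seconds": time.process_time() - start_cpu,
        "wall_seconds": time.perf_counter() - start_wall,
        # ru_maxrss is reported in kilobytes on Linux (bytes on macOS).
        "peak_rss_kb": resource.getrusage(resource.RUSAGE_SELF).ru_maxrss,
    }
    return result, stats


def report_to_ray(stats):
    """Publish stats as application metrics; call from inside a Ray task or actor."""
    # Imported lazily so measure_usage can be used without Ray installed.
    import ray
    from ray.util.metrics import Gauge

    job_id = ray.get_runtime_context().get_job_id()
    for name, value in stats.items():
        if name not in _GAUGES:
            _GAUGES[name] = Gauge(
                f"app_task_{name}",  # hypothetical metric name
                description=f"Per-task {name} of the worker process",
                tag_keys=("app_job_id",),  # hypothetical tag name
            )
        _GAUGES[name].set(value, tags={"app_job_id": job_id})
```

Inside a remote function this would look like `result, stats = measure_usage(do_work, x)` followed by `report_to_ray(stats)`, where `do_work` stands for your own code. The gauges are then scraped by Prometheus together with the default metrics.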

If not, can I calculate approximate CPU/memory usage information from other collected metrics (e.g., actor or component metrics)?

The best way is to look at the Ray Dashboard (Ray 2.8.0 documentation) and use the num_cpus and memory fields. There is no native support for now.
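
A rough sketch of how those fields could be totalled per job outside the dashboard, assuming the Ray State API (`ray.util.state.list_tasks`) is available in your Ray version and returns each task's `state` and `required_resources`. The helper names `sum_requested_resources` and `fetch_task_records` are hypothetical. Note that `num_cpus` and `memory` are the resources a task requested for scheduling, so the totals show what the job has reserved, not what it physically consumes.

```python
from collections import defaultdict


def sum_requested_resources(task_records):
    """Sum the declared resource requests of RUNNING tasks, keyed by resource name."""
    totals = defaultdict(float)
    for record in task_records:
        if record["state"] != "RUNNING":
            continue  # finished or pending tasks hold no resources
        for name, amount in record["required_resources"].items():
            totals[name] += amount
    return dict(totals)


def fetch_task_records(job_id):
    """Fetch task records for one job from a running cluster (requires Ray)."""
    # Imported lazily so the aggregation above can be used without Ray installed.
    from ray.util.state import list_tasks

    tasks = list_tasks(filters=[("job_id", "=", job_id)], detail=True)
    return [
        {"state": t.state, "required_resources": t.required_resources or {}}
        for t in tasks
    ]
```

For example, `sum_requested_resources(fetch_task_records(job_id))` could be polled periodically while the job runs, with `job_id` taken from the dashboard's job list.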