- High: It blocks me from completing my task.
I want to be able to see how much memory/CPU a given actor/task is currently using (and possibly log this data or make certain application/scheduling decisions based on it). I would also like to programmatically track shared object store usage. Is there a Python API for this?
cc: @sangcho @rickyyx @ericl
Btw, can someone move this post under the “Monitoring and Debugging” category?
Unfortunately, we currently don’t support CPU/memory usage per actor/task, but this is something we are looking into. One of the blockers is the cardinality of such data, given that the number of tasks/actors can be rather large in Ray.
I would also like to programmatically track shared object store usage. Is there a Python API for this?
AFAIK, you could probably do the below (kind of hacky unfortunately):
- For cluster-level resource usage, you could probably parse the object store usage from the autoscaler’s status. See the example ray status output here
- Or, if you have Prometheus set up, you could also scrape the ray_object_store_memory metric programmatically
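For the Prometheus route, a minimal sketch could look like the following. It assumes a Prometheus server that already scrapes the Ray metrics endpoints and uses Prometheus's standard instant-query HTTP API (`/api/v1/query`). The helper names `query_prometheus` and `sum_by_label` are hypothetical, and so is the server address in the usage note. Grouping by the `instance` label is also an assumption: the labels attached to `ray_object_store_memory` can differ between Ray versions, so check what your deployment actually exports.

```python
import json
import urllib.parse
import urllib.request

# Metric name taken from the reply above.
METRIC = "ray_object_store_memory"


def sum_by_label(response: dict, label: str = "instance") -> dict:
    """Sum the sample values of an instant-query response per label value.

    `response` is the decoded JSON body returned by /api/v1/query.
    Series that lack `label` are grouped under "unknown".
    """
    if response.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {response}")
    totals = {}
    for series in response["data"]["result"]:
        key = series["metric"].get(label, "unknown")
        _timestamp, value = series["value"]  # the value arrives as a string
        totals[key] = totals.get(key, 0.0) + float(value)
    return totals


def query_prometheus(base_url: str, query: str = METRIC, timeout: float = 5.0) -> dict:
    """Run an instant query against a Prometheus server and return the JSON body."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

With a server at an assumed address such as `http://localhost:9090`, `sum_by_label(query_prometheus("http://localhost:9090"))` would return one total per scraped target, in whatever unit the metric is exported in. Keeping the parsing separate from the HTTP call makes it easy to log the totals or feed them into scheduling logic without touching the network code.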
If you could share a bit more about your use case, that would be great. We are actively working on resource observability for the coming releases, so knowing the use cases would help us prioritize.
Also this is the documentation regarding how to setup prometheus metrics! Metrics — Ray 3.0.0.dev0.
We recommend using Ray 2.1+ for this feature.