Monitoring Ray States

  • High: It blocks me from completing my task.

Ray 2.0 introduced a wonderful set of APIs for monitoring Ray state (see Monitoring Ray States - Ray 2.0.0). Unfortunately, they only seem to support local Ray clusters. Are there any plans to add support for remote clusters?

cc: @rickyyx @sangcho for thoughts

What do you mean by local / remote cluster? Are you asking how to use this from outside the cluster (e.g., your laptop → cluster)? I think it should just work, because it uses the same mechanism as Ray job submission (which definitely should work with a remote cluster). Can you give me more details on how it is not working with a remote cluster?
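If, as the reply above suggests, the state API goes through the same mechanism as job submission, then a client outside the cluster has to target the head node's dashboard over HTTP rather than a local address. The sketch below assumes the dashboard's default port 8265 and that the state tooling reads the RAY_ADDRESS environment variable the same way the job CLI does; the helper name ray_dashboard_url and the example IP are hypothetical.

```shell
# Hypothetical helper: print the HTTP address of a Ray head node's dashboard.
# 8265 is Ray's default dashboard port (it is also the --dashboard-port value
# in the ray start command later in this thread).
ray_dashboard_url() {
  local head_ip="$1"
  local port="${2:-8265}"   # fall back to the default dashboard port
  printf 'http://%s:%s\n' "$head_ip" "$port"
}

# Example usage from a laptop (not run here; 203.0.113.10 is a placeholder IP):
#   export RAY_ADDRESS="$(ray_dashboard_url 203.0.113.10)"
#   ray job submit -- python my_script.py
#   ray list actors   # assumption: the state CLI resolves RAY_ADDRESS like the job CLI
```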

It does not. It does not connect to a remote cluster, and the API docs explicitly say local.

Oh, are you talking about the Python API for Ray Client?

No, the ones described here: Monitoring Ray States - Ray 2.0.0.
More specifically, the ones from the ray.experimental.state.api package.

@blublinsky
What’s your current setup?

  • Could you describe how you start the cluster? What kind of remote clusters are you using? VMs? K8s? Are you using KubeRay?
  • How do you connect to the remote cluster? Ray client? SSH?
  • How do you use the ray state api/cli?

Could you describe how you start the cluster? What kind of remote clusters are you using? VMs? K8s? Are you using KubeRay?
Starting it on VMs in AWS using ray up
How do you connect to the remote cluster? Ray client? SSH?
Ray client
How do you use the ray state api/cli?
Using the Ray APIs, with code taken directly from Monitoring Ray States - Ray 2.0.0
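Since the cluster is reached through Ray Client (an address like ray://<head>:10001, matching the --ray-client-server-port=10001 used in this setup), it is worth noting that this is a different endpoint from the dashboard HTTP server that job submission talks to. A minimal sketch that derives one from the other, assuming both services run on the same head node and the dashboard uses port 8265; the function name is hypothetical and IPv6 literals are not handled.

```shell
# Hypothetical helper: turn a Ray Client address such as ray://1.2.3.4:10001
# into the dashboard URL on the same head node, e.g. http://1.2.3.4:8265.
# Assumes an IPv4 address or hostname (IPv6 literals are not handled).
client_to_dashboard_url() {
  local addr="${1#ray://}"    # strip the ray:// scheme if present
  local host="${addr%%:*}"    # drop any :port suffix
  printf 'http://%s:%s\n' "$host" "${2:-8265}"
}

# Example (placeholder address, not run here):
#   client_to_dashboard_url ray://203.0.113.10:10001   # http://203.0.113.10:8265
```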

What are the errors/exceptions you got when trying to use ray state?

Maybe try the solution here: Ray observability · Issue #28429 · ray-project/ray · GitHub and see if it works?

We are using:

    ulimit -n 65536; ray start --head --autoscaling-config=~/ray_bootstrap_config.yaml --dashboard-host 0.0.0.0 \
        --port=6379 \
        --ray-client-server-port=10001 \
        --dashboard-port=8265 \
        --min-worker-port=10002 \
        --max-worker-port=19999 \
        --metrics-export-port=8075 \
        --object-manager-port=8076 \
        --node-manager-port=8077 \
        --dashboard-agent-grpc-port=8078 \
        --dashboard-agent-listen-port=8079
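With the dashboard bound to 0.0.0.0 on port 8265, a remote client can only use it if that port is actually reachable from the client machine (for example, opened in the AWS security group or forwarded over SSH). A quick check, sketched under the assumption that curl is installed on the client; the function name and the IP in the example are placeholders.

```shell
# Hypothetical helper: succeed only if an HTTP server on host:port answers
# with a non-error status (default port 8265, the --dashboard-port above).
dashboard_reachable() {
  curl --silent --fail --output /dev/null --max-time 5 \
    "http://$1:${2:-8265}/"
}

# Example (placeholder IP, not run here):
#   dashboard_reachable 203.0.113.10 && echo "dashboard reachable"
```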

Do we also need to add include?

And it does work. Thank you!
