(pid=gcs_server) and (raylet) report: Failed to establish connection to the metrics exporter agent

1. Severity of the issue: (select one)
Low: Annoying but doesn’t hinder my work.

2. Environment:

  • Ray version: ray, version 2.52.1
  • Python version: Python 3.11.14
  • OS: Win11
  • Cloud/Infrastructure: None
  • Other libs/tools (if relevant): PyTorch, Lightning, Weights & Biases

3. What happened vs. what you expected:

  • Expected: No RPC errors.
  • Actual: RPC errors in the logs.

I keep seeing:

(pid=gcs_server) [2025-12-10 10:32:21,503 E 52384 54472] (gcs_server.exe) gcs_server.cc:303: Failed to establish connection to the event+metrics exporter agent. Events and metrics will not be exported. Exporter agent status: RpcError: Running out of retries to initialize the metrics agent. rpc_code: 14
(raylet) [2025-12-10 10:32:25,037 E 94496 93984] (raylet.exe) main.cc:979: Failed to establish connection to the metrics exporter agent. Metrics will not be exported. Exporter agent status: RpcError: Running out of retries to initialize the metrics agent. rpc_code: 14

in my training process.

I tried:

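import logging
import ray

log = logging.getLogger(__name__)

# Shut down any existing Ray instance so the new configuration takes effect.
if ray.is_initialized():
    log.warning("Ray was already initialized. Shutting down to apply new configuration...")
    ray.shutdown()

# Re-initialize with the dashboard disabled.
if not ray.is_initialized():
    log.info("Initializing Ray with dashboard disabled to prevent connection errors...")
    ray.init(include_dashboard=False, ignore_reinit_error=True)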

I also tried:

 $env:RAY_include_dashboard = 0   

to silence it, since it looks related to the dashboard. But it does not work.

It is not preventing me from tuning, but I really want to know what went wrong. :sob:

The error “Failed to establish connection to the event+metrics exporter agent. Events and metrics will not be exported. Exporter agent status: RpcError: Running out of retries to initialize the metrics agent. rpc_code: 14” means that Ray’s GCS server and raylet processes could not connect to the metrics exporter (the dashboard agent), so event and metrics export is disabled. rpc_code 14 is the gRPC status UNAVAILABLE, meaning the agent could not be reached. This is often caused by the dashboard agent failing to start, by port conflicts, or by problems with dependencies such as grpcio, especially on Windows. Disabling the dashboard via include_dashboard=False or environment variables does not always stop Ray from trying to start the agent. The message is informational: your training/tuning will still work, but metrics and dashboard features are unavailable. Similar reports and explanations appear in Ray GitHub issue #52739 and Ray GitHub issue #54587.
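Because the message is informational and disabling the dashboard does not remove it, one possible workaround is to stop the driver from printing logs forwarded from other Ray processes. The sketch below is only an illustration: `quiet_init_kwargs` is a hypothetical helper (not part of Ray), and it assumes that the `(raylet)` and `(pid=gcs_server)` lines reach your console through Ray's driver log forwarding, which `log_to_driver=False` turns off. The trade-off is that output from workers and tasks is hidden as well.

```python
def quiet_init_kwargs(hide_forwarded_logs: bool = True) -> dict:
    """Build keyword arguments for ray.init() (hypothetical helper, not a Ray API)."""
    kwargs = {
        "include_dashboard": False,   # same setting as in the question
        "ignore_reinit_error": True,  # same setting as in the question
    }
    if hide_forwarded_logs:
        # Assumption: the messages are forwarded to the driver by Ray's log
        # monitor. log_to_driver=False stops the driver from printing any
        # forwarded logs, including output from workers and tasks.
        kwargs["log_to_driver"] = False
    return kwargs


# Usage, assuming Ray is installed:
#   import ray
#   ray.init(**quiet_init_kwargs())
```

The messages should still be written to Ray's own log files in the session directory, so they remain available if you need them later.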

If you want to suppress these errors, make sure your installed grpcio version meets Ray’s requirements, and check for port conflicts or missing dependencies. On Windows, however, this is a known limitation and may not be fully avoidable in current Ray versions. The error affects only observability features, not core Ray functionality. Would you like more detail on troubleshooting or suppressing these messages?
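As a starting point for the checks mentioned above, here is a minimal diagnostic sketch using only the Python standard library. The helper names `installed_version` and `port_is_free` are hypothetical, and the port number is a placeholder: 8265 is Ray's default dashboard port, but the agent's own ports may differ in your setup.

```python
import socket
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Binding failed: the port is in use or otherwise unavailable.
            return False
    return True


if __name__ == "__main__":
    # Compare these versions against the requirements of your Ray release.
    for package in ("ray", "grpcio"):
        print(f"{package}: {installed_version(package) or 'not installed'}")

    # Placeholder port (assumption): adjust to the ports used in your setup.
    candidate_port = 8265
    state = "free" if port_is_free(candidate_port) else "in use"
    print(f"port {candidate_port}: {state}")
```

Run it before starting Ray. A missing grpcio or a port that is already in use would be worth investigating, though neither result by itself proves the cause of the error.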
