I thought this was solved, but it's not.
I followed the same steps I outlined in my post from April 1, 2023, on a different machine, and I am again seeing the "localhost refused to connect" problem in the Cluster Utilization and Node Count windows.
I am running Prometheus and Grafana from a Homebrew install. They are both working.
% brew services
Name Status User File
grafana started mcneill ~/Library/LaunchAgents/homebrew.mxcl.grafana.plist
prometheus started mcneill ~/Library/LaunchAgents/homebrew.mxcl.prometheus.plist
I can see web pages at http://localhost:9090 and http://localhost:3000.
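One thing that might be worth ruling out is whether each service accepts connections on both the IPv4 and IPv6 loopback addresses, since "localhost" can resolve to either. The following is only a minimal sketch: the function name `port_accepts_connections` and the list of checks are made up for illustration, and the ports are simply the two from the URLs above.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and unavailable address families.
        return False


def main() -> None:
    # Hypothetical check list: ports taken from the URLs in this post,
    # probed separately on the IPv4 and IPv6 loopback addresses.
    checks = [
        ("Prometheus", "127.0.0.1", 9090),
        ("Prometheus", "::1", 9090),
        ("Grafana", "127.0.0.1", 3000),
        ("Grafana", "::1", 3000),
    ]
    for name, host, port in checks:
        state = "open" if port_accepts_connections(host, port) else "refused/unreachable"
        print(f"{name:<10} {host:<9} port {port}: {state}")


if __name__ == "__main__":
    main()
```

If a service answers on 127.0.0.1 but not on ::1 (or the reverse), that could be one possible explanation for a page loading in one context and being refused in another, though this sketch cannot confirm that by itself.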
The Prometheus log at /opt/homebrew/var/log/prometheus.err.log
shows that /tmp/ray/session_latest/metrics/prometheus/prometheus.yml
has been loaded.
ts=2024-01-14T01:47:20.741Z caller=main.go:539 level=info msg="No time or size retention was set so using the default time retention" duration=15d
ts=2024-01-14T01:47:20.741Z caller=main.go:583 level=info msg="Starting Prometheus Server" mode=server version="(version=2.48.1, branch=non-git, revision=non-git)"
ts=2024-01-14T01:47:20.741Z caller=main.go:588 level=info build_context="(go=go1.21.5, platform=darwin/arm64, user=brew@Sonoma-arm64.local, date=20231208-09:22:46, tags=netgo,builtinassets,stringlabels)"
ts=2024-01-14T01:47:20.741Z caller=main.go:589 level=info host_details=(darwin)
ts=2024-01-14T01:47:20.741Z caller=main.go:590 level=info fd_limits="(soft=61440, hard=unlimited)"
ts=2024-01-14T01:47:20.741Z caller=main.go:591 level=info vm_limits="(soft=unlimited, hard=unlimited)"
ts=2024-01-14T01:47:20.743Z caller=web.go:566 level=info component=web msg="Start listening for connections" address=127.0.0.1:9090
ts=2024-01-14T01:47:20.743Z caller=main.go:1024 level=info msg="Starting TSDB ..."
ts=2024-01-14T01:47:20.743Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1705177341653 maxt=1705183200000 ulid=01HM2TA7KYTCCA3NT2A39Z3GM4
ts=2024-01-14T01:47:20.743Z caller=repair.go:56 level=info component=tsdb msg="Found healthy block" mint=1705185811653 maxt=1705190400000 ulid=01HM2TA7XPS25VJ6W7K6MZ7QCN
ts=2024-01-14T01:47:20.743Z caller=tls_config.go:274 level=info component=web msg="Listening on" address=127.0.0.1:9090
ts=2024-01-14T01:47:20.743Z caller=tls_config.go:277 level=info component=web msg="TLS is disabled." http2=false address=127.0.0.1:9090
ts=2024-01-14T01:47:20.745Z caller=head.go:601 level=info component=tsdb msg="Replaying on-disk memory mappable chunks if any"
ts=2024-01-14T01:47:20.746Z caller=head.go:682 level=info component=tsdb msg="On-disk memory mappable chunks replay completed" duration=385.208µs
ts=2024-01-14T01:47:20.746Z caller=head.go:690 level=info component=tsdb msg="Replaying WAL, this may take a while"
ts=2024-01-14T01:47:20.753Z caller=head.go:726 level=info component=tsdb msg="WAL checkpoint loaded"
ts=2024-01-14T01:47:20.754Z caller=head.go:761 level=info component=tsdb msg="WAL segment loaded" segment=68 maxSegment=72
ts=2024-01-14T01:47:20.755Z caller=head.go:761 level=info component=tsdb msg="WAL segment loaded" segment=69 maxSegment=72
ts=2024-01-14T01:47:20.767Z caller=head.go:761 level=info component=tsdb msg="WAL segment loaded" segment=70 maxSegment=72
ts=2024-01-14T01:47:20.767Z caller=head.go:761 level=info component=tsdb msg="WAL segment loaded" segment=71 maxSegment=72
ts=2024-01-14T01:47:20.767Z caller=head.go:761 level=info component=tsdb msg="WAL segment loaded" segment=72 maxSegment=72
ts=2024-01-14T01:47:20.767Z caller=head.go:798 level=info component=tsdb msg="WAL replay completed" checkpoint_replay_duration=7.654417ms wal_replay_duration=13.646042ms wbl_replay_duration=42ns total_replay_duration=21.713917ms
ts=2024-01-14T01:47:20.769Z caller=main.go:1045 level=info fs_type=1a
ts=2024-01-14T01:47:20.769Z caller=main.go:1048 level=info msg="TSDB started"
ts=2024-01-14T01:47:20.769Z caller=main.go:1230 level=info msg="Loading configuration file" filename=/tmp/ray/session_latest/metrics/prometheus/prometheus.yml
ts=2024-01-14T01:47:20.787Z caller=main.go:1267 level=info msg="Completed loading of configuration file" filename=/tmp/ray/session_latest/metrics/prometheus/prometheus.yml totalDuration=17.422708ms db_storage=625ns remote_storage=791ns web_handler=208ns query_engine=458ns scrape=17.257833ms scrape_sd=27.292µs notify=791ns notify_sd=1.125µs rules=1.459µs tracing=7.208µs
ts=2024-01-14T01:47:20.787Z caller=main.go:1009 level=info msg="Server is ready to receive web requests."
ts=2024-01-14T01:47:20.787Z caller=manager.go:1012 level=info component="rule manager" msg="Starting rule manager..."
My /usr/local/etc/prometheus.args
file looks like this:
--config.file /tmp/ray/session_latest/metrics/prometheus/prometheus.yml
My /tmp/ray/session_latest/metrics/prometheus/prometheus.yml
file looks like this:
# my global config
global:
  scrape_interval: 10s # Set the scrape interval to every 10 seconds. Default is every 1 minute.
  evaluation_interval: 10s # Evaluate rules every 10 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
scrape_configs:
# Scrape from each ray node as defined in the service_discovery.json provided by ray.
- job_name: 'ray'
  file_sd_configs:
  - files:
    - '/tmp/ray/prom_metrics_service_discovery.json'
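Since the scrape config only points at a service discovery file, it may help to confirm that the file exists and actually lists targets. Here is a rough sketch, assuming the file uses the standard Prometheus file_sd JSON layout (a list of objects, each with a "targets" list); the helper name `load_sd_targets` is hypothetical.

```python
import json
from pathlib import Path


def load_sd_targets(path):
    """Return a flat list of "host:port" targets from a Prometheus file_sd JSON file."""
    groups = json.loads(Path(path).read_text())
    targets = []
    for group in groups:
        # Each group is expected to look like {"targets": [...], "labels": {...}}.
        targets.extend(group.get("targets", []))
    return targets


if __name__ == "__main__":
    # Path copied from the prometheus.yml in this post.
    sd_file = "/tmp/ray/prom_metrics_service_discovery.json"
    try:
        found = load_sd_targets(sd_file)
    except FileNotFoundError:
        print(f"{sd_file} does not exist (is Ray running?)")
    else:
        print(f"{len(found)} target(s) listed:")
        for target in found:
            print("  ", target)
```

An empty list or a missing file would suggest Prometheus has nothing to scrape, which is a separate question from whether the dashboard can reach Grafana.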
My /usr/local/etc/grafana/grafana.ini
file looks like this:
[security]
allow_embedding = true
[auth.anonymous]
enabled = true
org_name = Main Org.
org_role = Viewer
[paths]
provisioning = /tmp/ray/session_latest/metrics/grafana/provisioning
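To double-check that the file Grafana reads really contains these settings, a small script along these lines could compare a grafana.ini against the values shown above. This is only a sketch: `EXPECTED` and `find_mismatches` are names invented for illustration, and it assumes the file has no uncommented keys before the first section header, which Python's `configparser` would reject.

```python
import configparser

# Values copied from the grafana.ini shown in this post.
EXPECTED = {
    ("security", "allow_embedding"): "true",
    ("auth.anonymous", "enabled"): "true",
    ("auth.anonymous", "org_role"): "Viewer",
}


def find_mismatches(ini_text):
    """Return (section, key, actual_value) for each setting that differs from EXPECTED."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    mismatches = []
    for (section, key), expected in EXPECTED.items():
        # fallback=None also covers a missing section or a missing key.
        actual = parser.get(section, key, fallback=None)
        if actual != expected:
            mismatches.append((section, key, actual))
    return mismatches


if __name__ == "__main__":
    # Path from this post; adjust if Grafana reads its config from elsewhere.
    ini_path = "/usr/local/etc/grafana/grafana.ini"
    try:
        with open(ini_path) as handle:
            problems = find_mismatches(handle.read())
    except FileNotFoundError:
        print(f"{ini_path} not found")
    else:
        print("all expected settings present" if not problems else problems)
```

A "not found" result or a list of mismatches would at least indicate whether the running Grafana could be using a different configuration file than the one edited.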
What am I doing wrong? Is there anything I can do to debug this? Are there error messages anywhere that I should be looking at?