No details in Object Store Memory

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Explanation

The “Object Store Memory” plot in the dashboard (both in Grafana and in Ray’s dashboard) doesn’t break the usage down into IN_MEMORY, UNSEALED, SPILLED and MAX. Instead it shows only VALUE and MAX. I do see this kind of breakdown in other plots, e.g. Scheduler Task State (where I can see SUBMITTED_TO_WORKER, RUNNING, …).

ray 2.2.0
Python 3.8.16

I am able to see the location of the object store memory in ray dashboard (ray 2.2)…

Do you have a repro you could share that leads to the screenshot you have?
Could you also take a screenshot of this metric in ray dashboard UI? (I believe the one you showed here is from grafana)

Hi, thanks for the reply. I got the same image on the ray dashboard.
Let me give you some more details:

  • I run it on a Mac (I checked both the Intel and M1 versions; the problem exists on both)
  • I installed Grafana and Prometheus via brew and overwrote their configs with the files generated by Ray in /tmp/ray/session_latest/ (I shortened the refresh rates for the example below, but that shouldn’t matter)
  • I have ray[default] installed.

Here is a short script that reproduces the issue:

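import time

import numpy as np
import ray


@ray.remote
def f1(a):
    # Simulate a task that runs for 10 seconds, then hand the argument back.
    time.sleep(10)
    return a


if __name__ == "__main__":
    ray.init()
    # Put an ~8 MB float64 array into the object store directly.
    sth_in_memory_id = ray.put(np.ones((1_000_000, 1)))
    # Pass a second ~8 MB array as a task argument and wait for the result.
    f1_id = f1.remote(np.zeros((1_000_000, 1)))
    f1_res = ray.get(f1_id)
    # Keep the driver and its object references alive while checking the dashboard.
    input("Type enter when done")
    ray.shutdown()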


Let me know if I can upload any logs that might help (and where to find them).
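
One way to narrow this down independently of Grafana would be to ask Prometheus directly which labels the object store metric carries. The following is only a sketch under a few assumptions: Prometheus is reachable at http://localhost:9090 (adjust if yours differs), the metric is exported as ray_object_store_memory, and the breakdown is expected under a Location label. The helper names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request
from collections import defaultdict

PROMETHEUS_URL = "http://localhost:9090"  # assumption: local Prometheus on its default port


def build_query_url(base_url, promql):
    """Build the URL for Prometheus' instant-query HTTP endpoint."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def bytes_by_location(response):
    """Sum sample values per Location label from an instant-query response.

    Series that carry no Location label are collected under "<no label>".
    """
    totals = defaultdict(float)
    for series in response["data"]["result"]:
        location = series["metric"].get("Location", "<no label>")
        # Each instant-vector sample is [timestamp, "value-as-string"].
        totals[location] += float(series["value"][1])
    return dict(totals)


def fetch(promql, base_url=PROMETHEUS_URL):
    """Run an instant query against a live Prometheus server."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)
```

Calling bytes_by_location(fetch("ray_object_store_memory")) while the script above is waiting for input would show whether the samples arrive with a Location label at all. If everything ends up under "<no label>", the missing breakdown would originate in the exported metric rather than in the dashboard.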

cc: @sangcho @rickyyx @aguo for ideas

Hey @adam, could you copy and paste the Grafana query for this chart? You should be able to get it by going to the Grafana dashboard and clicking edit on that specific panel.

Hi, I don’t think I can open the dashboard in edit mode, but I can download it.
I cannot attach a JSON file, so I’ll paste it in the next messages.
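
For anyone who only needs the query behind a single chart, a downloaded dashboard can also be searched with a few lines of Python instead of reading the whole export. This is a rough sketch: the file name dashboard.json and the function names are hypothetical, and it only looks at top-level panels (panels nested inside rows are not handled).

```python
import json


def load_dashboard(path):
    """Load an exported Grafana dashboard from a JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def panel_queries(dashboard, title):
    """Return (legendFormat, expr) pairs for every panel with the given title."""
    queries = []
    for panel in dashboard.get("panels", []):
        if panel.get("title") != title:
            continue
        for target in panel.get("targets", []):
            queries.append((target.get("legendFormat", ""), target.get("expr", "")))
    return queries
```

With the export saved locally, panel_queries(load_dashboard("dashboard.json"), "Object Store Memory") would list the expressions and legend formats of that one panel.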

{
“__inputs”: [
{
“name”: “DS_PROMETHEUS”,
“label”: “Prometheus”,
“description”: “”,
“type”: “datasource”,
“pluginId”: “prometheus”,
“pluginName”: “Prometheus”
}
],
“__elements”: {},
“__requires”: [
{
“type”: “grafana”,
“id”: “grafana”,
“name”: “Grafana”,
“version”: “9.3.2”
},
{
“type”: “panel”,
“id”: “graph”,
“name”: “Graph (old)”,
“version”: “”
},
{
“type”: “datasource”,
“id”: “prometheus”,
“name”: “Prometheus”,
“version”: “1.0.0”
}
],
“annotations”: {
“list”: [
{
“builtIn”: 1,
“datasource”: {
“type”: “datasource”,
“uid”: “grafana”
},
“enable”: true,
“hide”: true,
“iconColor”: “rgba(0, 211, 255, 1)”,
“name”: “Annotations & Alerts”,
“target”: {
“limit”: 100,
“matchAny”: false,
“tags”: [],
“type”: “dashboard”
},
“type”: “dashboard”
}
]
},
“editable”: true,
“fiscalYearStartMonth”: 0,
“graphTooltip”: 0,
“id”: null,
“links”: [],
“liveNow”: false,
“panels”: [
{
“aliasColors”: {},
“bars”: false,
“dashLength”: 10,
“dashes”: false,
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“description”: “Current number of tasks in a particular state.\n\nState: the task state, as described by rpc::TaskState proto in common.proto.”,
“fill”: 10,
“fillGradient”: 0,
“gridPos”: {
“h”: 8,
“w”: 12,
“x”: 0,
“y”: 0
},
“hiddenSeries”: false,
“id”: 26,
“legend”: {
“alignAsTable”: true,
“avg”: false,
“current”: true,
“hideEmpty”: false,
“hideZero”: true,
“max”: false,
“min”: false,
“rightSide”: false,
“show”: true,
“sort”: “current”,
“sortDesc”: true,
“total”: false,
“values”: true
},
“lines”: true,
“linewidth”: 1,
“nullPointMode”: “null”,
“options”: {
“alertThreshold”: true
},
“percentage”: false,
“pluginVersion”: “9.3.2”,
“pointradius”: 2,
“points”: false,
“renderer”: “flot”,
“seriesOverrides”: [
{
“$$hashKey”: “object:2987”,
“alias”: “MAX”,
“color”: “#1F60C4”,
“dashes”: true,
“fill”: 0,
“stack”: false
},
{
“$$hashKey”: “object:78”,
“alias”: “/FINISHED|FAILED|DEAD|REMOVED/”,
“hiddenSeries”: true
}
],
“spaceLength”: 10,
“stack”: true,
“steppedLine”: false,
“targets”: [
{
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“exemplar”: true,
“expr”: “sum(max_over_time(ray_tasks{State=~"FINISHED|FAILED",SessionName="$SessionName"}[14d])) by (State) or clamp_min(sum(ray_tasks{State!~"FINISHED|FAILED",SessionName="$SessionName"}) by (State), 0)”,
“interval”: “”,
“legendFormat”: “{{State}}”,
“queryType”: “randomWalk”,
“refId”: “A”
}
],
“thresholds”: [],
“timeRegions”: [],
“title”: “Scheduler Task State”,
“tooltip”: {
“shared”: true,
“sort”: 0,
“value_type”: “individual”
},
“type”: “graph”,
“xaxis”: {
“mode”: “time”,
“show”: true,
“values”: []
},
“yaxes”: [
{
“$$hashKey”: “object:628”,
“format”: “tasks”,
“label”: “”,
“logBase”: 1,
“min”: “0”,
“show”: true
},
{
“$$hashKey”: “object:629”,
“format”: “short”,
“logBase”: 1,
“show”: true
}
],
“yaxis”: {
“align”: false
}
},
{
“aliasColors”: {},
“bars”: false,
“dashLength”: 10,
“dashes”: false,
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“description”: “Current number of (live) tasks with a particular name.”,
“fill”: 10,
“fillGradient”: 0,
“gridPos”: {
“h”: 8,
“w”: 12,
“x”: 12,
“y”: 0
},
“hiddenSeries”: false,
“id”: 35,
“legend”: {
“alignAsTable”: true,
“avg”: false,
“current”: true,
“hideEmpty”: false,
“hideZero”: true,
“max”: false,
“min”: false,
“rightSide”: false,
“show”: true,
“sort”: “current”,
“sortDesc”: true,
“total”: false,
“values”: true
},
“lines”: true,
“linewidth”: 1,
“nullPointMode”: “null”,
“options”: {
“alertThreshold”: true
},
“percentage”: false,
“pluginVersion”: “9.3.2”,
“pointradius”: 2,
“points”: false,
“renderer”: “flot”,
“seriesOverrides”: [
{
“$$hashKey”: “object:2987”,
“alias”: “MAX”,
“color”: “#1F60C4”,
“dashes”: true,
“fill”: 0,
“stack”: false
},
{
“$$hashKey”: “object:78”,
“alias”: “/FINISHED|FAILED|DEAD|REMOVED/”,
“hiddenSeries”: true
}
],
“spaceLength”: 10,
“stack”: true,
“steppedLine”: false,
“targets”: [
{
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“exemplar”: true,
“expr”: “sum(ray_tasks{State!~"FINISHED|FAILED",SessionName="$SessionName"}) by (Name)”,
“interval”: “”,
“legendFormat”: “{{Name}}”,
“queryType”: “randomWalk”,
“refId”: “A”
}
],
“thresholds”: [],
“timeRegions”: [],
“title”: “Active Tasks by Name”,
“tooltip”: {
“shared”: true,
“sort”: 0,
“value_type”: “individual”
},
“type”: “graph”,
“xaxis”: {
“mode”: “time”,
“show”: true,
“values”: []
},
“yaxes”: [
{
“$$hashKey”: “object:628”,
“format”: “tasks”,
“label”: “”,
“logBase”: 1,
“min”: “0”,
“show”: true
},
{
“$$hashKey”: “object:629”,
“format”: “short”,
“logBase”: 1,
“show”: true
}
],
“yaxis”: {
“align”: false
}
},
{
“aliasColors”: {},
“bars”: false,
“dashLength”: 10,
“dashes”: false,
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“description”: “Current number of actors in a particular state.\n\nState: the actor state, as described by rpc::ActorTableData proto in gcs.proto.”,
“fill”: 10,
“fillGradient”: 0,
“gridPos”: {
“h”: 8,
“w”: 12,
“x”: 0,
“y”: 8
},
“hiddenSeries”: false,
“id”: 33,
“legend”: {
“alignAsTable”: true,
“avg”: false,
“current”: true,
“hideEmpty”: false,
“hideZero”: true,
“max”: false,
“min”: false,
“rightSide”: false,
“show”: true,
“sort”: “current”,
“sortDesc”: true,
“total”: false,
“values”: true
},
“lines”: true,
“linewidth”: 1,
“nullPointMode”: “null”,
“options”: {
“alertThreshold”: true
},
“percentage”: false,
“pluginVersion”: “9.3.2”,
“pointradius”: 2,
“points”: false,
“renderer”: “flot”,
“seriesOverrides”: [
{
“$$hashKey”: “object:2987”,
“alias”: “MAX”,
“color”: “#1F60C4”,
“dashes”: true,
“fill”: 0,
“stack”: false
},
{
“$$hashKey”: “object:78”,
“alias”: “/FINISHED|FAILED|DEAD|REMOVED/”,
“hiddenSeries”: true
}
],
“spaceLength”: 10,
“stack”: true,
“steppedLine”: false,
“targets”: [
{
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“exemplar”: true,
“expr”: “sum(ray_actors{SessionName="$SessionName"}) by (State)”,
“interval”: “”,
“legendFormat”: “{{State}}”,
“queryType”: “randomWalk”,
“refId”: “A”
}
],
“thresholds”: [],
“timeRegions”: [],
“title”: “Scheduler Actor State”,
“tooltip”: {
“shared”: true,
“sort”: 0,
“value_type”: “individual”
},
“type”: “graph”,
“xaxis”: {
“mode”: “time”,
“show”: true,
“values”: []
},
“yaxes”: [
{
“$$hashKey”: “object:628”,
“format”: “actors”,
“label”: “”,
“logBase”: 1,
“min”: “0”,
“show”: true
},
{
“$$hashKey”: “object:629”,
“format”: “short”,
“logBase”: 1,
“show”: true
}
],
“yaxis”: {
“align”: false
}
},
{
“aliasColors”: {},
“bars”: false,
“dashLength”: 10,
“dashes”: false,
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“description”: “Current number of (live) actors with a particular name.”,
“fill”: 10,
“fillGradient”: 0,
“gridPos”: {
“h”: 8,
“w”: 12,
“x”: 12,
“y”: 8
},
“hiddenSeries”: false,
“id”: 36,
“legend”: {
“alignAsTable”: true,
“avg”: false,
“current”: true,
“hideEmpty”: false,
“hideZero”: true,
“max”: false,
“min”: false,
“rightSide”: false,
“show”: true,
“sort”: “current”,
“sortDesc”: true,
“total”: false,
“values”: true
},
“lines”: true,
“linewidth”: 1,
“nullPointMode”: “null”,
“options”: {
“alertThreshold”: true
},
“percentage”: false,
“pluginVersion”: “9.3.2”,
“pointradius”: 2,
“points”: false,
“renderer”: “flot”,
“seriesOverrides”: [
{
“$$hashKey”: “object:2987”,
“alias”: “MAX”,
“color”: “#1F60C4”,
“dashes”: true,
“fill”: 0,
“stack”: false
},
{
“$$hashKey”: “object:78”,
“alias”: “/FINISHED|FAILED|DEAD|REMOVED/”,
“hiddenSeries”: true
}
],
“spaceLength”: 10,
“stack”: true,
“steppedLine”: false,
“targets”: [
{
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“exemplar”: true,
“expr”: “sum(ray_actors{State!="DEAD",SessionName="$SessionName"}) by (Name)”,
“interval”: “”,
“legendFormat”: “{{Name}}”,
“queryType”: “randomWalk”,
“refId”: “A”
}
],
“thresholds”: [],
“timeRegions”: [],
“title”: “Active Actors by Name”,
“tooltip”: {
“shared”: true,
“sort”: 0,
“value_type”: “individual”
},
“type”: “graph”,
“xaxis”: {
“mode”: “time”,
“show”: true,
“values”: []
},
“yaxes”: [
{
“$$hashKey”: “object:628”,
“format”: “actors”,
“label”: “”,
“logBase”: 1,
“min”: “0”,
“show”: true
},
{
“$$hashKey”: “object:629”,
“format”: “short”,
“logBase”: 1,
“show”: true
}
],
“yaxis”: {
“align”: false
}
},
{
“aliasColors”: {},
“bars”: false,
“dashLength”: 10,
“dashes”: false,
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“description”: “Logical CPU usage of Ray. The dotted line indicates the total number of CPUs. The logical CPU is allocated by num_cpus arguments from tasks and actors.\n\nNOTE: Ray’s logical CPU is different from physical CPU usage. Ray’s logical CPU is allocated by num_cpus arguments.”,
“fill”: 10,
“fillGradient”: 0,
“gridPos”: {
“h”: 8,
“w”: 12,
“x”: 0,
“y”: 16
},
“hiddenSeries”: false,
“id”: 27,
“legend”: {
“alignAsTable”: true,
“avg”: false,
“current”: true,
“hideEmpty”: false,
“hideZero”: true,
“max”: false,
“min”: false,
“rightSide”: false,
“show”: true,
“sort”: “current”,
“sortDesc”: true,
“total”: false,
“values”: true
},
“lines”: true,
“linewidth”: 1,
“nullPointMode”: “null”,
“options”: {
“alertThreshold”: true
},
“percentage”: false,
“pluginVersion”: “9.3.2”,
“pointradius”: 2,
“points”: false,
“renderer”: “flot”,
“seriesOverrides”: [
{
“$$hashKey”: “object:2987”,
“alias”: “MAX”,
“color”: “#1F60C4”,
“dashes”: true,
“fill”: 0,
“stack”: false
},
{
“$$hashKey”: “object:78”,
“alias”: “/FINISHED|FAILED|DEAD|REMOVED/”,
“hiddenSeries”: true
}
],
“spaceLength”: 10,
“stack”: true,
“steppedLine”: false,
“targets”: [
{
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“exemplar”: true,
“expr”: “sum(ray_resources{Name="CPU",State="USED",SessionName="$SessionName"}) by (instance)”,
“interval”: “”,
“legendFormat”: “CPU Usage: {{instance}}”,
“queryType”: “randomWalk”,
“refId”: “A”
},
{
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“exemplar”: true,
“expr”: “sum(ray_resources{Name="CPU",SessionName="$SessionName"})”,
“interval”: “”,
“legendFormat”: “MAX”,
“queryType”: “randomWalk”,
“refId”: “B”
}
],
“thresholds”: [],
“timeRegions”: [],
“title”: “Scheduler CPUs (logical slots)”,
“tooltip”: {
“shared”: true,
“sort”: 0,
“value_type”: “individual”
},
“type”: “graph”,
“xaxis”: {
“mode”: “time”,
“show”: true,
“values”: []
},
“yaxes”: [
{
“$$hashKey”: “object:628”,
“format”: “cores”,
“label”: “”,
“logBase”: 1,
“min”: “0”,
“show”: true
},
{
“$$hashKey”: “object:629”,
“format”: “short”,
“logBase”: 1,
“show”: true
}
],
“yaxis”: {
“align”: false
}
},
{
“aliasColors”: {},
“bars”: false,
“dashLength”: 10,
“dashes”: false,
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“description”: “Object store memory usage by location. The dotted line indicates the object store memory capacity.\n\nLocation: where the memory was allocated, which is MMAP_SHM or MMAP_DISK to indicate memory-mapped page, SPILLED to indicate spillage to disk, and WORKER_HEAP for objects small enough to be inlined in worker memory. Refer to metric_defs.cc for more information.”,
“fill”: 10,
“fillGradient”: 0,
“gridPos”: {
“h”: 8,
“w”: 12,
“x”: 12,
“y”: 16
},
“hiddenSeries”: false,
“id”: 29,
“legend”: {
“alignAsTable”: true,
“avg”: false,
“current”: true,
“hideEmpty”: false,
“hideZero”: true,
“max”: false,
“min”: false,
“rightSide”: false,
“show”: true,
“sort”: “current”,
“sortDesc”: true,
“total”: false,
“values”: true
},
“lines”: true,
“linewidth”: 1,
“nullPointMode”: “null”,
“options”: {
“alertThreshold”: true
},
“percentage”: false,
“pluginVersion”: “9.3.2”,
“pointradius”: 2,
“points”: false,
“renderer”: “flot”,
“seriesOverrides”: [
{
“$$hashKey”: “object:2987”,
“alias”: “MAX”,
“color”: “#1F60C4”,
“dashes”: true,
“fill”: 0,
“stack”: false
},
{
“$$hashKey”: “object:78”,
“alias”: “/FINISHED|FAILED|DEAD|REMOVED/”,
“hiddenSeries”: true
}
],
“spaceLength”: 10,
“stack”: true,
“steppedLine”: false,
“targets”: [
{
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“exemplar”: true,
“expr”: “sum(ray_object_store_memory{SessionName="$SessionName"} / 1e9) by (Location)”,
“interval”: “”,
“legendFormat”: “{{Location}}”,
“queryType”: “randomWalk”,
“refId”: “A”
},
{
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“exemplar”: true,
“expr”: “sum(ray_resources{Name="object_store_memory",SessionName="$SessionName"} / 1e9)”,
“interval”: “”,
“legendFormat”: “MAX”,
“queryType”: “randomWalk”,
“refId”: “B”
}
],
“thresholds”: [],
“timeRegions”: [],
“title”: “Object Store Memory”,
“tooltip”: {
“shared”: true,
“sort”: 0,
“value_type”: “individual”
},
“type”: “graph”,
“xaxis”: {
“mode”: “time”,
“show”: true,
“values”: []
},
“yaxes”: [
{
“$$hashKey”: “object:628”,
“format”: “gbytes”,
“label”: “”,
“logBase”: 1,
“min”: “0”,
“show”: true
},
{
“$$hashKey”: “object:629”,
“format”: “short”,
“logBase”: 1,
“show”: true
}
],
“yaxis”: {
“align”: false
}
},
{
“aliasColors”: {},
“bars”: false,
“dashLength”: 10,
“dashes”: false,
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“description”: "Logical GPU usage of Ray. The dotted line indicates the total number of GPUs. The logical GPU is allocated by num_gpus arguments from tasks and actors. ",
“fill”: 10,
“fillGradient”: 0,
“gridPos”: {
“h”: 8,
“w”: 12,
“x”: 0,
“y”: 24
},
“hiddenSeries”: false,
“id”: 28,
“legend”: {
“alignAsTable”: true,
“avg”: false,
“current”: true,
“hideEmpty”: false,
“hideZero”: true,
“max”: false,
“min”: false,
“rightSide”: false,
“show”: true,
“sort”: “current”,
“sortDesc”: true,
“total”: false,
“values”: true
},
“lines”: true,
“linewidth”: 1,
“nullPointMode”: “null”,
“options”: {
“alertThreshold”: true
},
“percentage”: false,
“pluginVersion”: “9.3.2”,
“pointradius”: 2,
“points”: false,
“renderer”: “flot”,
“seriesOverrides”: [
{
“$$hashKey”: “object:2987”,
“alias”: “MAX”,
“color”: “#1F60C4”,
“dashes”: true,
“fill”: 0,
“stack”: false
},
{
“$$hashKey”: “object:78”,
“alias”: “/FINISHED|FAILED|DEAD|REMOVED/”,
“hiddenSeries”: true
}
],
“spaceLength”: 10,
“stack”: true,
“steppedLine”: false,
“targets”: [
{
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“exemplar”: true,
“expr”: “ray_resources{Name="GPU",State="USED",SessionName="$SessionName"}”,
“interval”: “”,
“legendFormat”: “GPU Usage: {{instance}}”,
“queryType”: “randomWalk”,
“refId”: “A”
},
{
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“exemplar”: true,
“expr”: “sum(ray_resources{Name="GPU",SessionName="$SessionName"})”,
“interval”: “”,
“legendFormat”: “MAX”,
“queryType”: “randomWalk”,
“refId”: “B”
}
],
“thresholds”: [],
“timeRegions”: [],
“title”: “Scheduler GPUs (logical slots)”,
“tooltip”: {
“shared”: true,
“sort”: 0,
“value_type”: “individual”
},
“type”: “graph”,
“xaxis”: {
“mode”: “time”,
“show”: true,
“values”: []
},
“yaxes”: [
{
“$$hashKey”: “object:628”,
“format”: “GPUs”,
“label”: “”,
“logBase”: 1,
“min”: “0”,
“show”: true
},
{
“$$hashKey”: “object:629”,
“format”: “short”,
“logBase”: 1,
“show”: true
}
],
“yaxis”: {
“align”: false
}
},
{
“aliasColors”: {},
“bars”: false,
“dashLength”: 10,
“dashes”: false,
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“description”: “Current number of placement groups in a particular state.\n\nState: the placement group state, as described by the rpc::PlacementGroupTable proto in gcs.proto.”,
“fill”: 10,
“fillGradient”: 0,
“gridPos”: {
“h”: 8,
“w”: 12,
“x”: 12,
“y”: 24
},
“hiddenSeries”: false,
“id”: 40,
“legend”: {
“alignAsTable”: true,
“avg”: false,
“current”: true,
“hideEmpty”: false,
“hideZero”: true,
“max”: false,
“min”: false,
“rightSide”: false,
“show”: true,
“sort”: “current”,
“sortDesc”: true,
“total”: false,
“values”: true
},
“lines”: true,
“linewidth”: 1,
“nullPointMode”: “null”,
“options”: {
“alertThreshold”: true
},
“percentage”: false,
“pluginVersion”: “9.3.2”,
“pointradius”: 2,
“points”: false,
“renderer”: “flot”,
“seriesOverrides”: [
{
“$$hashKey”: “object:2987”,
“alias”: “MAX”,
“color”: “#1F60C4”,
“dashes”: true,
“fill”: 0,
“stack”: false
},
{
“$$hashKey”: “object:78”,
“alias”: “/FINISHED|FAILED|DEAD|REMOVED/”,
“hiddenSeries”: true
}
],
“spaceLength”: 10,
“stack”: true,
“steppedLine”: false,
“targets”: [
{
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“exemplar”: true,
“expr”: “sum(ray_placement_groups{SessionName="$SessionName"}) by (State)”,
“interval”: “”,
“legendFormat”: “{{State}}”,
“queryType”: “randomWalk”,
“refId”: “A”
}
],
“thresholds”: [],
“timeRegions”: [],
“title”: “Scheduler Placement Groups”,
“tooltip”: {
“shared”: true,
“sort”: 0,
“value_type”: “individual”
},
“type”: “graph”,
“xaxis”: {
“mode”: “time”,
“show”: true,
“values”: []
},
“yaxes”: [
{
“$$hashKey”: “object:628”,
“format”: “placement groups”,
“label”: “”,
“logBase”: 1,
“min”: “0”,
“show”: true
},
{
“$$hashKey”: “object:629”,
“format”: “short”,
“logBase”: 1,
“show”: true
}
],
“yaxis”: {
“align”: false
}
},
{
“aliasColors”: {},
“bars”: false,
“dashLength”: 10,
“dashes”: false,
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“description”: “”,
“fill”: 10,
“fillGradient”: 0,
“gridPos”: {
“h”: 8,
“w”: 12,
“x”: 0,
“y”: 32
},
“hiddenSeries”: false,
“id”: 2,
“legend”: {
“alignAsTable”: true,
“avg”: false,
“current”: true,
“hideEmpty”: false,
“hideZero”: true,
“max”: false,
“min”: false,
“rightSide”: false,
“show”: true,
“sort”: “current”,
“sortDesc”: true,
“total”: false,
“values”: true
},
“lines”: true,
“linewidth”: 1,
“nullPointMode”: “null”,
“options”: {
“alertThreshold”: true
},
“percentage”: false,
“pluginVersion”: “7.5.17”,
“pointradius”: 2,
“points”: false,
“renderer”: “flot”,
“seriesOverrides”: [
{
“$$hashKey”: “object:2987”,
“alias”: “MAX”,
“color”: “#1F60C4”,
“dashes”: true,
“fill”: 0,
“stack”: false
},
{
“$$hashKey”: “object:78”,
“alias”: “/FINISHED|FAILED|DEAD|REMOVED/”,
“hiddenSeries”: true
}
],
“spaceLength”: 10,
“stack”: true,
“steppedLine”: false,
“targets”: [
{
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“exemplar”: true,
“expr”: “ray_node_cpu_utilization{instance=~"$Instance",SessionName="$SessionName"} * ray_node_cpu_count{instance=~"$Instance",SessionName="$SessionName"} / 100”,
“interval”: “”,
“legendFormat”: “CPU Usage: {{instance}}”,
“queryType”: “randomWalk”,
“refId”: “A”
},
{
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“exemplar”: true,
“expr”: “sum(ray_node_cpu_count{SessionName="$SessionName"})”,
“interval”: “”,
“legendFormat”: “MAX”,
“queryType”: “randomWalk”,
“refId”: “B”
}
],
“thresholds”: [],
“timeRegions”: [],
“title”: “Node CPU (hardware utilization)”,
“tooltip”: {
“shared”: true,
“sort”: 0,
“value_type”: “individual”
},
“type”: “graph”,
“xaxis”: {
“mode”: “time”,
“show”: true,
“values”: []
},
“yaxes”: [
{
“$$hashKey”: “object:628”,
“format”: “cores”,
“label”: “”,
“logBase”: 1,
“min”: “0”,
“show”: true
},
{
“$$hashKey”: “object:629”,
“format”: “short”,
“logBase”: 1,
“show”: true
}
],
“yaxis”: {
“align”: false
}
},
{
“aliasColors”: {},
“bars”: false,
“dashLength”: 10,
“dashes”: false,
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“description”: "Node’s physical (hardware) GPU usage. The dotted line means the total number of hardware GPUs from the cluster. ",
“fill”: 10,
“fillGradient”: 0,
“gridPos”: {
“h”: 8,
“w”: 12,
“x”: 12,
“y”: 32
},
“hiddenSeries”: false,
“id”: 8,
“legend”: {
“alignAsTable”: true,
“avg”: false,
“current”: true,
“hideEmpty”: false,
“hideZero”: true,
“max”: false,
“min”: false,
“rightSide”: false,
“show”: true,
“sort”: “current”,
“sortDesc”: true,
“total”: false,
“values”: true
},
“lines”: true,
“linewidth”: 1,
“nullPointMode”: “null”,
“options”: {
“alertThreshold”: true
},
“percentage”: false,
“pluginVersion”: “7.5.17”,
“pointradius”: 2,
“points”: false,
“renderer”: “flot”,
“seriesOverrides”: [
{
“$$hashKey”: “object:2987”,
“alias”: “MAX”,
“color”: “#1F60C4”,
“dashes”: true,
“fill”: 0,
“stack”: false
},
{
“$$hashKey”: “object:78”,
“alias”: “/FINISHED|FAILED|DEAD|REMOVED/”,
“hiddenSeries”: true
}
],
“spaceLength”: 10,
“stack”: true,
“steppedLine”: false,
“targets”: [
{
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“exemplar”: true,
“expr”: “ray_node_gpus_utilization{instance=~"$Instance",SessionName="$SessionName"} / 100”,
“interval”: “”,
“legendFormat”: “GPU Usage: {{instance}}”,
“queryType”: “randomWalk”,
“refId”: “A”
},
{
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“exemplar”: true,
“expr”: “sum(ray_node_gpus_available{SessionName="$SessionName"})”,
“interval”: “”,
“legendFormat”: “MAX”,
“queryType”: “randomWalk”,
“refId”: “B”
}
],
“thresholds”: [],
“timeRegions”: [],
“title”: “Node GPU (hardware utilization)”,
“tooltip”: {
“shared”: true,
“sort”: 0,
“value_type”: “individual”
},
“type”: “graph”,
“xaxis”: {
“mode”: “time”,
“show”: true,
“values”: []
},
“yaxes”: [
{
“$$hashKey”: “object:628”,
“format”: “GPUs”,
“label”: “”,
“logBase”: 1,
“min”: “0”,
“show”: true
},
{
“$$hashKey”: “object:629”,
“format”: “short”,
“logBase”: 1,
“show”: true
}
],
“yaxis”: {
“align”: false
}
},
{
“aliasColors”: {},
“bars”: false,
“dashLength”: 10,
“dashes”: false,
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“description”: "Node’s physical (hardware) disk usage. The dotted line means the total amount of disk space from the cluster.\n\nNOTE: When Ray is deployed within a container, this shows the disk usage from the host machine. ",
“fill”: 10,
“fillGradient”: 0,
“gridPos”: {
“h”: 8,
“w”: 12,
“x”: 0,
“y”: 40
},
“hiddenSeries”: false,
“id”: 6,
“legend”: {
“alignAsTable”: true,
“avg”: false,
“current”: true,
“hideEmpty”: false,
“hideZero”: true,
“max”: false,
“min”: false,
“rightSide”: false,
“show”: true,
“sort”: “current”,
“sortDesc”: true,
“total”: false,
“values”: true
},

{
“__inputs”: [
{
“name”: “DS_PROMETHEUS”,
“label”: “Prometheus”,
“description”: “”,
“type”: “datasource”,
“pluginId”: “prometheus”,
“pluginName”: “Prometheus”
}
],
“__elements”: {},
“__requires”: [
{
“type”: “grafana”,
“id”: “grafana”,
“name”: “Grafana”,
“version”: “9.3.2”
},
{
“type”: “panel”,
“id”: “graph”,
“name”: “Graph (old)”,
“version”: “”
},
{
“type”: “datasource”,
“id”: “prometheus”,
“name”: “Prometheus”,
“version”: “1.0.0”
}
],
“annotations”: {
“list”: [
{
“builtIn”: 1,
“datasource”: {
“type”: “datasource”,
“uid”: “grafana”
},
“enable”: true,
“hide”: true,
“iconColor”: “rgba(0, 211, 255, 1)”,
“name”: “Annotations & Alerts”,
“target”: {
“limit”: 100,
“matchAny”: false,
“tags”: [],
“type”: “dashboard”
},
“type”: “dashboard”
}
]
},
“editable”: true,
“fiscalYearStartMonth”: 0,
“graphTooltip”: 0,
“id”: null,
“links”: [],
“liveNow”: false,
“panels”: [
{
“aliasColors”: {},
“bars”: false,
“dashLength”: 10,
“dashes”: false,
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“description”: “Current number of tasks in a particular state.\n\nState: the task state, as described by rpc::TaskState proto in common.proto.”,
“fill”: 10,
“fillGradient”: 0,
“gridPos”: {
“h”: 8,
“w”: 12,
“x”: 0,
“y”: 0
},
“hiddenSeries”: false,
“id”: 26,
“legend”: {
“alignAsTable”: true,
“avg”: false,
“current”: true,
“hideEmpty”: false,
“hideZero”: true,
“max”: false,
“min”: false,
“rightSide”: false,
“show”: true,
“sort”: “current”,
“sortDesc”: true,
“total”: false,
“values”: true
},
“lines”: true,
“linewidth”: 1,
“nullPointMode”: “null”,
“options”: {
“alertThreshold”: true
},
“percentage”: false,
“pluginVersion”: “9.3.2”,
“pointradius”: 2,
“points”: false,
“renderer”: “flot”,
“seriesOverrides”: [
{
“$$hashKey”: “object:2987”,
“alias”: “MAX”,
“color”: “#1F60C4”,
“dashes”: true,
“fill”: 0,
“stack”: false
},
{
“$$hashKey”: “object:78”,
“alias”: “/FINISHED|FAILED|DEAD|REMOVED/”,
“hiddenSeries”: true
}
],
“spaceLength”: 10,
“stack”: true,
“steppedLine”: false,
“targets”: [
{
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“exemplar”: true,
“expr”: “sum(max_over_time(ray_tasks{State=~"FINISHED|FAILED",SessionName="$SessionName"}[14d])) by (State) or clamp_min(sum(ray_tasks{State!~"FINISHED|FAILED",SessionName="$SessionName"}) by (State), 0)”,
“interval”: “”,
“legendFormat”: “{{State}}”,
“queryType”: “randomWalk”,
“refId”: “A”
}
],
“thresholds”: [],
“timeRegions”: [],
“title”: “Scheduler Task State”,
“tooltip”: {
“shared”: true,
“sort”: 0,
“value_type”: “individual”
},
“type”: “graph”,
“xaxis”: {
“mode”: “time”,
“show”: true,
“values”: []
},
“yaxes”: [
{
“$$hashKey”: “object:628”,
“format”: “tasks”,
“label”: “”,
“logBase”: 1,
“min”: “0”,
“show”: true
},
{
“$$hashKey”: “object:629”,
“format”: “short”,
“logBase”: 1,
“show”: true
}
],
“yaxis”: {
“align”: false
}
},
{
“aliasColors”: {},
“bars”: false,
“dashLength”: 10,
“dashes”: false,
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“description”: “Current number of (live) tasks with a particular name.”,
“fill”: 10,
“fillGradient”: 0,
“gridPos”: {
“h”: 8,
“w”: 12,
“x”: 12,
“y”: 0
},
“hiddenSeries”: false,
“id”: 35,
“legend”: {
“alignAsTable”: true,
“avg”: false,
“current”: true,
“hideEmpty”: false,
“hideZero”: true,
“max”: false,
“min”: false,
“rightSide”: false,
“show”: true,
“sort”: “current”,
“sortDesc”: true,
“total”: false,
“values”: true
},
“lines”: true,
“linewidth”: 1,
“nullPointMode”: “null”,
“options”: {
“alertThreshold”: true
},
“percentage”: false,
“pluginVersion”: “9.3.2”,
“pointradius”: 2,
“points”: false,
“renderer”: “flot”,
“seriesOverrides”: [
{
“$$hashKey”: “object:2987”,
“alias”: “MAX”,
“color”: “#1F60C4”,
“dashes”: true,
“fill”: 0,
“stack”: false
},
{
“$$hashKey”: “object:78”,
“alias”: “/FINISHED|FAILED|DEAD|REMOVED/”,
“hiddenSeries”: true
}
],
“spaceLength”: 10,
“stack”: true,
“steppedLine”: false,
“targets”: [
{
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“exemplar”: true,
“expr”: “sum(ray_tasks{State!~"FINISHED|FAILED",SessionName="$SessionName"}) by (Name)”,
“interval”: “”,
“legendFormat”: “{{Name}}”,
“queryType”: “randomWalk”,
“refId”: “A”
}
],
“thresholds”: [],
“timeRegions”: [],
“title”: “Active Tasks by Name”,
“tooltip”: {
“shared”: true,
“sort”: 0,
“value_type”: “individual”
},
“type”: “graph”,
“xaxis”: {
“mode”: “time”,
“show”: true,
“values”: []
},
“yaxes”: [
{
“$$hashKey”: “object:628”,
“format”: “tasks”,
“label”: “”,
“logBase”: 1,
“min”: “0”,
“show”: true
},
{
“$$hashKey”: “object:629”,
“format”: “short”,
“logBase”: 1,
“show”: true
}
],
“yaxis”: {
“align”: false
}
},
{
“aliasColors”: {},
“bars”: false,
“dashLength”: 10,
“dashes”: false,
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“description”: “Current number of actors in a particular state.\n\nState: the actor state, as described by rpc::ActorTableData proto in gcs.proto.”,
“fill”: 10,
“fillGradient”: 0,
“gridPos”: {
“h”: 8,
“w”: 12,
“x”: 0,
“y”: 8
},
“hiddenSeries”: false,
“id”: 33,
“legend”: {
“alignAsTable”: true,
“avg”: false,
“current”: true,
“hideEmpty”: false,
“hideZero”: true,
“max”: false,
“min”: false,
“rightSide”: false,
“show”: true,
“sort”: “current”,
“sortDesc”: true,
“total”: false,
“values”: true
},
“lines”: true,
“linewidth”: 1,
“nullPointMode”: “null”,
“options”: {
“alertThreshold”: true
},
“percentage”: false,
“pluginVersion”: “9.3.2”,
“pointradius”: 2,
“points”: false,
“renderer”: “flot”,
“seriesOverrides”: [
{
“$$hashKey”: “object:2987”,
“alias”: “MAX”,
“color”: “#1F60C4”,
“dashes”: true,
“fill”: 0,
“stack”: false
},
{
“$$hashKey”: “object:78”,
“alias”: “/FINISHED|FAILED|DEAD|REMOVED/”,
“hiddenSeries”: true
}
],
“spaceLength”: 10,
“stack”: true,
“steppedLine”: false,
“targets”: [
{
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“exemplar”: true,
“expr”: “sum(ray_actors{SessionName="$SessionName"}) by (State)”,
“interval”: “”,
“legendFormat”: “{{State}}”,
“queryType”: “randomWalk”,
“refId”: “A”
}
],
“thresholds”: [],
“timeRegions”: [],
“title”: “Scheduler Actor State”,
“tooltip”: {
“shared”: true,
“sort”: 0,
“value_type”: “individual”
},
“type”: “graph”,
“xaxis”: {
“mode”: “time”,
“show”: true,
“values”: []
},
“yaxes”: [
{
“$$hashKey”: “object:628”,
“format”: “actors”,
“label”: “”,
“logBase”: 1,
“min”: “0”,
“show”: true
},
{
“$$hashKey”: “object:629”,
“format”: “short”,
“logBase”: 1,
“show”: true
}
],
“yaxis”: {
“align”: false
}
},
{
“aliasColors”: {},
“bars”: false,
“dashLength”: 10,
“dashes”: false,
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“description”: “Current number of (live) actors with a particular name.”,
“fill”: 10,
“fillGradient”: 0,
“gridPos”: {
“h”: 8,
“w”: 12,
“x”: 12,
“y”: 8
},
“hiddenSeries”: false,
“id”: 36,
“legend”: {
“alignAsTable”: true,
“avg”: false,
“current”: true,
“hideEmpty”: false,
“hideZero”: true,
“max”: false,
“min”: false,
“rightSide”: false,
“show”: true,
“sort”: “current”,
“sortDesc”: true,
“total”: false,
“values”: true
},
“lines”: true,
“linewidth”: 1,
“nullPointMode”: “null”,
“options”: {
“alertThreshold”: true
},
“percentage”: false,
“pluginVersion”: “9.3.2”,
“pointradius”: 2,
“points”: false,
“renderer”: “flot”,
“seriesOverrides”: [
{
“$$hashKey”: “object:2987”,
“alias”: “MAX”,
“color”: “#1F60C4”,
“dashes”: true,
“fill”: 0,
“stack”: false
},
{
“$$hashKey”: “object:78”,
“alias”: “/FINISHED|FAILED|DEAD|REMOVED/”,
“hiddenSeries”: true
}
],
“spaceLength”: 10,
“stack”: true,
“steppedLine”: false,
“targets”: [
{
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“exemplar”: true,
“expr”: “sum(ray_actors{State!="DEAD",SessionName="$SessionName"}) by (Name)”,
“interval”: “”,
“legendFormat”: “{{Name}}”,
“queryType”: “randomWalk”,
“refId”: “A”
}
],
“thresholds”: [],
“timeRegions”: [],
“title”: “Active Actors by Name”,
“tooltip”: {
“shared”: true,
“sort”: 0,
“value_type”: “individual”
},
“type”: “graph”,
“xaxis”: {
“mode”: “time”,
“show”: true,
“values”: []
},
“yaxes”: [
{
“$$hashKey”: “object:628”,
“format”: “actors”,
“label”: “”,
“logBase”: 1,
“min”: “0”,
“show”: true
},
{
“$$hashKey”: “object:629”,
“format”: “short”,
“logBase”: 1,
“show”: true
}
],
“yaxis”: {
“align”: false
}
},
{
“aliasColors”: {},
“bars”: false,
“dashLength”: 10,
“dashes”: false,
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“description”: “Logical CPU usage of Ray. The dotted line indicates the total number of CPUs. The logical CPU is allocated by num_cpus arguments from tasks and actors.\n\nNOTE: Ray’s logical CPU is different from physical CPU usage. Ray’s logical CPU is allocated by num_cpus arguments.”,
“fill”: 10,
“fillGradient”: 0,
“gridPos”: {
“h”: 8,
“w”: 12,
“x”: 0,
“y”: 16
},
“hiddenSeries”: false,
“id”: 27,
“legend”: {
“alignAsTable”: true,
“avg”: false,
“current”: true,
“hideEmpty”: false,
“hideZero”: true,
“max”: false,
“min”: false,
“rightSide”: false,
“show”: true,
“sort”: “current”,
“sortDesc”: true,
“total”: false,
“values”: true
},
“lines”: true,
“linewidth”: 1,
“nullPointMode”: “null”,
“options”: {
“alertThreshold”: true
},
“percentage”: false,
“pluginVersion”: “9.3.2”,
“pointradius”: 2,
“points”: false,
“renderer”: “flot”,
“seriesOverrides”: [
{
“$$hashKey”: “object:2987”,
“alias”: “MAX”,
“color”: “#1F60C4”,
“dashes”: true,
“fill”: 0,
“stack”: false
},
{
“$$hashKey”: “object:78”,
“alias”: “/FINISHED|FAILED|DEAD|REMOVED/”,
“hiddenSeries”: true
}
],
“spaceLength”: 10,
“stack”: true,
“steppedLine”: false,
“targets”: [
{
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“exemplar”: true,
“expr”: “sum(ray_resources{Name="CPU",State="USED",SessionName="$SessionName"}) by (instance)”,
“interval”: “”,
“legendFormat”: “CPU Usage: {{instance}}”,
“queryType”: “randomWalk”,
“refId”: “A”
},
{
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“exemplar”: true,
“expr”: “sum(ray_resources{Name="CPU",SessionName="$SessionName"})”,
“interval”: “”,
“legendFormat”: “MAX”,
“queryType”: “randomWalk”,
“refId”: “B”
}
],
“thresholds”: [],
“timeRegions”: [],
“title”: “Scheduler CPUs (logical slots)”,
“tooltip”: {
“shared”: true,
“sort”: 0,
“value_type”: “individual”
},
“type”: “graph”,
“xaxis”: {
“mode”: “time”,
“show”: true,
“values”: []
},
“yaxes”: [
{
“$$hashKey”: “object:628”,
“format”: “cores”,
“label”: “”,
“logBase”: 1,
“min”: “0”,
“show”: true
},
{
“$$hashKey”: “object:629”,
“format”: “short”,
“logBase”: 1,
“show”: true
}
],
“yaxis”: {
“align”: false
}
},
{
“aliasColors”: {},
“bars”: false,
“dashLength”: 10,
“dashes”: false,
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“description”: “Object store memory usage by location. The dotted line indicates the object store memory capacity.\n\nLocation: where the memory was allocated, which is MMAP_SHM or MMAP_DISK to indicate memory-mapped page, SPILLED to indicate spillage to disk, and WORKER_HEAP for objects small enough to be inlined in worker memory. Refer to metric_defs.cc for more information.”,
“fill”: 10,
“fillGradient”: 0,
“gridPos”: {
“h”: 8,
“w”: 12,
“x”: 12,
“y”: 16
},
“hiddenSeries”: false,
“id”: 29,
“legend”: {
“alignAsTable”: true,
“avg”: false,
“current”: true,
“hideEmpty”: false,
“hideZero”: true,
“max”: false,
“min”: false,
“rightSide”: false,
“show”: true,
“sort”: “current”,
“sortDesc”: true,
“total”: false,
“values”: true
},
“lines”: true,
“linewidth”: 1,
“nullPointMode”: “null”,
“options”: {
“alertThreshold”: true
},
“percentage”: false,
“pluginVersion”: “9.3.2”,
“pointradius”: 2,
“points”: false,
“renderer”: “flot”,
“seriesOverrides”: [
{
“$$hashKey”: “object:2987”,
“alias”: “MAX”,
“color”: “#1F60C4”,
“dashes”: true,
“fill”: 0,
“stack”: false
},
{
“$$hashKey”: “object:78”,
“alias”: “/FINISHED|FAILED|DEAD|REMOVED/”,
“hiddenSeries”: true
}
],
“spaceLength”: 10,
“stack”: true,
“steppedLine”: false,
“targets”: [
{
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“exemplar”: true,
“expr”: “sum(ray_object_store_memory{SessionName="$SessionName"} / 1e9) by (Location)”,
“interval”: “”,
“legendFormat”: “{{Location}}”,
“queryType”: “randomWalk”,
“refId”: “A”
},
{
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“exemplar”: true,
“expr”: “sum(ray_resources{Name="object_store_memory",SessionName="$SessionName"} / 1e9)”,
“interval”: “”,
“legendFormat”: “MAX”,
“queryType”: “randomWalk”,
“refId”: “B”
}
],
“thresholds”: [],
“timeRegions”: [],
“title”: “Object Store Memory”,
“tooltip”: {
“shared”: true,
“sort”: 0,
“value_type”: “individual”
},
“type”: “graph”,
“xaxis”: {
“mode”: “time”,
“show”: true,
“values”: []
},
“yaxes”: [
{
“$$hashKey”: “object:628”,
“format”: “gbytes”,
“label”: “”,
“logBase”: 1,
“min”: “0”,
“show”: true
},
{
“$$hashKey”: “object:629”,
“format”: “short”,
“logBase”: 1,
“show”: true
}
],
“yaxis”: {
“align”: false
}
},
{
“aliasColors”: {},
“bars”: false,
“dashLength”: 10,
“dashes”: false,
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“description”: "Logical GPU usage of Ray. The dotted line indicates the total number of GPUs. The logical GPU is allocated by num_gpus arguments from tasks and actors. ",
“fill”: 10,
“fillGradient”: 0,
“gridPos”: {
“h”: 8,
“w”: 12,
“x”: 0,
“y”: 24
},
“hiddenSeries”: false,
“id”: 28,
“legend”: {
“alignAsTable”: true,
“avg”: false,
“current”: true,
“hideEmpty”: false,
“hideZero”: true,
“max”: false,
“min”: false,
“rightSide”: false,
“show”: true,
“sort”: “current”,
“sortDesc”: true,
“total”: false,
“values”: true
},
“lines”: true,
“linewidth”: 1,
“nullPointMode”: “null”,
“options”: {
“alertThreshold”: true
},
“percentage”: false,
“pluginVersion”: “9.3.2”,
“pointradius”: 2,
“points”: false,
“renderer”: “flot”,
“seriesOverrides”: [
{
“$$hashKey”: “object:2987”,
“alias”: “MAX”,
“color”: “#1F60C4”,
“dashes”: true,
“fill”: 0,
“stack”: false
},
{
“$$hashKey”: “object:78”,
“alias”: “/FINISHED|FAILED|DEAD|REMOVED/”,
“hiddenSeries”: true
}
],
“spaceLength”: 10,
“stack”: true,
“steppedLine”: false,
“targets”: [
{
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“exemplar”: true,
“expr”: “ray_resources{Name="GPU",State="USED",SessionName="$SessionName"}”,
“interval”: “”,
“legendFormat”: “GPU Usage: {{instance}}”,
“queryType”: “randomWalk”,
“refId”: “A”
},
{
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“exemplar”: true,
“expr”: “sum(ray_resources{Name="GPU",SessionName="$SessionName"})”,
“interval”: “”,
“legendFormat”: “MAX”,
“queryType”: “randomWalk”,
“refId”: “B”
}
],
“thresholds”: [],
“timeRegions”: [],
“title”: “Scheduler GPUs (logical slots)”,
“tooltip”: {
“shared”: true,
“sort”: 0,
“value_type”: “individual”
},
“type”: “graph”,
“xaxis”: {
“mode”: “time”,
“show”: true,
“values”: []
},
“yaxes”: [
{
“$$hashKey”: “object:628”,
“format”: “GPUs”,
“label”: “”,
“logBase”: 1,
“min”: “0”,
“show”: true
},
{
“$$hashKey”: “object:629”,
“format”: “short”,
“logBase”: 1,
“show”: true
}
],
“yaxis”: {
“align”: false
}
},
{
“aliasColors”: {},
“bars”: false,
“dashLength”: 10,
“dashes”: false,
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“description”: “Current number of placement groups in a particular state.\n\nState: the placement group state, as described by the rpc::PlacementGroupTable proto in gcs.proto.”,
“fill”: 10,
“fillGradient”: 0,
“gridPos”: {
“h”: 8,
“w”: 12,
“x”: 12,
“y”: 24
},
“hiddenSeries”: false,
“id”: 40,
“legend”: {
“alignAsTable”: true,
“avg”: false,
“current”: true,
“hideEmpty”: false,
“hideZero”: true,
“max”: false,
“min”: false,
“rightSide”: false,
“show”: true,
“sort”: “current”,
“sortDesc”: true,
“total”: false,
“values”: true
},
“lines”: true,
“linewidth”: 1,
“nullPointMode”: “null”,
“options”: {
“alertThreshold”: true
},
“percentage”: false,
“pluginVersion”: “9.3.2”,
“pointradius”: 2,
“points”: false,
“renderer”: “flot”,
“seriesOverrides”: [
{
“$$hashKey”: “object:2987”,
“alias”: “MAX”,
“color”: “#1F60C4”,
“dashes”: true,
“fill”: 0,
“stack”: false
},
{
“$$hashKey”: “object:78”,
“alias”: “/FINISHED|FAILED|DEAD|REMOVED/”,
“hiddenSeries”: true
}
],
“spaceLength”: 10,
“stack”: true,
“steppedLine”: false,
“targets”: [
{
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“exemplar”: true,
“expr”: “sum(ray_placement_groups{SessionName="$SessionName"}) by (State)”,
“interval”: “”,
“legendFormat”: “{{State}}”,
“queryType”: “randomWalk”,
“refId”: “A”
}
],
“thresholds”: [],
“timeRegions”: [],
“title”: “Scheduler Placement Groups”,
“tooltip”: {
“shared”: true,
“sort”: 0,
“value_type”: “individual”
},
“type”: “graph”,
“xaxis”: {
“mode”: “time”,
“show”: true,
“values”: []
},
“yaxes”: [
{
“$$hashKey”: “object:628”,
“format”: “placement groups”,
“label”: “”,
“logBase”: 1,
“min”: “0”,
“show”: true
},
{
“$$hashKey”: “object:629”,
“format”: “short”,
“logBase”: 1,
“show”: true
}
],
“yaxis”: {
“align”: false
}
},
{
“aliasColors”: {},
“bars”: false,
“dashLength”: 10,
“dashes”: false,
“datasource”: {
“type”: “prometheus”,
“uid”: “${DS_PROMETHEUS}”
},
“description”: “”,
“fill”: 10,
“fillGradient”: 0,
“gridPos”: {
“h”: 8,
“w”: 12,
“x”: 0,
“y”: 32
},
“hiddenSeries”: false,
“id”: 2,
“legend”: {
“alignAsTable”: true,
“avg”: false,
“current”: true,
“hideEmpty”: false,
“hideZero”: true,
“max”: false,
“min”: false,
“rightSide”: false,
“show”: true,
“sort”: “current”,
“sortDesc”: true,
“total”: false,
“values”: true
},
“lines”: true,
“linewidth”: 1,
“nullPointMode”: “null”,
“options”: {
“alertThreshold”: true
},
“percentage”: false,
“pluginVersion”: “7.5.17”,
“pointradius”: 2,
“points”: false,
“renderer”: “flot”,
“seriesOverrides”: [
{
“$$hashKey”: “object:2987”,
“alias”: “MAX”,
“color”: “#1F60C4”,
“dashes”: true,
“fill”: 0,
“stack”: false
},
{
“$$hashKey”: “object:78”,
“alias”: “/FINISHED|FAILED|DEAD|REMOVED/”,
“hiddenSeries”: true
}
],

Awesome, that will work.

Basically I just wanna make sure the query is correct.

Also, could you share your prometheus and grafana versions so we could repro as well? Thanks.

The messages with the dashboards were classified as spam, so I’m sending a GitHub gist of the Object Store Memory dashboard instead: ray-dashboard-debug-Object Store Memory · GitHub
Versions:

  • grafana 9.3.2
  • prometheus 2.41.0
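
In case it helps with the repro, here is a minimal sketch for checking outside Grafana whether Prometheus returns a `Location` label for `ray_object_store_memory` at all. It uses the same grouping as the panel's first target, minus the `SessionName` filter. The address `http://localhost:9090` and the helper names are assumptions for illustration, not anything taken from Ray.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus is reachable on its default local address.
PROMETHEUS_URL = "http://localhost:9090"

# Same grouping as the panel's first target, without the SessionName filter.
QUERY = "sum(ray_object_store_memory / 1e9) by (Location)"


def build_query_url(base_url, promql):
    """Return the instant-query URL for Prometheus' HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def memory_by_location(payload):
    """Map each Location label value to its reported value in GB.

    A series that carries no Location label is stored under a placeholder key.
    """
    result = {}
    for series in payload["data"]["result"]:
        location = series["metric"].get("Location", "<no Location label>")
        # Instant-query samples are [timestamp, "value-as-string"].
        result[location] = float(series["value"][1])
    return result


def fetch(base_url=PROMETHEUS_URL, promql=QUERY):
    """Run the query against a live Prometheus and return the parsed mapping."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=5) as resp:
        return memory_by_location(json.load(resp))
```

Calling `print(fetch())` while the repro script is running should list one entry per location (for example `MMAP_SHM`). If the only key is the placeholder, that would suggest the metric is being exported without the `Location` tag, rather than the panel query being wrong.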

Thanks @adam - will compare it with the config I have at my end ASAP - sorry for the delay.

Sorry for the delay.

I believe the issue was this in the config you pasted:

Could you try editing the query explicitly like this: