Getting started rllib - deprecated?

1. Severity of the issue: (select one)
I’m just curious or want clarification.
2. Environment:

  • Ray version: 2.49.2
  • Python version: 3.9.23
  • OS: Linux
  • Cloud/Infrastructure:
  • Other libs/tools (if relevant):

3. What happened vs. what you expected:

  • Expected: I am following the RLlib getting started doc here. This is the example that trains PPO on the Pendulum-v1 environment using Ray Tune with a grid search over 3 learning rates. I expected the terminal output to reflect what is shown in the link.

  • Actual: my output differs from the tutorial's. For reference, the tutorial shows:
    Trial status: 3 RUNNING
    Current time: 2025-01-17 18:47:33. Total running time: 3min 0s
    Logical resource usage: 9.0/12 CPUs, 0/0 GPUs
    ╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
    │ Trial name                    status       lr   iter   total time (s)   episode_return_mean   .._sampled_lifetime │
    ├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
    │ PPO_Pendulum-v1_b5c41_00000   RUNNING    0.001     29          86.2426              -998.449                108000 │
    │ PPO_Pendulum-v1_b5c41_00001   RUNNING   0.0005     25          74.4335              -997.079                100000 │
    │ PPO_Pendulum-v1_b5c41_00002   RUNNING   0.0001     20          60.0421              -960.293                 80000 │
    ╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

If you look at the 6th column, there is a clear difference: in the tutorial one gets episode_return_mean, whereas I get something like …lls_per_iteration (I don't know what that metric is either). I would appreciate any help (this is my first time using the tool, so apologies for any stupid questions). Apologies for copying the terminal output; it seems I cannot add two images since I am a new user. Thanks

The difference in your output table columns (e.g., seeing lls_per_iteration instead of episode_return_mean) is due to changes in Ray RLlib’s result reporting and default terminal output formatting in recent versions. In Ray 2.x, especially after 2.9.0, key metrics like episode_return_mean are now nested under env_runners, so the correct metric is env_runners/episode_return_mean, not episode_return_mean at the top level. The terminal output may not show this by default, but the metric is still available in the results object and for use in stopping criteria or analysis (see docs, example).

To access the mean episode return, use result["env_runners"]["episode_return_mean"] in your code, or set metric="env_runners/episode_return_mean" in your Tune config. The terminal table output is just a summary and may not display all available metrics, but the underlying data is present (see example code). Would you like a step-by-step guide on how to extract or display this metric in your workflow?
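A minimal sketch of the nested-vs-flat access pattern described above. The result dicts here are made up for illustration (the values mimic the tutorial table); `get_return_mean` is a hypothetical helper, not an RLlib API:

```python
# Sketch: in newer Ray versions the episode return is nested under
# "env_runners" in each trial's result dict; older versions reported it
# at the top level. Both dicts below are invented for demonstration.

def get_return_mean(result: dict) -> float:
    """Read episode_return_mean from a new- or old-style result dict."""
    if "env_runners" in result:                       # new nested layout
        return result["env_runners"]["episode_return_mean"]
    return result["episode_return_mean"]              # legacy flat layout

new_style = {"env_runners": {"episode_return_mean": -998.449}}
old_style = {"episode_return_mean": -998.449}

print(get_return_mean(new_style))  # -998.449
print(get_return_mean(old_style))  # -998.449
```

When configuring Tune itself, the nested path is written with a slash, e.g. metric="env_runners/episode_return_mean" together with mode="max", so that trial ranking and stopping criteria use the mean episode return.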

Hint: Mention @RunLLM in the post for followups.