Getting started rllib - deprecated?

1. Severity of the issue: (select one)
I’m just curious or want clarification.
2. Environment:

  • Ray version: 2.49.2
  • Python version: 3.9.23
  • OS: Linux
  • Cloud/Infrastructure:
  • Other libs/tools (if relevant):

3. What happened vs. what you expected:

  • Expected: I am following the RLlib getting started doc here. This is the example that trains PPO on the Pendulum-v1 environment using Ray Tune with a grid search over 3 learning rates. I expected the terminal output to reflect what is shown in the link.

  • Actual: my output differs from the tutorial's. For reference, the tutorial shows:
    Trial status: 3 RUNNING
    Current time: 2025-01-17 18:47:33. Total running time: 3min 0s
    Logical resource usage: 9.0/12 CPUs, 0/0 GPUs
    ╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
    │ Trial name                    status       lr   iter   total time (s)   episode_return_mean   .._sampled_lifetime │
    ├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
    │ PPO_Pendulum-v1_b5c41_00000   RUNNING    0.001     29          86.2426              -998.449                108000 │
    │ PPO_Pendulum-v1_b5c41_00001   RUNNING   0.0005     25          74.4335              -997.079                100000 │
    │ PPO_Pendulum-v1_b5c41_00002   RUNNING   0.0001     20          60.0421              -960.293                 80000 │
    ╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

If you look at the 6th column, there is a clear difference: in the tutorial one gets episode_return_mean, whereas I get something like …lls_per_iteration (I don't know what that metric is either). I would appreciate any help (this is my first time using the tool, so apologies for any stupid questions). Apologies for copying the terminal output; it seems I cannot add two images since I am a new user. Thanks

The difference in your output table columns (e.g., seeing lls_per_iteration instead of episode_return_mean) is due to changes in Ray RLlib’s result reporting and default terminal output formatting in recent versions. In Ray 2.x, especially after 2.9.0, key metrics like episode_return_mean are now nested under env_runners, so the correct metric is env_runners/episode_return_mean, not episode_return_mean at the top level. The terminal output may not show this by default, but the metric is still available in the results object and for use in stopping criteria or analysis (see docs, example).

To access the mean episode return, use result["env_runners"]["episode_return_mean"] in your code, or set metric="env_runners/episode_return_mean" in your Tune config. The terminal table output is just a summary and may not display all available metrics, but the underlying data is present (see example code). Would you like a step-by-step guide on how to extract or display this metric in your workflow?
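A minimal sketch of the nested-vs-flat access pattern described above. The result dicts here are made up for illustration (the values mimic the tutorial table); `get_return_mean` is a hypothetical helper, not an RLlib API:

```python
# Sketch: in newer Ray versions the episode return is nested under
# "env_runners" in each trial's result dict; older versions reported it
# at the top level. Both dicts below are invented for demonstration.

def get_return_mean(result: dict) -> float:
    """Read episode_return_mean from a new- or old-style result dict."""
    if "env_runners" in result:                       # new nested layout
        return result["env_runners"]["episode_return_mean"]
    return result["episode_return_mean"]              # legacy flat layout

new_style = {"env_runners": {"episode_return_mean": -998.449}}
old_style = {"episode_return_mean": -998.449}

print(get_return_mean(new_style))  # -998.449
print(get_return_mean(old_style))  # -998.449
```

When configuring Tune itself, the nested path is written with a slash, e.g. metric="env_runners/episode_return_mean" together with mode="max", so that trial ranking and stopping criteria use the mean episode return.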

Hint: Mention @RunLLM in the post for followups.