1. Severity of the issue: (select one)
I’m just curious or want clarification.
2. Environment:
- Ray version: 2.49.2
- Python version: 3.9.23
- OS: Linux
- Cloud/Infrastructure:
- Other libs/tools (if relevant):
3. What happened vs. what you expected:
-
Expected: I am following along with the RLlib getting-started doc here, specifically the example that trains the PPO algorithm on the Pendulum-v1 environment using Ray Tune with three learning rates (grid search). I expected the output in my terminal to reflect what is shown in the linked doc.
-
Actual: My output differs from the tutorial's output, which is shown below for reference:
Trial status: 3 RUNNING
Current time: 2025-01-17 18:47:33. Total running time: 3min 0s
Logical resource usage: 9.0/12 CPUs, 0/0 GPUs
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status lr iter total time (s) episode_return_mean .._sampled_lifetime │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ PPO_Pendulum-v1_b5c41_00000 RUNNING 0.001 29 86.2426 -998.449 108000 │
│ PPO_Pendulum-v1_b5c41_00001 RUNNING 0.0005 25 74.4335 -997.079 100000 │
│ PPO_Pendulum-v1_b5c41_00002 RUNNING 0.0001 20 60.0421 -960.293 80000 │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
If you look at the 6th column, there is a clear difference: the tutorial shows episode_return_mean, whereas I get something like …lls_per_iteration (I don't know what that metric is either). I would appreciate any help (this is my first time using the tool, so apologies for any basic questions). Apologies for pasting the terminal output as text; it seems I cannot attach two images as a new user. Thanks!
