Offline data tutorial underperforms

1. Severity of the issue: (select one)
Medium: Significantly affects my productivity but can find a workaround.

2. Environment:

  • Ray version: 2.45.0
  • Python version: 3.12.10
  • OS: linux
  • Cloud/Infrastructure:
  • Other libs/tools (if relevant): torch 2.8.0.dev20250414+cu128

3. What happened vs. what you expected:

  • Expected: Trained algo restored from checkpoint should perform well
  • Actual: Trained algo performs badly

Hi!

I am trying to follow the tutorial "Working with offline data" (Ray 2.45.0). I have successfully run the first step (training-an-expert-policy) to train an algo and verified that the algo is indeed trained:

(PPO(env=CartPole-v1; env-runners=2; learners=0; multi-agent=False) pid=3062681) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/home/xxxx/ray_results/docs_rllib_offline_pretrain_ppo/PPO_CartPole-v1_a12cf_00000_0_2025-05-07_13-30-08/checkpoint_000025)

Trial PPO_CartPole-v1_a12cf_00000 finished iteration 27 at 2025-05-07 13:30:39. Total running time: 31s
╭─────────────────────────────────────────────────────╮
│ Trial PPO_CartPole-v1_a12cf_00000 result            │
├─────────────────────────────────────────────────────┤
│ env_runners/episode_len_mean                 459.26 │
│ env_runners/episode_return_mean              459.26 │
│ num_env_steps_sampled_lifetime               108000 │
╰─────────────────────────────────────────────────────╯

But when I reload this checkpoint to perform step 2 (Record expert data to local disk), I get suboptimal results:
'episode_return_max': 36.0
'agent_episode_returns_mean': {'default_agent': 16.0}
Those numbers are more or less the same for all 10 iterations. So when I go to step 3 (behavioral cloning), the newly trained algo of course underperforms as well.
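
To put those numbers in perspective, here is a minimal sanity check in plain Python. It only compares the two values from the logs above against an assumed untrained baseline. The baseline of roughly 20 per episode for an untrained CartPole-v1 policy is my assumption, not something from the tutorial, and the helper name looks_untrained is made up for this sketch.

```python
# Values copied from the logs in this post.
TRAINED_RETURN_MEAN = 459.26   # env_runners/episode_return_mean after training
RECORDED_RETURN_MEAN = 16.0    # agent_episode_returns_mean while recording

# Assumption (not from the post or the tutorial): an untrained
# CartPole-v1 policy scores roughly 20 per episode.
ASSUMED_UNTRAINED_RETURN = 20.0


def looks_untrained(recorded: float, trained: float,
                    untrained: float = ASSUMED_UNTRAINED_RETURN) -> bool:
    """Return True if `recorded` is closer to the untrained baseline
    than to the return seen at the end of training."""
    return abs(recorded - untrained) < abs(recorded - trained)


# prints True
print(looks_untrained(RECORDED_RETURN_MEAN, TRAINED_RETURN_MEAN))
```

If that assumption holds, the recording step behaves as if it were running untrained weights, which might mean the checkpoint is not actually reaching the policy that collects the episodes. That is only a guess on my side.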

Any idea what could be wrong at step 2 (Record expert data to local disk)?

Hi Iamgroot,
Could you please post your PPOConfig here? I can think of a few different reasons why this might be happening, but let me know whether you are using the same config as in the tutorial you linked.