Hybrid Offline learning and PPO?

Hi all,

I’ve been looking around and I’m now wondering whether it would make sense to combine offline RL with PPO (or another online RL algorithm)?

I ask because in my application I have access to historical trajectory data for particular cases, as well as a suitable simulation environment for online RL. The idea would be to “warm start” the online algorithm with expert knowledge, so to speak.

If this is possible, what would be a sort of “best practice” for doing it? Any pointers would be much appreciated.

If not, what would be the way to go? Any suggestions?

\mario


Yes, it does make sense.
Have a look at this example, which uses our new RL Modules API:
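In case it helps to see the overall shape of the workflow: below is a minimal sketch of the warm-start pattern (pretrain with BC on the offline data, copy the weights into PPO, then fine-tune online in the simulator). This is an illustration, not the linked example itself; the environment name, the offline data path, and the iteration counts are placeholders, the exact config calls vary between RLlib versions, and on the RL Modules stack the weight transfer would go through the RLModule/checkpoint state instead of the policy weights shown here.

```python
from ray.rllib.algorithms.bc import BCConfig
from ray.rllib.algorithms.ppo import PPOConfig

# 1) Pretrain with behavior cloning (BC) on the historical trajectories.
#    "/tmp/offline_episodes" is a placeholder for your recorded offline data.
bc_algo = (
    BCConfig()
    .environment("CartPole-v1")                    # placeholder env
    .offline_data(input_="/tmp/offline_episodes")  # hypothetical path
    .build()
)
for _ in range(20):  # number of pretraining iterations is arbitrary here
    bc_algo.train()

# 2) Build PPO against the simulator and copy over the matching weights.
#    PPO's model has an extra value-function head that the BC model lacks,
#    so only parameters with matching names and shapes are transferred.
ppo_algo = PPOConfig().environment("CartPole-v1").build()
bc_weights = bc_algo.get_policy().get_weights()
ppo_policy = ppo_algo.get_policy()
merged = {
    name: bc_weights[name]
    if name in bc_weights and bc_weights[name].shape == param.shape
    else param
    for name, param in ppo_policy.get_weights().items()
}
ppo_policy.set_weights(merged)

# 3) Fine-tune online with PPO in the simulator.
for _ in range(100):
    ppo_algo.train()
```

Whether a direct weight copy like this works depends on the two algorithms using compatible model configs; checkpointing the BC-trained module and loading it into the PPO config, as in the example linked above, is the more robust route.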


Hi! I’ve tried this example and found that PPO training leads to a drop in performance rather than an improvement (though it is still better than training from scratch, so the model is being loaded). The more episodes BC is pretrained on, the larger the drop. I wonder if this is expected?