Minimum requirement of offline data for MARWIL

jdong9604 · September 11, 2024, 9:23am

Hello, I’m very new to Ray RLlib.
I’m working on a personal project on offline RL.
My plan is to extract an offline dataset from global optimization solvers and have MARWIL imitate the solver’s policy.

After checking the MARWIL tuned CartPole example below, I found that the example’s offline data consists of eps_id, obs, actions, and rewards, as well as action_prob, action distribution inputs, and value targets for every episode.
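
To make the layout concrete, here is a rough sketch of what I mean by one episode record. It is hand-written, not copied from the example file: the key names (`action_dist_inputs`, `value_targets`, and so on) are my assumption based on the fields listed above, the values are made up, and the real file may encode observations differently. `split_fields` is just a hypothetical helper to separate the fields I can get from a solver from the policy-related ones.

```python
import json

# Hypothetical 3-step episode. All values are illustrative, not real CartPole data.
episode = {
    "eps_id": [0, 0, 0],
    "obs": [
        [0.01, -0.02, 0.03, 0.04],
        [0.01, 0.17, 0.03, -0.25],
        [0.02, -0.02, 0.02, 0.05],
    ],
    "actions": [1, 0, 1],
    "rewards": [1.0, 1.0, 1.0],
    # Policy-related fields (the ones my question is about).
    "action_prob": [0.5, 0.5, 0.5],
    "action_dist_inputs": [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
    "value_targets": [3.0, 2.0, 1.0],
}

# Assumed grouping: fields a solver can provide vs. fields that need a policy.
ENV_FIELDS = ("eps_id", "obs", "actions", "rewards")
POLICY_FIELDS = ("action_prob", "action_dist_inputs", "value_targets")


def split_fields(record):
    """Split a record into (environment fields, policy-related fields)."""
    env_part = {k: record[k] for k in ENV_FIELDS if k in record}
    policy_part = {k: record[k] for k in POLICY_FIELDS if k in record}
    return env_part, policy_part


# One JSON object per line, as in a JSON-lines style offline file.
line = json.dumps(episode)
env_part, policy_part = split_fields(json.loads(line))
print(sorted(env_part), sorted(policy_part))
```

From a solver I can only produce the first group, so the question is whether the second group can be left out.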

My question: is this policy information (action_prob, distribution inputs…) essential for running MARWIL?
If not, what is the purpose of this data in the tuned example?
