I made a custom environment (with the Gym API) and was able to use RLlib to train agents in it. I also wrote a rule-based "expert" that does not use a neural network to sample its actions. I would like to sample trajectories from this expert and then warm-start my RL agents with imitation learning on those trajectories (see the first sketch below for what I am attempting).
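Concretely, here is a rough sketch of the kind of thing I have in mind, adapted from RLlib's `saving_experiences.py` example. `CartPole-v0` and the trivial `expert_action()` rule are just stand-ins for my own env and expert, and the output path is a placeholder:

```python
import gym
import numpy as np

from ray.rllib.evaluation.sample_batch_builder import SampleBatchBuilder
from ray.rllib.models.preprocessors import get_preprocessor
from ray.rllib.offline.json_writer import JsonWriter


def expert_action(obs):
    # Stand-in for my rule-based expert: push the cart toward the
    # direction the pole is falling (no NN involved).
    return 0 if obs[2] < 0 else 1


batch_builder = SampleBatchBuilder()
writer = JsonWriter("/tmp/expert-out")  # output dir for the offline dataset

env = gym.make("CartPole-v0")  # stand-in for my custom Gym env
# RLlib stores preprocessed (e.g., flattened) observations in offline data.
prep = get_preprocessor(env.observation_space)(env.observation_space)

for eps_id in range(100):
    obs = env.reset()
    prev_action = np.zeros_like(env.action_space.sample())
    prev_reward = 0.0
    done = False
    t = 0
    while not done:
        action = expert_action(obs)
        new_obs, rew, done, info = env.step(action)
        batch_builder.add_values(
            t=t,
            eps_id=eps_id,
            agent_index=0,
            obs=prep.transform(obs),
            actions=action,
            action_prob=1.0,  # expert is deterministic
            rewards=rew,
            prev_actions=prev_action,
            prev_rewards=prev_reward,
            dones=done,
            infos=info,
            new_obs=prep.transform(new_obs),
        )
        obs = new_obs
        prev_action = action
        prev_reward = rew
        t += 1
    # One JSON row per episode.
    writer.write(batch_builder.build_and_reset())
```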
To this end, I probably need to build an offline dataset for imitation learning (specifically with the BC and MARWIL algorithms available in RLlib). How can I do data sampling with RLlib without training any agent? Most of the example scripts I have seen use some sort of trainer. Do I need to instantiate a TFPolicy or TorchPolicy and fit my rule-based module into the policy, e.g. put the action-sampling rules in the "forward" function instead of using a NN, as in the second sketch below?
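This is the alternative I am asking about: wrapping the rules directly in a `Policy` subclass with no NN and no TF/Torch, assuming the plain `ray.rllib.policy.Policy` base class can be used this way (the per-observation rule is again a stand-in):

```python
import numpy as np

from ray.rllib.policy.policy import Policy


class RuleBasedExpert(Policy):
    """Expert whose actions come from hand-written rules, not a NN."""

    def compute_actions(self,
                        obs_batch,
                        state_batches=None,
                        prev_action_batch=None,
                        prev_reward_batch=None,
                        info_batch=None,
                        episodes=None,
                        **kwargs):
        # Apply the rules per observation instead of a forward pass.
        actions = [0 if obs[2] < 0 else 1 for obs in obs_batch]
        return np.array(actions), [], {}

    def learn_on_batch(self, samples):
        return {}  # the expert does not learn

    def get_weights(self):
        return {}  # no weights to save

    def set_weights(self, weights):
        pass  # no weights to restore
```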
Thanks in advance.