Sample Rule-Based Expert Demonstrations in RLlib

Hello there,

I made a custom environment (with the Gym API) and was able to use RLlib to train agents in it. I also wrote a rule-based "expert" that does not use a neural network to choose its actions. I want to sample trajectories from this "expert" and then warm-start my RL agents with imitation learning on those trajectories.

To this end, I probably need to build an offline dataset for imitation learning (specifically for the BC and MARWIL algorithms available in RLlib). How can I just sample data with RLlib without training any agent? Most of the example scripts I saw use some sort of trainer. Do I need to instantiate a TFPolicy or TorchPolicy and fit my rule-based module into it (e.g., put the action-sampling rules in the "forward" function instead of a NN)?

Thanks in advance.

Hi @mickelliu and welcome,

So, to sample data from the expert's interactions with your environment, I would write a custom policy as shown in the documentation.
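Roughly, such a policy boils down to a `compute_actions` method that applies the rules to a batch of observations and returns the `(actions, state_outs, extra_fetches)` triple that RLlib's `Policy.compute_actions` returns, plus no-op `learn_on_batch`, `get_weights` and `set_weights`. It does not have to be a TFPolicy or TorchPolicy. To keep this sketch runnable without Ray, the class below does not inherit from `ray.rllib.policy.policy.Policy`; in a real setup you would subclass that class and pass `(observation_space, action_space, config)` to its constructor. The class name and the threshold rule are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


class RuleBasedExpertPolicy:
    """Hypothetical rule-based policy mirroring the methods of RLlib's Policy.

    In RLlib you would instead write `class RuleBasedExpertPolicy(Policy)`
    after `from ray.rllib.policy.policy import Policy`.
    """

    def __init__(self, goal: int = 5):
        self.goal = goal

    def compute_actions(
        self,
        obs_batch: List[int],
        state_batches: Optional[List[Any]] = None,
        **kwargs,
    ) -> Tuple[List[int], List[Any], Dict[str, Any]]:
        # Apply the hand-written expert rule to every observation in the batch.
        actions = [1 if obs < self.goal else 0 for obs in obs_batch]
        # A rule-based expert has no RNN state and no extra fetches.
        return actions, [], {}

    def learn_on_batch(self, samples) -> Dict[str, Any]:
        # Nothing to learn: the rules have no trainable parameters.
        return {}

    def get_weights(self) -> Dict[str, Any]:
        return {}

    def set_weights(self, weights: Dict[str, Any]) -> None:
        pass
```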

Then, to produce an offline dataset for training, use the offline dataset API. Basically, you build a Trainer from your custom policy:

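from ray.rllib.agents.trainer_template import build_trainer

# MyPolicy: your custom rule-based Policy subclass (placeholder name)
MyTrainer = build_trainer(name="MyTrainer", default_policy=MyPolicy)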

With this Trainer you call train() on your policy (it does not actually learn anything, since the rule-based policy has no trainable weights), and the sampled data is stored in path/to/folder/where/I/want/my/data:

import ray

ray.init()

config = {
   "env": MyEnv,  # class of your gym environment
   "output": "path/to/folder/where/I/want/my/data",  # where the JSON files are written
   "framework": None,
}
mytrainer = MyTrainer(config=config)
results = mytrainer.train()  # call repeatedly to collect more data

This data (it’s in JSON format) can then be used for training another policy. Hope this helps.
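To sanity-check the dataset before training on it, a small script can count the recorded transitions. This is a sketch under assumptions: that the output folder holds `*.json` files with one serialized single-agent `SampleBatch` per line, and that the `actions` column is stored as a plain JSON list (RLlib compresses `obs`/`new_obs` by default, so those columns are not decoded here). The function name `count_transitions` is hypothetical.

```python
import json
from pathlib import Path


def count_transitions(output_dir: str) -> int:
    """Count timesteps across all JSON-lines files in an output folder."""
    total = 0
    for path in sorted(Path(output_dir).glob("*.json")):
        with path.open() as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                batch = json.loads(line)
                # Assumption: one column-oriented batch per line, one action per timestep.
                total += len(batch["actions"])
    return total
```

For example, `count_transitions("path/to/folder/where/I/want/my/data")` should grow with every `train()` call on the data-collecting trainer.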



Also, this thread may help you.


Thanks, @Lars_Simon_Zehnder, for your quick response.

Yesterday I figured out a way to do it by defining a custom trainable and policy, then manually rolling out the workers and running them with …; but the trainer template looks much cleaner to write.
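Stripped of RLlib's worker machinery, a manual rollout of a rule-based expert is just a loop like the following framework-free sketch. `LineWorldEnv`, `expert_action` and the per-episode column layout are hypothetical stand-ins for the custom environment and expert from the question; the toy env follows the classic Gym step API `(obs, reward, done, info)`.

```python
import random
from typing import Dict, List, Tuple


class LineWorldEnv:
    """Hypothetical toy env (classic Gym API): walk right until position `goal`."""

    def __init__(self, goal: int = 5, max_steps: int = 20):
        self.goal = goal
        self.max_steps = max_steps
        self.pos = 0
        self.t = 0

    def reset(self) -> int:
        self.pos = random.randint(0, self.goal - 1)
        self.t = 0
        return self.pos

    def step(self, action: int) -> Tuple[int, float, bool, dict]:
        # Action 1 moves right, action 0 moves left (never below 0).
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        self.t += 1
        reward = 1.0 if self.pos == self.goal else 0.0
        done = self.pos == self.goal or self.t >= self.max_steps
        return self.pos, reward, done, {}


def expert_action(obs: int, goal: int) -> int:
    """Hypothetical rule-based expert: a hand-written rule, no neural network."""
    return 1 if obs < goal else 0


def sample_expert_trajectories(env: LineWorldEnv, num_episodes: int) -> List[Dict[str, list]]:
    """Roll out the expert and store each episode as a dict of columns."""
    episodes = []
    for eps_id in range(num_episodes):
        obs, done = env.reset(), False
        columns = ("eps_id", "obs", "actions", "rewards", "dones", "new_obs")
        traj: Dict[str, list] = {key: [] for key in columns}
        while not done:
            action = expert_action(obs, env.goal)
            new_obs, reward, done, _ = env.step(action)
            traj["eps_id"].append(eps_id)
            traj["obs"].append(obs)
            traj["actions"].append(action)
            traj["rewards"].append(reward)
            traj["dones"].append(done)
            traj["new_obs"].append(new_obs)
            obs = new_obs
        episodes.append(traj)
    return episodes


if __name__ == "__main__":
    random.seed(0)
    for ep in sample_expert_trajectories(LineWorldEnv(), num_episodes=2):
        print(ep["actions"], ep["rewards"])
```

Column-oriented episodes like these resemble the layout of RLlib's `SampleBatch`, but they are not in the exact format RLlib's JSON reader expects, which is one more reason the trainer-template route is simpler.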


Hey @mickelliu, you can also take a look at the BC/MARWIL test cases here:

or at this example script:

which show how to hook up your dataset and use it for offline training.
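Hooking up the dataset mostly comes down to pointing the `input` key at the folder written earlier. A minimal sketch of such configs, assuming Ray 1.x key names (`input`, `input_evaluation`, and MARWIL's `beta`, where `beta=0.0` reduces MARWIL to plain behavior cloning); newer RLlib versions renamed some of these keys, so check them against the test cases above. The helper `offline_config` and the `"MyEnv"` placeholder are hypothetical.

```python
from typing import Any, Dict

DATA_DIR = "path/to/folder/where/I/want/my/data"


def offline_config(env: Any, beta: float, data_dir: str = DATA_DIR) -> Dict[str, Any]:
    """Hypothetical helper building an offline-training config (Ray 1.x key names)."""
    return {
        "env": env,              # still needed so RLlib knows the observation/action spaces
        "input": data_dir,       # read experiences from the JSON files instead of the env
        "input_evaluation": [],  # skip off-policy estimation (it needs action probabilities)
        "beta": beta,            # 0.0 -> behavior cloning, > 0.0 -> advantage-weighted (MARWIL)
    }


# "MyEnv" is a placeholder: pass your env class or a registered env name.
bc_config = offline_config("MyEnv", beta=0.0)
marwil_config = offline_config("MyEnv", beta=1.0)

# With Ray 1.x installed, something along these lines would train on the data:
# from ray.rllib.agents.marwil import MARWILTrainer
# trainer = MARWILTrainer(config=bc_config)
# for _ in range(10):
#     trainer.train()
```

Once the imitation-learned policy looks reasonable, its weights are what you would carry over to warm-start the RL agent.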
