Hello there,
I made a custom environment (with the Gym API) and was able to use RLlib to train agents in it. I also wrote a rule-based "expert" that does not use a neural network to sample its actions. I would like to sample trajectories from this expert and then warm-start my RL agents with imitation learning on those trajectories.
To this end, I probably need to build an offline dataset for imitation learning (specifically with the BC and MARWIL algorithms available in RLlib). How can I do data sampling with RLlib without training any agent? Most of the example scripts I have seen use some sort of trainer. Do I need to instantiate a TFPolicy or TorchPolicy and fit my rule-based module into the policy (e.g. put the action-sampling rules in the "forward" function rather than using a NN)?
Thanks in advance.
Hi @mickelliu and welcome,
so to sample data from the expert's interactions with your environment, I would write a custom policy as shown in the documentation.
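A rough sketch of what such a rule-based policy could look like (the rule itself is just a placeholder, put your own expert logic there):
import numpy as np
from ray.rllib.policy.policy import Policy


class MyExpertPolicy(Policy):
    """Hand-coded expert: no NN, actions come from fixed rules."""

    def compute_actions(self,
                        obs_batch,
                        state_batches=None,
                        prev_action_batch=None,
                        prev_reward_batch=None,
                        info_batch=None,
                        episodes=None,
                        **kwargs):
        # Apply the rule-based logic to every observation in the batch.
        actions = [self._rule_based_action(obs) for obs in obs_batch]
        return np.array(actions), [], {}

    def _rule_based_action(self, obs):
        # Placeholder rule: always take action 0. Replace with your expert's rules.
        return 0

    def learn_on_batch(self, samples):
        # The expert does not learn.
        return {}

    def get_weights(self):
        return {}

    def set_weights(self, weights):
        pass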
Then, to produce an offline dataset for training, you need to use the offline dataset API. Basically, you create a Trainer from your custom policy:
from ray.rllib.agents.trainer_template import build_trainer

MyTrainer = build_trainer(
    name="MyExpertPolicy",
    default_policy=MyExpertPolicy)
With this Trainer you then call train() on your policy (of course it does not actually learn anything, as it is deterministic) and store the data to path/to/folder/where/I/want/my/data:
import ray

config = {
    "env": MyEnv,  # class name of your gym environment
    "output": "path/to/folder/where/I/want/my/data",
    "framework": None,  # no NN framework needed for the rule-based expert
}

ray.init(ignore_reinit_error=True)
mytrainer = MyTrainer(config=config)
results = mytrainer.train()
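If you want to sanity-check what was written, you can read the output back in with RLlib's JsonReader, roughly like this (a small sketch, assuming the output path from the config above):
from ray.rllib.offline.json_reader import JsonReader

# Point the reader at the folder that the "output" setting wrote to.
reader = JsonReader("path/to/folder/where/I/want/my/data")
batch = reader.next()  # one SampleBatch of recorded expert transitions
print(batch.count, list(batch.keys()))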
This data (it’s in JSON format) can then be used for training another policy. Hope this helps.
Simon
Also, maybe this thread helps you.
Thanks, @Lars_Simon_Zehnder, for your quick response.
Yesterday I figured out a way to do it by defining a custom trainable and policy, then manually rolling out the workers and running them with tune.run(). But it looks like the trainer template is much cleaner to write.
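For completeness, combining the trainer template from above with tune.run() would look roughly like this (just a sketch; the stop criterion is arbitrary):
from ray import tune

# The expert "trains" only to roll out episodes and write the output files.
tune.run(
    MyTrainer,
    config={
        "env": MyEnv,
        "output": "path/to/folder/where/I/want/my/data",
    },
    stop={"training_iteration": 10},
)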
Hey @mickelliu, you can then also take a look at the BC/MARWIL test cases here:
ray.rllib.agents.marwil.tests.test_[bc|marwil].py
or at this example script:
ray.rllib.examples.offline_rl.py
which show how to hook up your dataset and use it for offline training.
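In a nutshell, the hookup is just switching the config from "output" to "input". A rough sketch, reusing the path from above (the exact values for input_evaluation and beta are just illustrative; with beta=0.0 MARWIL reduces to plain behavioral cloning):
from ray.rllib.agents.marwil import MARWILTrainer

config = {
    "env": MyEnv,
    "input": "path/to/folder/where/I/want/my/data",  # the expert data generated above
    "input_evaluation": [],  # skip off-policy estimation on the expert data
    "beta": 0.0,  # 0.0 -> plain BC; > 0.0 -> MARWIL's advantage weighting
    "framework": "torch",
}

marwil_trainer = MARWILTrainer(config=config)
for _ in range(200):
    results = marwil_trainer.train()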