Hello there,
I made a custom environment (with the Gym API) and I was able to use RLlib to train agents in it. I also wrote a rule-based "expert" that does not use a neural network to sample its actions. I would like to sample trajectories from this expert and then warm-start my RL agents with imitation learning based on the trajectories generated by this expert.
To this end, I probably need to build an offline dataset to do imitation learning (specifically using the BC and MARWIL algorithms available in RLlib). How can I do just the data sampling with RLlib without training any agent? Most of the example scripts I have seen use some sort of trainer. Do I need to instantiate a TFPolicy or TorchPolicy and fit my rule-based module into the policy (e.g. put the action-sampling rules in the "forward" function instead of using a NN)?
Thanks in advance.
Hi @mickelliu and welcome,
To sample data from the expert's interactions with your environment, I would write a custom policy as shown in the documentation.
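For reference, here is a minimal sketch of such a rule-based policy (Ray 1.x API); `my_rule()` is a placeholder for your hand-written expert logic:

```python
from ray.rllib.policy.policy import Policy


class MyExpertPolicy(Policy):
    """Hand-coded expert that maps observations to actions without a NN."""

    def compute_actions(self,
                        obs_batch,
                        state_batches=None,
                        prev_action_batch=None,
                        prev_reward_batch=None,
                        info_batch=None,
                        episodes=None,
                        **kwargs):
        # Apply the rule to every observation in the batch.
        actions = [my_rule(obs) for obs in obs_batch]  # placeholder rule
        # Return actions, RNN state outs (none), and extra action info (none).
        return actions, [], {}

    def learn_on_batch(self, samples):
        # A rule-based expert does not learn anything.
        return {}

    def get_weights(self):
        return {}

    def set_weights(self, weights):
        pass
```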
Then, to produce an offline dataset for training, you need to use the offline dataset API. Basically, you create a Trainer from your custom policy:
```python
from ray.rllib.agents.trainer_template import build_trainer

MyTrainer = build_trainer(
    name="MyExpertPolicy",
    default_policy=MyExpertPolicy,
)
```
With this Trainer you call train() on your policy (of course it does not actually learn anything, since the policy is deterministic) and store the data under path/to/folder/where/I/want/my/data:
```python
import ray

config = {
    "env": MyEnv,  # class of your gym environment
    "output": "path/to/folder/where/I/want/my/data",  # where the sampled experiences get written
    "framework": None,
}

ray.init(ignore_reinit_error=True)
mytrainer = MyTrainer(config=config)
results = mytrainer.train()
```
This data (it’s in JSON format) can then be used for training another policy. Hope this helps.
Simon
Also, maybe this thread helps you.
Thanks @Lars_Simon_Zehnder for your quick response.
Yesterday I figured out a way to do it by defining a custom trainable and policy, then manually rolling out the workers and running them with tune.run(). But the trainer template looks much cleaner to write.
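For anyone reading later, a rough sketch (reusing the MyTrainer/MyEnv placeholders from above, Ray 1.x API) of how the trainer template can also be driven through tune.run() instead of calling train() by hand:

```python
import ray
from ray import tune

ray.init(ignore_reinit_error=True)

# Run the expert trainer for a few iterations purely to collect samples.
tune.run(
    MyTrainer,  # the trainer built from the expert policy above
    config={
        "env": MyEnv,
        "output": "path/to/folder/where/I/want/my/data",
    },
    stop={"training_iteration": 10},  # collect 10 iterations' worth of data
)
```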
Hey @mickelliu, you can then also take a look at the BC/MARWIL test cases here:
ray.rllib.agents.marwil.tests.test_[bc|marwil].py
or at this example script:
ray.rllib.examples.offline_rl.py
which show how you hook up your dataset and use it to do offline training.
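For instance, a rough sketch (Ray 1.x-style config, reusing the MyEnv and output-path placeholders from above; see the referenced test cases and example script for the authoritative versions) of pointing BC at the recorded JSON data:

```python
import ray
from ray.rllib.agents.marwil import BCTrainer  # use MARWILTrainer for MARWIL

config = {
    "env": MyEnv,  # still needed for the observation/action spaces
    "input": "path/to/folder/where/I/want/my/data",  # the offline JSON dataset
    # Evaluate the cloned policy in the live env, not on the offline data.
    "evaluation_interval": 1,
    "evaluation_num_workers": 1,
    "evaluation_config": {"input": "sampler"},
}

ray.init(ignore_reinit_error=True)
trainer = BCTrainer(config=config)
for _ in range(10):
    print(trainer.train())
```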
Hello @Lars_Simon_Zehnder!
I am also trying to write a rule-based policy for data collection purposes, but from ray.rllib.agents.trainer_template import build_trainer seems to be deprecated in Ray 2.0.0. Do you happen to know what has replaced it?
I am trying to register it as an algorithm, but there seem to be factors missing in the custom policy when I do that.
Hi @fksvensson, I had to search a little and roll back to Ray 1.6.0 to find trainer_template.py. In Ray 2.0.0 this has been replaced by class inheritance: you can (and should) directly inherit from the Algorithm class, which is a Tune Trainable, and define your own training steps in there. If you take a look at the RandomAgent example, you can see how this is done for a rule-based agent that is not trained. Notice that it overrides the step() function and therein defines the actions for the agent in the environment provided via the config you define when setting up an RLlib algorithm.
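Roughly, such a rule-based Algorithm could look like the sketch below. This is modeled on the RandomAgent example rather than copied from it; `my_expert_rule` is a placeholder for your own logic, and the exact hooks (e.g. `_init`) can differ slightly across Ray 2.x versions, so check the RandomAgent source for your installed version.

```python
from ray.rllib.algorithms.algorithm import Algorithm


class RuleBasedExpert(Algorithm):
    """Rule-based 'expert' that only rolls out episodes and never learns."""

    def _init(self, config, env_creator):
        # Build the environment that RLlib created from config["env"].
        self.env = env_creator(config["env_config"])

    def step(self):
        # One "training" iteration = one rollout driven by the expert rules.
        obs = self.env.reset()
        done = False
        episode_reward, steps = 0.0, 0
        while not done:
            action = my_expert_rule(obs)  # placeholder: your hand-written rules
            # Old Gym API: step() returns a 4-tuple.
            obs, reward, done, _ = self.env.step(action)
            episode_reward += reward
            steps += 1
        return {
            "episode_reward_mean": episode_reward,
            "timesteps_this_iter": steps,
        }
```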
Let us know if this worked out!