Sample Rule-Based Expert Demonstrations in Rllib

mickelliu · July 29, 2021, 5:52am

Hello there,

I made a custom environment (with the Gym API) and was able to use RLlib to train agents in it. I wrote a rule-based "expert" that does not use a neural network to sample its actions. I want to sample trajectories from this "expert" and then warm-start my RL agents with imitation learning on those trajectories.

To this end, I probably need to build an offline dataset for imitation learning (specifically with the BC and MARWIL algorithms available in RLlib). How can I sample data with RLlib without training any agent? Most of the example scripts I saw use some sort of trainer. Do I need to instantiate a TFPolicy or TorchPolicy and fit my rule-based module into the policy (e.g. put the action-sampling rules in the "forward" function rather than using a NN)?

Thanks in advance.

Lars_Simon_Zehnder · July 29, 2021, 3:00pm

Hi @mickelliu and welcome,

To sample data from the expert's interactions with your environment, I would write a custom policy as shown in the documentation.

Then, to produce an offline dataset for training, you need to use the offline dataset API. Basically, you create a Trainer from your custom policy:

from ray.rllib.agents.trainer_template import build_trainer

MyTrainer = build_trainer(
    name="MyExpertPolicy",
    default_policy=MyExpertPolicy)

and with this Trainer you train() your policy (of course it does not train as it is deterministic) and store the data to path/to/folder/where/I/want/my/data:

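import ray

config = {
    "env": MyEnv,  # class of your gym environment
    "output": "path/to/folder/where/I/want/my/data",  # sampled experiences are written here
    "framework": None,
}
ray.init(ignore_reinit_error=True)
mytrainer = MyTrainer(config=config)
# One train() call samples from the environment and writes the data to "output".
results = mytrainer.train()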

This data (it’s in JSON format) can then be used for training another policy. Hope this helps.
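
To make the idea concrete without depending on any particular RLlib version, here is a minimal, library-free sketch of what sampling from a rule-based expert boils down to: roll the expert out in the environment and append one JSON record per episode. Everything in it (ToyCorridorEnv, expert_action, collect_episodes, the record keys) is hypothetical and for illustration only. It does not reproduce RLlib's own output format, which the "output" config option above writes for you.

```python
import json
import os
import tempfile


class ToyCorridorEnv:
    """Hypothetical stand-in for a custom env: walk from cell 0 to cell `length`."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # Action 1 moves right, any other action moves left (clamped at 0).
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        done = self.pos >= self.length
        reward = 1.0 if done else -0.1
        return self.pos, reward, done, {}


def expert_action(obs):
    """Rule-based expert: no neural network, always head for the goal."""
    return 1


def collect_episodes(env, num_episodes, out_path, max_steps=100):
    """Roll out the expert and write one JSON line per episode."""
    with open(out_path, "w") as f:
        for _ in range(num_episodes):
            obs, done, steps = env.reset(), False, 0
            episode = {"obs": [], "actions": [], "rewards": []}
            while not done and steps < max_steps:
                action = expert_action(obs)
                next_obs, reward, done, _ = env.step(action)
                episode["obs"].append(obs)
                episode["actions"].append(action)
                episode["rewards"].append(reward)
                obs = next_obs
                steps += 1
            f.write(json.dumps(episode) + "\n")


out_path = os.path.join(tempfile.mkdtemp(), "expert.json")
collect_episodes(ToyCorridorEnv(length=5), num_episodes=3, out_path=out_path)

# Read the demonstrations back, one episode per line.
with open(out_path) as f:
    episodes = [json.loads(line) for line in f]
```

In practice you would swap in your own environment and rules; the point is only that no trainer or network is needed to generate the demonstrations themselves.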

Simon

Lars_Simon_Zehnder · July 29, 2021, 6:03pm

Also, maybe this thread helps you.

mickelliu · July 30, 2021, 5:44am

Thanks, @Lars_Simon_Zehnder for your quick response.

Yesterday I figured out a way to do it by defining a custom trainable and policy, then manually rolling out the workers and running them with tune.run(). But it looks like the trainer template is much cleaner to write.

sven1977 · August 3, 2021, 6:30pm

Hey @mickelliu, you can then also take a look at the BC/MARWIL test cases here:
ray.rllib.agents.marwil.tests.test_[bc|marwil].py

or at this example script:
ray.rllib.examples.offline_rl.py

that show how you hook up your dataset and use it to do offline training.

fksvensson · January 20, 2023, 2:17pm

hello @Lars_Simon_Zehnder!

I am also trying to write a rule-based policy for data collection purposes, but from ray.rllib.agents.trainer_template import build_trainer seems to be deprecated in Ray 2.0.0. Do you happen to know what has replaced it?

I am trying to register it as an algorithm, but there seem to be pieces missing in the custom policy when I do that.

Lars_Simon_Zehnder · January 24, 2023, 7:11pm

Hi @fksvensson, I had to search a little and roll back to Ray 1.6.0 to find trainer_template.py. In Ray 2.0.0 this has been changed to class inheritance. You can (and should) inherit directly from the Algorithm class, which is a Tune Trainable. In there you can define your own training steps. If you take a look at the RandomAgent, you can see how this is done for a rule-based agent that is not trained. Note that it overrides the step() function and, inside it, chooses the agent's actions in the environment provided by the config you define when setting up an RLlib algorithm.
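
For readers who have not opened that file, the overall shape of such a step() method is roughly the following. This is a minimal, library-free sketch under stated assumptions: RuleBasedRunner, CountdownEnv, and the parity rule are hypothetical, the class deliberately does not inherit from RLlib's Algorithm, and the result keys are only meant to resemble typical RLlib metrics.

```python
class CountdownEnv:
    """Hypothetical toy env: an episode ends after `horizon` steps."""

    def __init__(self, horizon=4):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        # Reward 1.0 when the action matches the parity of the new time step.
        reward = 1.0 if action == self.t % 2 else 0.0
        return self.t, reward, self.t >= self.horizon, {}


class RuleBasedRunner:
    """Hypothetical stand-in for an Algorithm subclass that never trains."""

    def __init__(self, env, rollouts_per_iteration=2):
        self.env = env
        self.rollouts_per_iteration = rollouts_per_iteration

    def compute_action(self, obs):
        # The "expert" rule: predict the parity of the next time step.
        return (obs + 1) % 2

    def step(self):
        """One iteration: run a few episodes with the rule, report metrics."""
        episode_rewards, timesteps = [], 0
        for _ in range(self.rollouts_per_iteration):
            obs, done, total = self.env.reset(), False, 0.0
            while not done:
                obs, reward, done, _ = self.env.step(self.compute_action(obs))
                total += reward
                timesteps += 1
            episode_rewards.append(total)
        return {
            "episode_reward_mean": sum(episode_rewards) / len(episode_rewards),
            "timesteps_this_iter": timesteps,
        }


result = RuleBasedRunner(CountdownEnv(horizon=4)).step()
```

In a real Algorithm subclass, the environment would come from the config rather than the constructor, and the base class would handle the Tune integration.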

Let us know if this worked out.
