Implementing Jump-Start Reinforcement Learning (JSRL) in RLlib

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Greetings,

I am applying the JSRL paper to a visual navigation (PointNav) task using RLlib. I recently came across RLlib and it seems like a great tool, but I am new to it, so could someone briefly explain how I would go about implementing this paper?

The paper explores fast learning using a prior policy in relatively hard-exploration / sparse-reward environments. I will describe the paper here in brief:

  1. The paper uses two policies: (i) a previously learned guide policy (a sub-optimal policy that knows what good states are), often learned from small amounts of (offline) data, and (ii) an exploration policy that learns via RL.
  2. The goal is to enable fast learning of the exploration policy given the guide policy. Because the underlying algorithms (e.g., PPO, A2C) rely on learned value estimates, naively initializing the exploration policy from the guide might not work (the paper shows experimental evidence for this).
  3. Training goes as follows: you first roll out the guide policy, and then, within the same episode, roll out the exploration policy for the remaining steps. Initially most of each episode is handled by the guide policy (e.g., the exploration policy takes over only after 90% of the timesteps are complete), and this fraction gradually decreases as the exploration policy improves over the course of training (a rough sketch of this rollout scheme is shown after this list).
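To make the scheme concrete, here is a minimal sketch of such a rollout, assuming a gymnasium env; CartPole and the two lambda policies below are only stand-ins for a PointNav env, a pretrained guide, and the RL learner:

```python
# Minimal sketch of a JSRL-style rollout. The env and both policies are
# placeholders; in the real setup the guide is a pretrained (sub-optimal)
# policy and the exploration policy is the one being trained with RL.
import gymnasium as gym


def jsrl_rollout(env, guide_policy, explore_policy, switch_step, max_steps=500):
    """Let the guide act for the first `switch_step` steps of the episode,
    then hand control to the exploration policy."""
    obs, _ = env.reset()
    episode_return = 0.0
    for t in range(max_steps):
        policy = guide_policy if t < switch_step else explore_policy
        action = policy(obs)
        obs, reward, terminated, truncated, _ = env.step(action)
        episode_return += reward
        if terminated or truncated:
            break
    return episode_return


if __name__ == "__main__":
    env = gym.make("CartPole-v1")  # stand-in for a PointNav env
    guide_policy = lambda obs: 0                             # placeholder guide
    explore_policy = lambda obs: env.action_space.sample()   # placeholder learner

    # Curriculum: the guide's share of each episode shrinks over training.
    for switch_step in (50, 25, 10, 0):
        ret = jsrl_rollout(env, guide_policy, explore_policy, switch_step)
        print(f"switch at step {switch_step}: return {ret}")
```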

Another question related to my implementation: is there a utility in RLlib or Ray to collect and save data from an env for offline use, for the PointNav task? PointNav observations consist of the images the agent sees (possibly RGB + depth), plus GPS and compass readings.

Thank you!


I’ve been thinking about implementing this in RLlib as well, and with the way RLlib currently works it would be somewhat difficult to do.

The issue is that we don’t have a direct interface through which you can specify a guide policy or guide data.
Are you planning on using a guide policy or guide data? In either case you would also have to define multiple samplers, where the sampler mixes data from the guide policy with data from the exploration policy. You could also use mix-in replay to achieve this.
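Roughly what I mean by mixing, using dummy batches (real code would draw these from replay buffers / samplers):

```python
# Rough sketch of the "mix in guide data" idea with dummy batches: build each
# train batch partly from stored guide-policy data and partly from fresh
# exploration-policy rollouts.
from ray.rllib.policy.sample_batch import SampleBatch

guide_data = SampleBatch({"obs": [[0.0], [0.1]], "actions": [0, 0], "rewards": [1.0, 1.0]})
online_data = SampleBatch({"obs": [[1.0], [1.1]], "actions": [1, 0], "rewards": [0.0, 0.5]})

# e.g. a 50/50 mixture for one train batch
train_batch = guide_data.concat(online_data)
print(train_batch.count)  # 4 timesteps in the mixed batch
```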

Yes. I went through the tutorials over the last couple of days, and it does seem somewhat hard to implement in RLlib. Could I formulate this as a multi-agent problem, thereby allowing multiple policies (one guide policy and one exploration policy)?
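Roughly what I have in mind (a hedged sketch: the env and the policy mapping function are placeholders, and the per-episode switch from guide to exploration would still need custom logic, e.g. a callback or an env wrapper exposing the task as a MultiAgentEnv):

```python
# Hedged sketch: register two policies but only train "explore". The mapping
# function below always routes to "explore"; real JSRL would route to "guide"
# early in the episode and to "explore" for the remaining steps.
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.policy.policy import PolicySpec

config = (
    PPOConfig()
    .environment("CartPole-v1")  # placeholder env
    .multi_agent(
        policies={
            "guide": PolicySpec(),    # would be restored from pretrained weights
            "explore": PolicySpec(),  # trained from scratch
        },
        policy_mapping_fn=lambda agent_id, episode, worker, **kwargs: "explore",
        policies_to_train=["explore"],
    )
)
```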

About mixing replay buffers: I’ll need a way to properly sample the data for training in that case, for example by sampling batches filtered by the policy that generated them.
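For example, if the sampled data comes back as MultiAgentBatch objects keyed by policy, the filtering itself would just be a dictionary lookup (sketch with dummy data):

```python
# Illustrative sketch with dummy data: a MultiAgentBatch keeps per-policy
# SampleBatches, so "filter by generating policy" is a dict lookup.
from ray.rllib.policy.sample_batch import MultiAgentBatch, SampleBatch

guide_batch = SampleBatch({"obs": [[0.0]], "actions": [0], "rewards": [1.0]})
explore_batch = SampleBatch({"obs": [[1.0]], "actions": [1], "rewards": [0.5]})

ma_batch = MultiAgentBatch(
    {"guide": guide_batch, "explore": explore_batch}, env_steps=2
)

# Keep only the data produced by the exploration policy for training.
explore_only = ma_batch.policy_batches["explore"]
print(explore_only["rewards"])
```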

Hi @manjrekarom,

Not sure if this will help, but maybe you can combine these concepts to achieve Jump-Start RL:

https://docs.ray.io/en/latest/rllib/rllib-concepts.html#how-to-customize-policies

https://docs.ray.io/en/latest/rllib/rllib-offline.html
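For instance, the offline docs describe an output setting that writes sampled experiences to disk. A rough sketch (CartPole stands in for your PointNav env, the output path is a placeholder, and exact builder method names can differ between Ray versions):

```python
# Hedged sketch: write sampled experiences to disk via RLlib's offline API.
# "CartPole-v1" stands in for a PointNav env (which would need to be
# registered with tune.register_env first); the output path is a placeholder.
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    .offline_data(output="/tmp/pointnav-out")  # batches get written as JSON files
)

algo = config.build()
for _ in range(2):
    algo.train()  # experiences sampled during training are also written to disk
algo.stop()
```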

I am also hoping to start working on implementing JSRL in July 2022. It would be great to hear about your experiences and progress on this implementation in the meantime.

@sven1977 Is the implementation of JSRL in the pipeline for any upcoming release?

Hi @vishalrangras !

Thanks for your reply. I’ll take a look at it.

This seems similar to the DQfD/POfD algorithms. I wonder whether RLlib will provide these algorithms.

For anyone visiting this thread:
There is an open feature request for this on our GitHub, thanks to @mahuangxu.
Feel free to +1 it or share your view of why it is important, to express your need.
This way we can better track it and assess its priority.

There is no out-of-the-box, super-easy way to do this, especially in a cluster environment.
If you work locally, you can modify our training iteration functions to save batches however you like.
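For example (just a sketch, not an official API for this): a callback that dumps every sampled batch to disk via on_sample_end. The output directory is a placeholder, and on a cluster each rollout worker would write to its own local filesystem.

```python
# Hedged sketch: dump every sampled batch to disk with a callback. The output
# directory is a placeholder; on a cluster each rollout worker writes locally.
import os
import pickle

from ray.rllib.algorithms.callbacks import DefaultCallbacks


class SaveBatchesCallback(DefaultCallbacks):
    def on_sample_end(self, *, worker, samples, **kwargs):
        out_dir = "/tmp/saved_batches"
        os.makedirs(out_dir, exist_ok=True)
        path = os.path.join(
            out_dir, f"worker{worker.worker_index}_{id(samples)}.pkl"
        )
        with open(path, "wb") as f:
            pickle.dump(samples, f)


# Usage (assumed): pass the class via the algorithm config, e.g.
# config.callbacks(SaveBatchesCallback)
```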