How severely does this issue affect your experience of using Ray?
- None: Just asking a question out of curiosity
Hi, the problem I’m trying to solve with RLlib is a POMDP, and to succeed I need good generalization. I’ve been reading various articles and papers on best practices here. I’m quite new to RL.
I found a few interesting articles / papers:
This one proposes the following approach, which seems to outperform baselines on several problems:
- Separating the RNNs in actor and critic networks. Un-sharing the weights can prevent gradient explosion, and can be the difference between the algorithm learning nothing and solving the task almost perfectly.
- Using an off-policy RL algorithm to improve sample efficiency. Using, say, TD3 instead of PPO greatly improves sample efficiency.
- Tuning the RNN context length. We found that the RNN architectures (LSTM and GRU) do not matter much, but the RNN context length (the length of the sequence fed into the RL algorithm), is crucial and depends on the task. We suggest choosing a medium length as a start.
Points 2 and 3 are easy to do in Ray. UPDATE: not so easy after all; I see TD3 and SAC don’t support the LSTM auto-wrap. Would this be trivial to implement myself, or is there a good reason it’s currently missing from Ray (i.e. it’s hard to do)?
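For point 3, this is roughly what I have in mind with PPO (just a sketch; the key names are RLlib’s standard model config options, the values are placeholders I’d tune). As far as I understand, `vf_share_layers=False` only un-shares the feedforward trunk, while the auto-wrapped LSTM itself is still shared between actor and critic, which is exactly why I’m asking question 1 below:

```python
# Sketch of an RLlib model config for PPO with the LSTM auto-wrapper.
# Key names are RLlib model config options; the values are illustrative only.
model_config = {
    "use_lstm": True,       # wrap the default FC net with an LSTM
    "max_seq_len": 20,      # RNN context length fed into the loss -- tune per task
    "lstm_cell_size": 256,  # LSTM hidden size
    # Un-shares the feedforward layers between policy and value branches,
    # but (as I understand it) NOT the auto-wrapped LSTM itself.
    "vf_share_layers": False,
}
```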
But how would one go about point 1, separating the RNNs in the actor and critic networks? Is this something you can do in RLlib?
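To make concrete what I mean by “separate RNNs”: something like the standalone PyTorch sketch below, where the actor and critic each own an independent LSTM with no shared weights. (This is a hypothetical illustration, not RLlib code; in RLlib I assume this logic would have to live inside a custom recurrent model class rather than a bare `nn.Module`.)

```python
import torch
import torch.nn as nn

class SeparateRNNActorCritic(nn.Module):
    """Actor and critic each own an independent LSTM (no weight sharing).

    Hypothetical sketch of the architecture from point 1; not an RLlib model.
    """

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        # Actor branch: its own recurrent core plus a policy head.
        self.actor_rnn = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.actor_head = nn.Linear(hidden, act_dim)
        # Critic branch: a *separate* recurrent core plus a value head.
        self.critic_rnn = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.critic_head = nn.Linear(hidden, 1)

    def forward(self, obs_seq: torch.Tensor):
        # obs_seq: [batch, time, obs_dim]
        a_out, _ = self.actor_rnn(obs_seq)
        c_out, _ = self.critic_rnn(obs_seq)
        logits = self.actor_head(a_out)               # [batch, time, act_dim]
        values = self.critic_head(c_out).squeeze(-1)  # [batch, time]
        return logits, values

model = SeparateRNNActorCritic(obs_dim=8, act_dim=4)
logits, values = model(torch.randn(2, 10, 8))
```

Gradients from the value loss then never touch the actor’s LSTM, which is the “un-sharing” the paper credits with preventing gradient explosion.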
There is also this article, which introduces an algorithm called ‘LEEP’ for this use case. It seems interesting, but unfortunately it’s not available in RLlib.