LSTM and Attention on Stateless CartPole

I have been experimenting with the partially observable StatelessCartPole environment and different options for dealing with the missing state information, very similar to what @sven1977 did in this Anyscale blog post.

Currently, I am struggling to reproduce similar results; PPO does not seem to learn on StatelessCartPole - neither with LSTM nor with attention: Dealing with Partial Observability In Reinforcement Learning | Stefan’s Blog

Instead, simply stacking the last 4 observations (without LSTM and attention) leads to good results.
Is it correct that enabling "use_lstm": True or "use_attention": True does not automatically enable frame stacking, i.e., that the agent still only sees a single partial observation rather than a sequence of observations?
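
For reference, this is roughly what I am running; a minimal sketch against the Ray 1.8 API with a simplified training loop, everything else at its defaults:

```python
import ray
from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.examples.env.stateless_cartpole import StatelessCartPole

config = {
    "env": StatelessCartPole,
    "framework": "torch",
    "model": {
        # Wrap the default fully connected net with an LSTM
        # (or set "use_attention": True for the GTrXL attention wrapper).
        # All other model settings are left at their defaults.
        "use_lstm": True,
    },
}

ray.init()
trainer = PPOTrainer(config=config)
for _ in range(5):
    result = trainer.train()
    print(result["episode_reward_mean"])
```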

Strangely, passing the environment with stacked observations (using the FrameStack wrapper) to an agent with LSTM or attention enabled leads to much worse results than with LSTM and attention disabled.
I only set "use_lstm": True or "use_attention": True and otherwise kept the model defaults; is there something else I need to configure for LSTM or attention to work?
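
For completeness, this is roughly how I combine the stacked env with those wrappers; a sketch assuming gym's FrameStack wrapper, and the env name "stacked_stateless_cartpole" is just my own label:

```python
from gym.wrappers import FrameStack
from ray import tune
from ray.rllib.examples.env.stateless_cartpole import StatelessCartPole

# Register an env creator that stacks the last 4 partial observations.
tune.register_env(
    "stacked_stateless_cartpole",
    lambda env_config: FrameStack(StatelessCartPole(), num_stack=4))

config = {
    "env": "stacked_stateless_cartpole",
    "framework": "torch",
    # Stacked observations + LSTM/attention is the combination that performs
    # much worse for me than the plain default model on the same stacked env.
    "model": {"use_lstm": True},  # or {"use_attention": True}, or {} for the default
}
```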

Also, frame stacking in the environment (with the FrameStack wrapper) works really well, much better (roughly 2x higher reward!) than stacking the observations inside the model via the trajectory view API.
I expected both to lead to roughly the same results. Am I missing something? Even though I'm just running a single experiment here, the behavior does seem reproducible.
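
By "stacking inside the model" I mean something along the lines of RLlib's trajectory view API frame-stacking example. A condensed sketch of what I have in mind (the StackingModel class and its name are my own illustration, not the exact code from the notebook):

```python
import torch
import torch.nn as nn
from ray.rllib.models import ModelCatalog
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2
from ray.rllib.policy.view_requirement import ViewRequirement


class StackingModel(TorchModelV2, nn.Module):
    """Stacks the last 4 observations via the trajectory view API."""

    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs,
                              model_config, name)
        nn.Module.__init__(self)
        self.num_frames = 4
        in_size = self.num_frames * obs_space.shape[0]
        self.policy_net = nn.Sequential(
            nn.Linear(in_size, 64), nn.ReLU(), nn.Linear(64, num_outputs))
        self.value_net = nn.Sequential(
            nn.Linear(in_size, 64), nn.ReLU(), nn.Linear(64, 1))
        # Ask the sampler for the last 4 observations instead of only the latest.
        self.view_requirements["prev_n_obs"] = ViewRequirement(
            data_col="obs",
            shift="-{}:0".format(self.num_frames - 1),
            space=obs_space)

    def forward(self, input_dict, state, seq_lens):
        # [batch, num_frames, obs_dim] -> [batch, num_frames * obs_dim]
        stacked = input_dict["prev_n_obs"].float()
        flat = stacked.reshape(stacked.shape[0], -1)
        self._value = self.value_net(flat).squeeze(1)
        return self.policy_net(flat), state

    def value_function(self):
        return self._value


ModelCatalog.register_custom_model("stacking_model", StackingModel)
```

The model is then selected with "model": {"custom_model": "stacking_model"} in the PPO config, while the env itself stays the plain StatelessCartPole.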

All my attempts are shown here (it’s just a Jupyter notebook, so it should be reproducible): Dealing with Partial Observability In Reinforcement Learning | Stefan’s Blog

Hi @stefanbschneider,

What version of ray are you using?

My guess is that you are seeing the effects of these bugs causing training issues:

There might be others we have not found yet. =*(


@mannyv Thanks, I’ll keep following these issues and look out for other related bugs. I’m using Ray 1.8.0.

Since both issues are closed and fixed now, I tested again and looked into this further.
Unfortunately, attention still does not seem to work well for me; the same goes for frame stacking inside the model.

I opened an issue with reproduction script here: [Bug] [rllib] Attention and FrameStackingModel work poorly · Issue #20827 · ray-project/ray · GitHub

It’s also very much possible that I’m overlooking something.