LSTM and Attention on Stateless CartPole

I have been experimenting with the partially observable StatelessCartPole environment and with different options for dealing with the partial observability, very similar to what @sven1977 did in this Anyscale blog post.

Currently, I am struggling to reproduce similar results; PPO does not seem to learn on StatelessCartPole - neither with LSTM nor with attention: Dealing with Partial Observability In Reinforcement Learning | Stefan’s Blog

Instead, simply stacking the last 4 observations (without LSTM or attention) leads to good results.
Is it correct that enabling "use_lstm": True or "use_attention": True does not automatically enable frame stacking, i.e., the agent still only sees a single partial observation rather than a sequence of observations?
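
For context, here is a minimal sketch of the kind of config I mean (assuming RLlib’s StatelessCartPole example env; everything besides the model flag is left at the PPO defaults):

```python
from ray.rllib.agents import ppo
from ray.rllib.examples.env.stateless_cartpole import StatelessCartPole

config = ppo.DEFAULT_CONFIG.copy()
# Only the model flag changes between runs; everything else stays at the defaults.
config["model"]["use_lstm"] = True
# ...or alternatively:
# config["model"]["use_attention"] = True

trainer = ppo.PPOTrainer(env=StatelessCartPole, config=config)
for _ in range(10):
    result = trainer.train()
    print(result["episode_reward_mean"])
```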

Strangely, passing the environment with stacked observations (using the FrameStack wrapper) to an agent with LSTM or attention enabled leads to much worse results than with LSTM and attention disabled.
I only set "use_lstm": True or "use_attention": True and otherwise kept the model defaults; is there anything else I must configure for LSTMs or attention to work?
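
For the frame-stacked runs, I wrap the environment roughly like this (a sketch; the registered name "stacked_stateless_cartpole" and the stack size of 4 are just my choices):

```python
import gym
from ray import tune
from ray.rllib.agents import ppo
from ray.rllib.examples.env.stateless_cartpole import StatelessCartPole

# Register a stateless CartPole whose observations are the last 4 frames.
tune.register_env(
    "stacked_stateless_cartpole",
    lambda env_config: gym.wrappers.FrameStack(StatelessCartPole(), num_stack=4),
)

config = ppo.DEFAULT_CONFIG.copy()
# Combining stacked observations with an LSTM is the case that performs worse for me.
config["model"]["use_lstm"] = True
trainer = ppo.PPOTrainer(env="stacked_stateless_cartpole", config=config)
```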

Also, frame stacking in the environment (with the FrameStack wrapper) works really well and much better (roughly 2x higher reward!) than stacking the observations inside the model using the trajectory view API.
I expected both approaches to lead to roughly the same results. Am I missing something? Even though I’m only running a single experiment here, this behavior does seem to be reproducible.
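
For the in-model stacking, the idea follows RLlib’s frame-stacking example model: a custom model requests the last few observations via the trajectory view API and feeds the flattened stack into a small FC net. A rough Torch sketch (the class name, the "prev_n_obs" key, and num_frames=4 are my choices here, not something RLlib prescribes):

```python
import numpy as np
import torch.nn as nn
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2
from ray.rllib.policy.view_requirement import ViewRequirement


class FrameStackTorchModel(TorchModelV2, nn.Module):
    """Stacks the last `num_frames` observations via the trajectory view API."""

    def __init__(self, obs_space, action_space, num_outputs, model_config,
                 name, num_frames=4):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs,
                              model_config, name)
        nn.Module.__init__(self)
        self.num_frames = num_frames
        obs_dim = int(np.prod(obs_space.shape))
        # Ask the sampler for the last `num_frames` observations at every step
        # (zero-padded at the beginning of an episode).
        self.view_requirements["prev_n_obs"] = ViewRequirement(
            data_col="obs",
            shift="-{}:0".format(num_frames - 1),
            space=obs_space,
        )
        self.hidden = nn.Sequential(
            nn.Linear(num_frames * obs_dim, 256), nn.ReLU())
        self.logits = nn.Linear(256, num_outputs)
        self.value_head = nn.Linear(256, 1)
        self._features = None

    def forward(self, input_dict, state, seq_lens):
        # prev_n_obs has shape (batch, num_frames, obs_dim); flatten it.
        stacked_obs = input_dict["prev_n_obs"].float()
        stacked_obs = stacked_obs.reshape(stacked_obs.shape[0], -1)
        self._features = self.hidden(stacked_obs)
        return self.logits(self._features), state

    def value_function(self):
        return self.value_head(self._features).squeeze(1)


# Usage (sketch):
# from ray.rllib.models import ModelCatalog
# ModelCatalog.register_custom_model("frame_stack_model", FrameStackTorchModel)
# config["framework"] = "torch"
# config["model"] = {"custom_model": "frame_stack_model"}
```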

All my attempts are shown here (it’s just a Jupyter notebook, so should be reproducible): Dealing with Partial Observability In Reinforcement Learning | Stefan’s Blog

Hi @stefanbschneider,

What version of ray are you using?

My guess is that you are seeing the effects of these bugs causing training issues:

There might be others we have not found yet. =*(

@mannyv Thanks, I’ll keep following these issues and look for other related bugs. I’m using Ray 1.8.0.

Since both issues are closed and fixed now, I tested again and looked into this further.
Unfortunately, attention still does not seem to work well for me; same for frame stacking inside the model.

I opened an issue with reproduction script here: [Bug] [rllib] Attention and FrameStackingModel work poorly · Issue #20827 · ray-project/ray · GitHub

It’s also very much possible that I’m overlooking something.

@stefanbschneider
Hi, I am reproducing the program on partial observability from your blog. I would like to change the discrete action in the trajectory_view example into a continuous action, but after modifying the model, I get an error. I have provided a reproducible code example. If you have time, please help me check where I made a mistake, thank you.
trajectory_view with continuous action

@robot-xyh Unfortunately, I haven’t had time to look into this again and, just looking at your code/comments, I also don’t know what causes your error.

Just so you know, Sven commented on and resolved my issues described here: [Bug] [rllib] Attention and FrameStackingModel work poorly · Issue #20827 · ray-project/ray · GitHub
Maybe that’s useful for you too.