Normalizing observations in PPO+LSTM

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I am using PPO with the built-in auto-LSTM wrapper. My custom environment has a continuous action in the range [-5, 5] and 3 observations in approximately the ranges [0, 212], [0, 57000], and [-5000, 5000]. I wanted to try normalizing them by setting "observation_filter": "MeanStdFilter" in the config.
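For reference, this is roughly how I am building the algorithm (MyCustomEnv is a placeholder for my actual environment class, using the PPOConfig builder on the old API stack):

from ray.rllib.algorithms.ppo import PPOConfig
from my_envs import MyCustomEnv  # placeholder for the actual custom env class

config = (
    PPOConfig()
    .environment(env=MyCustomEnv)
    # Wrap the default model in RLlib's auto-LSTM wrapper.
    .training(model={"use_lstm": True})
    # Running mean/std normalization of observations.
    .rollouts(observation_filter="MeanStdFilter")
)
algo = config.build()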

  • What will be the new range of the observations, and will I have to change the range in the spaces.Box of the observations (and action) to the same range?
  • Will the action also get normalized? If so, will I have to change anything before calling compute_single_action to test my agent?

No, you shouldn't have to modify your spaces; RLlib handles the filtering internally, and the observation filter only normalizes observations, so your action is not affected. For compute_single_action, make sure you use algo.compute_single_action and not the Policy's method, since the former automatically applies the filters. See: How to correctly apply observation normalization?
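A minimal evaluation-loop sketch along these lines keeps the filter handling inside RLlib (here, algo is the trained PPO Algorithm, env is an instance of your custom environment, and a Gymnasium-style reset/step API is assumed):

# Sketch of an evaluation loop; `algo` and `env` are assumed to exist.
obs, _ = env.reset()
# Initial LSTM state for the default policy (needed with use_lstm=True).
state = algo.get_policy().get_initial_state()
done = False

while not done:
    # algo.compute_single_action runs the raw observation through the
    # MeanStdFilter itself, so do not normalize obs manually here.
    action, state, _ = algo.compute_single_action(obs, state=state, explore=False)
    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated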