How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I am using PPO with the built-in auto-LSTM wrapper. My custom environment has a continuous action in the range [-5, 5] and 3 observations with approximate ranges of [0, 212], [0, 57000], and [-5000, 5000]. I wanted to try normalizing the observations by setting "observation_filter": "MeanStdFilter" in the config.
- What will be the new range of the observations, and will I have to change the ranges in the spaces.Box of the observations (and the action) to match?
- Will the action also get normalized? If so, will I have to change something before calling compute_single_action to test my agent?
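
For reference, the general idea behind a running mean/std filter can be sketched as below. This is a minimal NumPy illustration of the technique, not RLlib's actual MeanStdFilter code. The class name `RunningMeanStd` is hypothetical, and sampling the observations uniformly from the ranges above is an assumption made only for the example.

```python
import numpy as np


class RunningMeanStd:
    """Hypothetical helper: per-dimension running mean/variance (Welford's algorithm)."""

    def __init__(self, shape):
        self.n = 0
        self.mean = np.zeros(shape, dtype=np.float64)
        self.m2 = np.zeros(shape, dtype=np.float64)  # sum of squared deviations

    def update(self, x):
        x = np.asarray(x, dtype=np.float64)
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Fall back to 1.0 until there are enough samples for a variance estimate.
        if self.n < 2:
            return np.ones_like(self.mean)
        return np.sqrt(self.m2 / (self.n - 1))

    def normalize(self, x, eps=1e-8):
        return (np.asarray(x, dtype=np.float64) - self.mean) / (self.std + eps)


# Assumed observation ranges, taken from the description above.
low = np.array([0.0, 0.0, -5000.0])
high = np.array([212.0, 57000.0, 5000.0])

rng = np.random.default_rng(0)
samples = rng.uniform(low, high, size=(10_000, 3))

rms = RunningMeanStd(shape=3)
for obs in samples:
    rms.update(obs)

normalized = np.array([rms.normalize(obs) for obs in samples])
print("per-dimension min:", normalized.min(axis=0))
print("per-dimension max:", normalized.max(axis=0))
```

With this kind of filter, the normalized values end up centered near 0 with roughly unit variance per dimension, rather than being confined to a fixed interval, which is why the question about the Box bounds comes up.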