How do the bounds on an observation space affect convergence?

Hey, I’m an RL beginner and have a general RL question.

Suppose I have an N-dimensional continuous observation space created using a gym.spaces.Box object. I can specify the low and high for each element. Let's say that in practice, every element of the observation will range from -5 to 5.

How does it affect convergence if I create the Box with the correct bounds, i.e. [-5, 5], compared to if I created it with a much wider range, say [-1000, 1000]? If it does matter, what is the theory/reasoning?
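For concreteness, here is roughly what I mean (shape=(4,) is just a placeholder for the N dimensions):

```python
import numpy as np
import gym  # on newer installs: import gymnasium as gym

# Bounds matching the range the values actually take in practice.
tight_space = gym.spaces.Box(low=-5.0, high=5.0, shape=(4,), dtype=np.float32)

# The same space declared with a much wider range.
wide_space = gym.spaces.Box(low=-1000.0, high=1000.0, shape=(4,), dtype=np.float32)
```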


Observations should be normalized anyway (RLlib automatically applies a mean/std filter to them by default).
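To illustrate what such a filter does, here is a minimal running mean/std sketch of my own (conceptually what RLlib's filtering does, not its actual API): after normalization, the downstream network sees roughly zero-mean, unit-variance inputs regardless of the declared bounds.

```python
import numpy as np

class RunningMeanStd:
    """Toy running mean/std observation filter using Welford's algorithm.
    Illustrative only; not RLlib's MeanStdFilter class."""

    def __init__(self, shape):
        self.count = 0
        self.mean = np.zeros(shape)
        self.m2 = np.zeros(shape)  # running sum of squared deviations

    def update(self, obs):
        self.count += 1
        delta = obs - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (obs - self.mean)

    def normalize(self, obs, eps=1e-8):
        std = np.sqrt(self.m2 / max(self.count, 1)) + eps
        return (obs - self.mean) / std

f = RunningMeanStd(shape=1)
for x in (-5.0, 0.0, 5.0):
    f.update(np.array([x]))
print(f.normalize(np.array([5.0])))  # roughly unit scale, whatever the raw range
```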

But generally speaking, the parameters and gradients of a neural network scale with its inputs and outputs. That means a larger input will simply result in correspondingly smaller weights when optimizing for the same objective. Many other factors play into this dynamic, though, for example gradient clipping or the chosen floating-point precision. You only run into these issues if you don't normalize your observations, which, given RLlib's default filtering, you don't have to worry about.
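You can see this scaling effect even in a plain least-squares fit (toy example with made-up data): rescaling the input column by some factor shrinks the learned weight by exactly that factor while the objective stays the same.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-5, 5, size=(100, 1))           # inputs in their "true" range
y = 3.0 * x[:, 0] + rng.normal(0, 0.1, 100)     # target with slope 3 plus noise

# Fit the same objective on raw inputs and on inputs blown up by 200x
# (i.e. [-5, 5] stretched to [-1000, 1000]).
w_raw, *_ = np.linalg.lstsq(x, y, rcond=None)
w_wide, *_ = np.linalg.lstsq(x * 200.0, y, rcond=None)

# The learned weight shrinks by the same factor the input grew by.
print(w_raw[0] / w_wide[0])  # ~200
```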