Hi all, I have a simple question.
I’m trying to understand how the VisionNetwork model works. It is used by default when training on Atari Gym envs such as Pong-v0. I have seen that the observations produced by this env have a 210x160x3 shape, and that they are downscaled or upscaled by ray.rllib.models.preprocessors.GenericPixelPreprocessor into a (dim, dim, 3) shape. However, when inspecting the base Keras model that gets created, I see that its input layer has an (84, 84, 4) shape.

So what does this “extra” fourth channel mean? And how are the layer sizes defined? I saw that in visionnet.py the input layer (“observations”) is created from the obs_space shape, but I’m not able to find what that refers to. I would also appreciate any reference explaining the whole model: how it actually receives the images and which outputs it produces. Thank you so much in advance!