How to use custom model with IMPALA trainer (first time using rllib)

Hi @thomasbbrunner,

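The issue is in value_function. RLlib expects it to return a 1-D tensor of shape [B] (one value per batch element), but your critic output still has a trailing singleton dimension, i.e. shape [B, 1], so you need to squeeze it out. You can fix it by following the example in ray/fcnet.py at 3f89f35e5269c8a9391fb98a535cde7ffd6bcd9d in ray-project/ray on GitHub: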

    def value_function(self):
        return tf.reshape(self.critic_out, [-1])
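
To see what the reshape does to the shapes, here is a minimal sketch. It uses NumPy as a stand-in for TensorFlow so it runs without a model, and the batch size and values in critic_out are made up for illustration; np.reshape(x, [-1]) flattens the same way tf.reshape(x, [-1]) does.

```python
import numpy as np

# Hypothetical critic output for a batch of 4 observations: shape [B, 1].
critic_out = np.array([[0.1], [0.2], [0.3], [0.4]], dtype=np.float32)

# Same operation as tf.reshape(self.critic_out, [-1]): flatten to shape [B].
values = np.reshape(critic_out, [-1])

print(critic_out.shape)  # (4, 1)
print(values.shape)      # (4,)
```

If your value head is a Dense(1) layer, its output will have the [B, 1] shape shown above, which is why the reshape is needed before returning it.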