Hi @thomasbbrunner,
The issue is in your value_function. RLlib expects it to return a 1-D tensor of shape [B] (one value estimate per sample), so you need to remove the trailing singleton dimension from the critic output. You can fix it by following the example in ray/fcnet.py (ray-project/ray, commit 3f89f35e5269c8a9391fb98a535cde7ffd6bcd9d):
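def value_function(self):
    # Assumes critic_out comes from a Dense(1) head, i.e. shape [B, 1];
    # flatten it to shape [B] as RLlib expects.
    return tf.reshape(self.critic_out, [-1])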
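To see why the trailing dimension matters, here is a minimal NumPy sketch. It is not RLlib code: `critic_out` and `value_targets` are hypothetical stand-ins for the critic output and the value targets a loss would compare it against. TensorFlow follows the same broadcasting rules, so combining a [B, 1] value output with [B]-shaped targets silently produces a [B, B] result instead of an error.

```python
import numpy as np

batch_size = 4

# Hypothetical critic output from a Dense(1) head: shape [B, 1]
critic_out = np.arange(batch_size, dtype=np.float32).reshape(batch_size, 1)

# Hypothetical value targets: shape [B]
value_targets = np.ones(batch_size, dtype=np.float32)

# Without flattening, [B] - [B, 1] broadcasts to a [B, B] matrix
bad_error = value_targets - critic_out
print(bad_error.shape)  # (4, 4)

# Flattening (the NumPy analogue of tf.reshape(x, [-1])) gives one value per sample
values = np.reshape(critic_out, [-1])
good_error = value_targets - values
print(good_error.shape)  # (4,)

# np.squeeze with axis=-1 is equivalent here and drops only the trailing axis
assert np.array_equal(values, np.squeeze(critic_out, axis=-1))
```

tf.squeeze(self.critic_out, axis=-1) would also work; reshape to [-1] is what fcnet.py uses.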