I solved this issue by passing a PPOTrainer or a DQNTrainer directly as the trainable, instead of the experiment function.
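
For anyone hitting the same problem, here is a minimal sketch of what that change can look like. It assumes a Ray 1.x install, where the trainer classes live under ray.rllib.agents; the environment name, the config values, the stop criterion, and the helper names build_config and main are illustrative assumptions, not taken from my actual setup.

```python
def build_config(env_name="CartPole-v0"):
    """Return a plain RLlib-style config dict.

    The keys and values here are illustrative assumptions.
    """
    return {
        "env": env_name,
        "num_workers": 1,
    }


def main():
    # Imports are local so build_config() can be used without Ray installed.
    import ray
    from ray import tune
    # Assumed Ray 1.x import path; for DQN use:
    #   from ray.rllib.agents.dqn import DQNTrainer
    from ray.rllib.agents.ppo import PPOTrainer

    ray.init(ignore_reinit_error=True)
    # Pass the trainer class itself as the trainable,
    # rather than a custom experiment function.
    tune.run(
        PPOTrainer,
        config=build_config(),
        stop={"training_iteration": 5},
    )
    ray.shutdown()
```

Calling main() launches the run. Swapping PPOTrainer for DQNTrainer should only need the import and the first argument of tune.run changed, although DQN may want different config values.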