Hi, I’m using a custom environment. I was training with stable-baselines3, but I’m trying to migrate to RLlib for scaling purposes. However, after many training iterations and a lot of hyperparameter tuning, my model never converged to a result nearly as good as the one I got with stable-baselines3, so I ran an episode with a trained checkpoint to see what is going on. My agent always outputs saturated actions at the bounds, like:
```
array([-1., 1., 1., 1., -1.], dtype=float32)
```
(actions are normalized between -1 and 1)
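For reference, here is roughly how I run the evaluation episode (a simplified sketch; `MyCustomEnv-v0` and the checkpoint path are placeholders for my actual env and file):

```python
import gymnasium as gym
from ray.rllib.algorithms.algorithm import Algorithm

# Restore the trained algorithm from a checkpoint (path is a placeholder).
algo = Algorithm.from_checkpoint("/path/to/my/checkpoint")

env = gym.make("MyCustomEnv-v0")  # placeholder for my registered custom env
obs, info = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    # explore=False for deterministic actions, analogous to
    # SB3's model.predict(obs, deterministic=True)
    action = algo.compute_single_action(obs, explore=False)
    print(action)  # this is where I see the saturated [-1., 1., ...] actions
    obs, reward, terminated, truncated, info = env.step(action)
```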
I don’t think the problem is a bug in my environment, because the same environment trains fine with stable-baselines3, and I don’t see an issue with my RLlib config either, so I can’t figure out why my agent is getting stuck on these extreme actions.
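For completeness, the relevant part of my config looks roughly like this (a sketch, with PPO just as an example and `MyCustomEnv-v0` as a placeholder; my exact hyperparameters differ):

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment(
        env="MyCustomEnv-v0",    # placeholder for my registered custom env
        normalize_actions=True,  # actions sampled in [-1, 1], unsquashed to the env's bounds
        clip_actions=False,
    )
    .training(gamma=0.99, lr=3e-4)  # example values, not my exact ones
)
algo = config.build()
```

(As far as I know, `normalize_actions=True` is the RLlib default for Box action spaces, so the policy’s raw outputs should already live in [-1, 1].)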