Migrating from StableBaselines3, not able to reproduce results

I’ve been trying to migrate my PPO training from Stable-Baselines3 to Ray RLlib, but I’m having trouble reproducing my earlier results. I’m training the same environment and have tried to set the same hyperparameters, but maybe I’m missing something? Has anyone else managed to reproduce results across the two libraries? Below are the params I’m setting; am I missing any other defaults that differ between SB3 and RLlib? My main symptom is that vf_loss and entropy don’t decrease.

StableBaselines3

policy_kwargs = dict(net_arch=[128, 128, 128])
n_steps = 512
learning_rate = 0.0001
batch_size = 64
gamma = 0.99
ent_coef = 0.05
clip_range = 0.2
n_envs = 4
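
For reference, this is roughly how those settings plug into SB3’s PPO constructor (the env id and total_timesteps here are just placeholders, not my actual setup):

from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# Placeholder env id -- my custom env registration isn't shown here
env = make_vec_env("MyCustomEnv-v0", n_envs=4)

model = PPO(
    "MlpPolicy",
    env,
    policy_kwargs=dict(net_arch=[128, 128, 128]),
    n_steps=512,
    learning_rate=0.0001,
    batch_size=64,
    gamma=0.99,
    ent_coef=0.05,
    clip_range=0.2,
    verbose=1,
)
model.learn(total_timesteps=1_000_000)  # placeholder training budget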

RLlib

"num_rollout_workers": 0,  # force rollouts onto the local worker so my custom callbacks work
"num_envs_per_worker": 4,
"framework": "torch",
"gamma": 0.99,
"lambda": 0.95,  # from the SB3 docs, this should be 0.95 to match?
"lr": 0.0001,
"train_batch_size": 512 * 4,  # not quite sure here; I do n_steps * n_envs to try to match SB3
"sgd_minibatch_size": 64,
"num_sgd_iter": 10,  # from the SB3 docs, this should be 10 to match?
"vf_loss_coeff": 0.5,  # from the SB3 docs, this should be 0.5 to match?
"vf_clip_param": np.inf,  # SB3 doesn't clip the value function by default?
"normalize_advantage": True,  # SB3 normalizes advantages by default
"clip_param": 0.2,
"entropy_coeff": 0.05,
"model": {
    "fcnet_hiddens": [128, 128, 128],
},
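
In case the raw dict hides something, here is a partial sketch of the same settings expressed through the PPOConfig builder (assuming Ray 2.x and the old API stack; the env id is a placeholder, and I left out normalize_advantage since I’m not sure of its builder-side name):

from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment(env="MyCustomEnv-v0")  # placeholder env id
    .framework("torch")
    .rollouts(num_rollout_workers=0, num_envs_per_worker=4)
    .training(
        gamma=0.99,
        lambda_=0.95,
        lr=0.0001,
        train_batch_size=512 * 4,
        sgd_minibatch_size=64,
        num_sgd_iter=10,
        vf_loss_coeff=0.5,
        vf_clip_param=float("inf"),
        clip_param=0.2,
        entropy_coeff=0.05,
        model={"fcnet_hiddens": [128, 128, 128]},
    )
)

algo = config.build()
result = algo.train()  # one training iteration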

Solved the vf_loss issue! During the refactor I had mistakenly set terminated=True in my environment once it reaches my step limit, instead of truncated=True. Apparently that makes the value function collapse completely, in both SB3 and RLlib. I had no idea there would be such a difference between terminated and truncated.
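
For anyone who hits the same thing: terminated=True tells PPO the final state is a true terminal, so its value is not bootstrapped (future return is taken as zero), while truncated=True means the episode was only cut off by a time limit and the value estimate is still used to bootstrap. Marking every time-limit end as terminated is what wrecked the critic. The fix in a Gymnasium-style step() looks roughly like this (the helper and attribute names are just placeholders for my env’s logic):

def step(self, action):
    obs, reward, info = self._apply_action(action)  # placeholder for the env's real transition logic
    self._step_count += 1

    # True end of the episode in the MDP sense (goal reached, crash, etc.)
    terminated = self._is_terminal_state(obs)
    # Episode only cut off by the step limit -- the value function should still bootstrap here
    truncated = self._step_count >= self._max_episode_steps

    return obs, reward, terminated, truncated, info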