How to pass nested hyperparam_bounds to PB2


I'm trying to tune a SAC agent using the PB2 scheduler. As I'm somewhat new to SAC, I assumed that tuning sac.DEFAULT_CONFIG["optimization"] would be a good place to start. It holds three nested keys: "actor_learning_rate", "critic_learning_rate" and "entropy_learning_rate". My code is generally inspired by your own example and looks like this:

pb2 = PB2(
    quantile_fraction=args.perturb,  # copy bottom % with top %
    # Specifies the hyperparam search space
    hyperparam_bounds={
        "optimization": {
            "actor_learning_rate": [1e-4, 5e-4],
            "critic_learning_rate": [1e-4, 5e-4],
            "entropy_learning_rate": [1e-4, 5e-4],
        },
    },
)

Any quick fixes or workarounds?




Forgot to include the error:

ValueError: hyperparam_bounds values must either be a list or tuple of size 2, but got {'actor_learning_rate': [0.0001, 0.0005], 'critic_learning_rate': [0.0001, 0.0005], 'entropy_learning_rate': [0.0001, 0.0005]} instead


Hi @Jorgen_Svane,
Thanks for posting this question. Indeed, this is not supported.
However, supporting it should not be a big change. If I am reading the code correctly, PB2 calls into PBT, which actually accepts dict values.
Could you try removing the validation here and see if it works?
If so, we can relax the validation logic to properly include your case.
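
For illustration, relaxed validation could recurse into nested dicts and apply the size-2 check only at the leaves. This is a minimal sketch, not Ray's actual code: `validate_bounds` is a hypothetical helper name, and the error text is only modelled on the message quoted above.

```python
def validate_bounds(bounds, path=""):
    """Recursively check that every leaf is a list or tuple of size 2.

    Hypothetical helper for illustration; not part of Ray Tune.
    """
    for key, value in bounds.items():
        name = f"{path}.{key}" if path else key
        if isinstance(value, dict):
            # Nested group of hyperparameters: validate its children instead.
            validate_bounds(value, name)
        elif not (isinstance(value, (list, tuple)) and len(value) == 2):
            raise ValueError(
                "hyperparam_bounds values must either be a list or tuple "
                f"of size 2, but got {value!r} for {name!r} instead"
            )


nested_bounds = {
    "optimization": {
        "actor_learning_rate": [1e-4, 5e-4],
        "critic_learning_rate": [1e-4, 5e-4],
        "entropy_learning_rate": [1e-4, 5e-4],
    }
}
validate_bounds(nested_bounds)  # passes: all leaves are [low, high] pairs
```

Passing validation is only the first step; the rest of the scheduler still has to cope with nested keys.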

Hi, thanks.

I'll give it a try. However, it appears to cause problems as early as line 358 and in the subsequent dataframes, which would have to be multi-indexed. But I do see your point, and it may be possible to rewrite from there onward in a custom PB2 class that "flattens" the nested dict and then "re-nests" it just before returning new_config in line 403. I also considered making a custom SACTrainer that removes the nesting of the parameters.
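
The flatten and re-nest idea could be sketched roughly as follows. This is a minimal illustration in plain Python, independent of Ray: the helper names `flatten` and `unflatten` and the "/" separator are assumptions, and the separator must not occur inside any real config key.

```python
SEP = "/"  # assumed separator; must not appear in any config key


def flatten(nested, parent=""):
    """Turn {"a": {"b": v}} into {"a/b": v} so every key is a flat string."""
    flat = {}
    for key, value in nested.items():
        path = f"{parent}{SEP}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def unflatten(flat):
    """Inverse of flatten: rebuild the nested dict from "a/b" style keys."""
    nested = {}
    for path, value in flat.items():
        *parents, leaf = path.split(SEP)
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


bounds = {
    "optimization": {
        "actor_learning_rate": [1e-4, 5e-4],
        "critic_learning_rate": [1e-4, 5e-4],
        "entropy_learning_rate": [1e-4, 5e-4],
    }
}
flat_bounds = flatten(bounds)      # keys like "optimization/actor_learning_rate"
restored = unflatten(flat_bounds)  # same structure as bounds again
```

With flat keys, each hyperparameter maps to a single dataframe column, so no multi-index is needed; the re-nesting would happen once, just before the new config is handed back.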

Right now I’m running a PB2 tuning of a PPO agent but will give it a go afterwards and hopefully post a solution.
