How to pass nested hyperparam_bounds to pb2

Hi,

I’m trying to tune a SAC agent using the PB2 scheduler. As I’m somewhat new to SAC, I assumed that tuning sac.DEFAULT_CONFIG["optimization"] would be a good place to start. It holds three nested keys: "actor_learning_rate", "critic_learning_rate" and "entropy_learning_rate". My code is based on your own example and looks like this:

from ray.tune.schedulers.pb2 import PB2

pb2 = PB2(
    time_attr=args.criteria,
    metric="episode_reward_mean",
    mode="max",
    perturbation_interval=args.t_ready,
    quantile_fraction=args.perturb,  # copy bottom % with top %
    # Specifies the hyperparam search space
    hyperparam_bounds={
        "optimization": {
            "actor_learning_rate": [1e-4, 5e-4],
            "critic_learning_rate": [1e-4, 5e-4],
            "entropy_learning_rate": [1e-4, 5e-4],
        },
    },
    log_config=True,
)

Any quick fixes or workarounds?

BR

Jorgen

Forgot to include the error:

ValueError: hyperparam_bounds values must either be a list or tuple of size 2, but got {'actor_learning_rate': [0.0001, 0.0005], 'critic_learning_rate': [0.0001, 0.0005], 'entropy_learning_rate': [0.0001, 0.0005]} instead

Hi @Jorgen_Svane,
Thanks for posting this question. Indeed, this is not supported.
However, supporting it should not be a big change. If I am reading the code correctly, PB2 calls into PBT, which actually accepts dict values.
Could you try removing the validation here and see if it works?
If so then we can relax the validation logic to properly include your case.
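
A minimal sketch of what a relaxed check could look like, assuming the goal is simply to accept dicts and require a [low, high] pair at every leaf. The function name validate_bounds and the path argument are hypothetical and not part of Ray; the error text is modelled on the message quoted above.

```python
from collections.abc import Mapping


def validate_bounds(bounds, path=()):
    """Recursively check that every leaf of `bounds` is a [low, high] pair.

    Hypothetical helper for illustration only; not Ray's implementation.
    """
    for key, value in bounds.items():
        here = path + (key,)
        if isinstance(value, Mapping):
            # Nested group such as "optimization": descend instead of failing.
            validate_bounds(value, here)
        elif not (isinstance(value, (list, tuple)) and len(value) == 2):
            location = "/".join(map(str, here))
            raise ValueError(
                "hyperparam_bounds values must either be a list or tuple "
                f"of size 2, but got {value!r} at {location}"
            )


nested_bounds = {
    "optimization": {
        "actor_learning_rate": [1e-4, 5e-4],
        "critic_learning_rate": [1e-4, 5e-4],
        "entropy_learning_rate": [1e-4, 5e-4],
    },
}
validate_bounds(nested_bounds)  # returns without raising
```

Accepting the nested dict at validation time does not by itself mean the rest of the scheduler can handle it, so this only covers the first step.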

Hi, thanks.

I’ll give it a try. However, it appears to cause problems already from line 358 and in the subsequent dataframes, as they would have to be multi-indexed. But I do see your point, and it may be possible to rewrite from there onward in a custom PB2 class that “flattens” the nested dict and then “re-nests” it just before returning new_config in line 403. I also considered making a custom SACTrainer that removes the nesting of the parameters.
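
The “flatten, then re-nest” idea can be sketched in plain Python as below. This is only an illustration under assumptions: flatten, unflatten and the "/" separator are hypothetical helpers, not Ray functions, and the line that picks the lower bound is a stand-in for whatever values PB2 would actually suggest.

```python
SEP = "/"  # assumed separator; it must not occur inside any real config key


def flatten(nested, prefix=""):
    """Turn {"a": {"b": v}} into {"a/b": v}; non-dict values are leaves."""
    flat = {}
    for key, value in nested.items():
        name = f"{prefix}{SEP}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def unflatten(flat):
    """Inverse of flatten: rebuild the nested dict from "a/b" style keys."""
    nested = {}
    for name, value in flat.items():
        *parents, leaf = name.split(SEP)
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


bounds = {
    "optimization": {
        "actor_learning_rate": [1e-4, 5e-4],
        "critic_learning_rate": [1e-4, 5e-4],
        "entropy_learning_rate": [1e-4, 5e-4],
    },
}

# Flat keys give one plain column per hyperparameter in a dataframe.
flat_bounds = flatten(bounds)

# Stand-in for the scheduler's suggestion: here we just take the lower bound.
new_flat = {name: low for name, (low, _high) in flat_bounds.items()}

# Re-nest just before handing the config back to the trainer.
new_config = unflatten(new_flat)
```

With flat keys the dataframes need no multi-index, and the re-nested new_config has the same shape as the "optimization" block of the trainer config.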

Right now I’m running a PB2 tuning of a PPO agent but will give it a go afterwards and hopefully post a solution.

Thank you.
Didn’t work for me …

Hi @xwjiang2010,
I would appreciate any help regarding nested hyperparams.
Thanks

Hi Nissim.

May I ask what didn’t work for you? Are you trying to tune a SAC agent too? Did you try the “de-nesting => re-nesting” approach I proposed above? Obviously, just removing the nested part will fail. I’m not on this issue right now but expect to get back to it soon, so I’m interested in your findings. Another point of attack could perhaps be to customize the trainer algorithm itself and de-nest the relevant hyperparameters here. In Ray 1.13 you should already customize it in order to reuse actors and speed up the process, so it may be a way forward.

My understanding is that Ray is undergoing a major revision so hopefully this will be aligned in future versions.

BR

Jorgen

Hi @xwjiang2010

I came across issue #29102 “Fix broken PB2._get_new_config method override” dated Oct 7. If I understand it correctly, does this also fix the nested dict issue? It also appears to be merged into master (Ray 2) now, so a fresh install should include it? If so, I’ll be happy to test it and perhaps post a simple example.

On a side note, there seems to be a problem with the GPy package, as it fails to import Cython. It appears to be a long-standing issue that is not being addressed - see this one:

I appreciate it’s not your problem, as it fails independently of installing Ray. However, there is a similar package built on PyTorch named GPyTorch, which supports GPU acceleration. Hence, it may be worth considering for the future.

BR

Jorgen