Optuna/Tune Hyperparameter Search for Lunar Lander Continuous Not Working

Hey,

I’ve been trying to use Tune for RL problems for the past 6 weeks, and after all sorts of things repeatedly not working I tried PPO on Lunar Lander Continuous as a proof of concept. That still isn’t working and I have absolutely no idea why, so I was hoping someone could tell me what I’m doing wrong here:

My code (it’s short):

Parameter/Reward outputs (nothing gets above 0 reward, solved is 200):

Can you first try setting max_concurrent=1 to debug and see if it’s actually improving a relevant metric?
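One way to check this once trials run one at a time: take each trial's final mean reward, in the order the trials finished, and look at the running best. Here is a minimal sketch in plain Python. The function names and the reward values are made up for illustration and are not taken from the runs in this thread.

```python
from typing import List, Sequence


def running_best(rewards: Sequence[float]) -> List[float]:
    """Return the best reward seen so far after each sequential trial."""
    best_so_far: List[float] = []
    for reward in rewards:
        if not best_so_far or reward > best_so_far[-1]:
            best_so_far.append(reward)
        else:
            best_so_far.append(best_so_far[-1])
    return best_so_far


def improved_after_warmup(rewards: Sequence[float], warmup: int) -> bool:
    """True if any trial after the first `warmup` trials beat the warmup best."""
    if warmup < 1 or len(rewards) <= warmup:
        return False
    warmup_best = max(rewards[:warmup])
    return max(rewards[warmup:]) > warmup_best


# Hypothetical per-trial mean rewards, in completion order.
example_rewards = [-310.0, -150.5, -220.0, -148.0, -400.2, -90.3]
print(running_best(example_rewards))
print(improved_after_warmup(example_rewards, warmup=3))
```

If the running best stays flat well past the searcher's initial random trials, that may point at the reported metric or the optimization mode (min vs. max) rather than at the search space.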

I started a run as soon as you mentioned that, and since then it’s gone through 16 sequential updates (it’s RL so it’s slow). The expected reward of the policy has also not improved in that time.

Can you post the stdout and also the relevant Ray Tune code snippets (tune.report, tune.run)?

Here are the Ray snippets and stdout for max_concurrent=10 from the original post:

And here’s my current stdout for max_concurrent=1:

@justinkterry can you post the longer max_concurrent stdout?

Could you show maybe the full 100 samples?

Here you go, sorry:

Hmm ok. I think the hyperparameter space that you’re working with is way too large. Could you try narrowing this to say 2 or 3 hyperparameters and trying again?
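Narrowing the space can be as simple as keeping a few parameters searchable and pinning the rest to fixed values. Here is a rough sketch in plain Python, independent of Tune and Optuna. The parameter names are PPO arguments from stable_baselines3, but the ranges and pinned values are placeholders chosen for illustration, not the ones from the original script.

```python
import random
from typing import Dict, Iterable, Tuple

Range = Tuple[float, float]

# Hypothetical full search space: name -> (low, high).
FULL_SPACE: Dict[str, Range] = {
    "learning_rate": (1e-5, 1e-3),
    "gamma": (0.9, 0.9999),
    "gae_lambda": (0.8, 1.0),
    "clip_range": (0.1, 0.4),
    "ent_coef": (0.0, 0.01),
}

# Hypothetical fixed values for everything that is not being searched.
DEFAULTS: Dict[str, float] = {
    "learning_rate": 3e-4,
    "gamma": 0.99,
    "gae_lambda": 0.95,
    "clip_range": 0.2,
    "ent_coef": 0.0,
}


def sample_config(tuned: Iterable[str], rng: random.Random) -> Dict[str, float]:
    """Sample only the `tuned` parameters uniformly; pin the rest to DEFAULTS."""
    config = dict(DEFAULTS)
    for name in tuned:
        low, high = FULL_SPACE[name]
        config[name] = rng.uniform(low, high)
    return config


rng = random.Random(0)
config = sample_config(["learning_rate", "gamma", "clip_range"], rng)
print(config)
```

With only three names in `tuned`, every sampled config differs from the pinned values in at most three entries, which makes it easier to see whether the search is doing anything at all.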

I tried tuning just 3 hyperparameters, then just 1 hyperparameter. Neither showed any improvement.

3-hyperparameter code: (Pastebin.com link)
stdout: I accidentally lost this file, but literally nothing happened. It actually got slightly worse.
1-hyperparameter code: (Pastebin.com link)
stdout: 1

The hyperparameter search space I was originally working with is also not too large. Other projects using plain Optuna for this exact environment, with similar search spaces, have had it work just fine:

Either you, I, and a bunch of my colleagues who have looked at my script are missing something or there’s some sort of weird bug in Tune’s support for Optuna.