I’m running 6 trials with DQN using Tune’s PBT. Of those 6, two stopped training after 600 iterations and another two stopped after 800 iterations. The remaining two are still training.
However, no error was reported for the stopped trials. Moreover, all of these “stopped” trials are still logged as “RUNNING”, as you can see below.
Below is the “episode_reward_max” graph. As you can see, the “stopped” trials had very poor rewards at the point where they stopped. I’m wondering if this is something PBT does automatically: killing trials with low reward values. If so, where can I find the API for this? I can’t find anything about early termination in the Tune documentation.
By the way, the stopping criterion is "time_total_s": 72_000, which none of the trials is even close to reaching.
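To make the stopping rule concrete, here is a minimal pure-Python sketch of how I understand a dict-style stop criterion to be evaluated: a trial should stop only once a reported metric reaches its threshold. The `should_stop` helper and the sample result values are hypothetical and only for illustration; this is not Tune's actual code, and the numbers are not taken from my logs.

```python
from typing import Any, Dict

# The stopping criterion from my run.
STOP: Dict[str, float] = {"time_total_s": 72_000}


def should_stop(result: Dict[str, Any], stop: Dict[str, float]) -> bool:
    """Hypothetical helper: True once any listed metric reaches its threshold."""
    return any(
        key in result and result[key] >= threshold
        for key, threshold in stop.items()
    )


# Made-up result dicts, only to illustrate the check.
mid_run = {"training_iteration": 600, "time_total_s": 9_000.0}
past_limit = {"training_iteration": 5_000, "time_total_s": 72_500.0}

print(should_stop(mid_run, STOP))
print(should_stop(past_limit, STOP))
```

Under this reading, a trial at iteration 600 or 800 that is far below 72,000 seconds should not be stopped by the criterion, which is why the stalled trials confuse me.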