[PBT] Population-based Training early kill?

I’m running 6 trials of DQN with Tune’s PBT scheduler. Of those 6, two stopped training after 600 iterations and another two stopped after 800 iterations. The remaining two are still training.
However, there’s no error from those stopped trials. Moreover, all of the “stopped” trials are still logged as RUNNING, as you can see below.

Below is the “episode_reward_max” graph. As you can see, those “stopped” trials had awful rewards at the point they stopped. I’m wondering if this is something PBT does automatically: killing trials with low rewards. If so, where can I find the API for this behavior? I can’t find anything about early killing in the Tune documentation.
By the way, the only stopping criterion I set is "time_total_s": 72_000, which none of the trials is even close to reaching.
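For reference, my setup looks roughly like this (the perturbation interval, mutation space, and env config below are placeholders, not my actual values):

```python
from ray import tune
from ray.tune.schedulers import PopulationBasedTraining

# PBT scheduler: it should only exploit/explore (copy weights and
# perturb hyperparameters), not terminate trials outright.
pbt = PopulationBasedTraining(
    time_attr="training_iteration",
    metric="episode_reward_mean",
    mode="max",
    perturbation_interval=50,              # placeholder value
    hyperparam_mutations={
        "lr": [1e-3, 5e-4, 1e-4],          # placeholder mutation space
    },
)

tune.run(
    "DQN",
    num_samples=6,                         # the 6 trials mentioned above
    scheduler=pbt,
    stop={"time_total_s": 72_000},         # the only stopping criterion
    config={"env": "CartPole-v1"},         # placeholder env/config
)
```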

cc @amogkam – any ideas about this?

Actually, I think this may just be an issue with Ray scheduling. Can you create a reproducible script so that I can try running this myself?

Posting the script on GitHub (on the Ray project’s issue page) would be preferred!