A possible workaround is to wrap your search algorithm in a ConcurrencyLimiter and set max_concurrent to M. This caps the number of concurrently running trials at M while still letting you have N trials in total; the rest wait in the PENDING or PAUSED state. Trials can still be paused and resumed through their status, and the ConcurrencyLimiter ensures that no more than M are RUNNING at any time.
For a custom scheduler approach, you can subclass TrialScheduler and override its choose_trial_to_run method. In your implementation, you maintain your own logic for selecting which trial runs next (for example, always pick from PAUSED trials first, or use round-robin or priority ordering). This gives you direct control over which and how many trials are started, paused, or resumed at any time, independent of the default TuneConfig or resource settings.
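To make the selection logic concrete, here is a minimal sketch in plain Python of the "PAUSED first, capped at M" rule. It does not import Ray: ToyTrial and pick_next_trial are hypothetical stand-ins used only for illustration, and the status strings simply mirror the names used above. In a real TrialScheduler subclass, similar logic would presumably sit inside choose_trial_to_run and operate on the live trials Ray provides.

```python
from dataclasses import dataclass
from typing import List, Optional

# Status names mirror those used in the text. ToyTrial is a hypothetical
# stand-in for illustration, not Ray's Trial class.
PENDING, RUNNING, PAUSED, TERMINATED = "PENDING", "RUNNING", "PAUSED", "TERMINATED"


@dataclass
class ToyTrial:
    trial_id: str
    status: str = PENDING


def pick_next_trial(trials: List[ToyTrial], max_running: int) -> Optional[ToyTrial]:
    """Return the next trial to start, or None if the cap M is already reached."""
    running = sum(1 for t in trials if t.status == RUNNING)
    if running >= max_running:
        return None  # M trials are already RUNNING; start nothing new.
    # Prefer resuming PAUSED trials, then fall back to PENDING ones.
    for wanted in (PAUSED, PENDING):
        for t in trials:
            if t.status == wanted:
                return t
    return None  # Nothing left that can be started.
```

Swapping the inner loop for a round-robin index or a priority sort would change the policy without touching the cap check, which is why the two concerns are kept separate here.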
Would you like a code example for either approach?