Hi, I am using Ray Tune for a project I’m working on and have mostly had great results, but I have a few open questions that I haven’t been able to find any answers to in the documentation. The questions are below - any help is greatly appreciated!
- The training_iteration column in the output shows the same value (1) for almost every trial. Does this mean that almost all of my initial trials are running at the same time? I have the maximum concurrency set very low, so I’m surprised to see more trials than that running at once. Is there a way to limit how many trials run concurrently?
- What setting do I need to enable to save model checkpoints? I’d like to save the best-performing model at the end of a Tune run and potentially use it for predictions. I set up a CheckpointConfig, but none of my runs produce any model checkpoints.
- Is there a way to control the early stopping of trials? I’d like to experiment with letting trials run longer to see how that impacts results.
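
For reference, the three settings these questions touch on can be sketched roughly as below. This is an illustrative fragment, not an exact configuration: it assumes a Ray 2.x release where `CheckpointConfig` is importable from `ray.train` (import paths have moved between releases), the metric name `loss` and every numeric value are hypothetical placeholders, and `ASHAScheduler` stands in only as an example of a scheduler that stops trials early.

```python
# Illustrative sketch only: all values and the metric name "loss" are
# placeholders, and import paths may differ between Ray releases.
from ray import train, tune
from ray.tune.schedulers import ASHAScheduler

# Early stopping: ASHA is just one example of a scheduler that stops trials.
# grace_period is the minimum number of iterations a trial runs before it
# can be stopped; max_t is the maximum number of iterations per trial.
scheduler = ASHAScheduler(
    time_attr="training_iteration",
    max_t=100,
    grace_period=20,
    reduction_factor=2,
)

# Concurrency: max_concurrent_trials caps how many trials run at once.
tune_config = tune.TuneConfig(
    metric="loss",
    mode="min",
    num_samples=20,
    max_concurrent_trials=2,
    scheduler=scheduler,
)

# Checkpoints: keep only the best-scoring checkpoint of each trial,
# ranked by the reported "loss" value (lower is better).
checkpoint_config = train.CheckpointConfig(
    num_to_keep=1,
    checkpoint_score_attribute="loss",
    checkpoint_score_order="min",
)
```

These objects would normally be handed to the Tuner through its `tune_config` argument and, for the checkpoint settings, through a `RunConfig`. Raising `grace_period` and `max_t` is the obvious knob to try for letting trials run longer.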