Thanks for the follow-up question! For others who run into this: after reading your follow-up, I suspect the main reason is that Ray Tune’s final result usually only considers the last reported result of each trial, not the best one seen during the run. You can change this in the calls to the experiment analysis methods, or you can always fetch the full per-trial result dataframe and pick the best row yourself. Other than that, runs should usually be reproducible, and we have tests in place to check that this is the case.
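To make the last/best distinction concrete, here is a small plain-Python sketch that does not need Ray. The trial names and accuracy values are made up, and `best_trial` is a hypothetical helper that only mimics the idea behind the `scope` argument of `get_best_trial` (`"last"` vs. `"all"`). It is not how Ray implements it.

```python
# Hypothetical per-iteration accuracies reported by two trials (made-up numbers).
reported = {
    "trial_a": [0.70, 0.92, 0.81],  # peaks early, then gets worse
    "trial_b": [0.60, 0.75, 0.85],  # improves steadily
}


def best_trial(results, scope="last"):
    """Return the name of the trial with the highest score.

    scope="last": compare only the last reported value of each trial.
    scope="all":  compare the best value each trial ever reported.
    """
    if scope == "last":
        score = lambda history: history[-1]
    elif scope == "all":
        score = max
    else:
        raise ValueError(f"unknown scope: {scope!r}")
    return max(results, key=lambda name: score(results[name]))


print(best_trial(reported, scope="last"))  # trial_b (0.85 beats 0.81)
print(best_trial(reported, scope="all"))   # trial_a (0.92 beats 0.85)
```

The same data gives a different "best" trial depending on the scope, which can look like a reproducibility problem when comparing numbers across runs or against your own logs.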
Let’s continue the discussion about last/best results in the other thread. If further reproducibility problems come up, we can pick them up here.