Best Practices for Optimizing Ray Tune Trials

Hi everyone! :wave:

I’m currently using Ray Tune to optimize hyperparameters for a machine learning model, and I’m running into performance issues when scaling up the number of trials. I’d like to know:

  1. What’s the best way to balance resource allocation across trials?
  2. Are there specific schedulers or search algorithms that perform better for large-scale runs?

I checked this thread: https://discuss.ray.io/t/best-practices-to-run-multiple-models-in-multiple-gpus-in-rayll and an online DevOps course, but I haven't found a solution. Could anyone guide me on this? I'm working on a cluster with limited CPU/GPU resources, so I'm trying to avoid unnecessary overhead. Any tips or real-world examples would be greatly appreciated!
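For context, here is a stripped-down sketch of what I'm currently running (recent Ray 2.x Tuner API; the search space, resource numbers, and dummy training loop are placeholders, not my real workload):

```python
from ray import tune
from ray.tune.schedulers import ASHAScheduler

def train_model(config):
    # Dummy training loop -- the real model is omitted for brevity.
    for epoch in range(10):
        loss = 1.0 / (epoch + 1) + config["lr"]  # placeholder metric
        tune.report({"loss": loss})  # dict-style report (recent Ray releases)

# Fractional GPU request so several trials can share one device --
# these numbers are guesses on my part, not a recommendation.
trainable = tune.with_resources(train_model, {"cpu": 2, "gpu": 0.25})

tuner = tune.Tuner(
    trainable,
    param_space={"lr": tune.loguniform(1e-4, 1e-1)},
    tune_config=tune.TuneConfig(
        metric="loss",
        mode="min",
        num_samples=50,
        # ASHA stops unpromising trials early to free up resources.
        scheduler=ASHAScheduler(max_t=10, grace_period=1),
    ),
)
results = tuner.fit()
```

In particular, I'm unsure whether fractional GPU allocations like this are the right way to pack more trials onto limited hardware, or whether the scheduler/search algorithm choice matters more at this scale.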

Thanks in advance! :blush: