Run multiple independent experiments (with slurm)

Severity: Medium. My experiments are needlessly repeated many times.

I’m using Ray Tune (v2.1.0; I can’t update to a newer version) to run hyperparameter optimisations. I need to run multiple independent experiments with the same scheduler, search space, and trainable, but slightly different configs (different architectures).
I launch these experiments on slurm: I start a new job/Tune experiment for each config file.
However, it looks like they all get mixed up.
Each of my job logs prints every one of the config names, but I expect only one config per experiment.
It looks like instead of one experiment/config per job, every job runs every experiment/config.
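
Simplified, the script each slurm job runs looks roughly like this (the trainable is a placeholder and I’ve left out the actual scheduler and search space):

```python
# Simplified sketch of what each slurm job is supposed to run.
# The trainable is a placeholder; the real scheduler/search space are omitted.
import argparse

import yaml
from ray import tune
from ray.air import session


def my_trainable(config):
    # placeholder: real training code goes here
    session.report({"score": 0.0})


parser = argparse.ArgumentParser()
parser.add_argument("--config", required=True)  # one config file per slurm job
args = parser.parse_args()

with open(args.config) as f:
    config = yaml.safe_load(f)

print(f"Running experiment for config: {args.config}")

tune.run(
    my_trainable,
    config=config,    # only the architecture settings differ between experiments
    num_samples=10,
)
```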

I think it’s an issue with Ray’s scheduling system. I can’t find anything in the docs about how to fix this; the docs only offer instructions on distributed training, but I don’t want to run a single experiment on multiple nodes. I want multiple experiments, each on its own node and completely independent of each other.

What did I miss? How can I do this?

I have kind of “accepted” that this functionality doesn’t really exist (which is a shame), but I really do want to understand what is happening.

Especially why each job runs all of the experiments instead of just one.

Hey @JuliaWasala, do you have any examples of your code and the output you are seeing? Pseudocode is fine.

While multi-tenancy isn’t an officially supported use case, there are users who are successfully able to do so today. It sounds like what’s happening here is that there is some metadata getting mixed up, which might happen if you are running the different experiments under the same name.
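
For example (just a sketch, assuming you call `tune.run` directly on 2.1.0), you could derive a unique `name` (and ideally `local_dir`) from the config file so the experiments can’t collide:

```python
import os

from ray import tune
from ray.air import session


def my_trainable(config):
    # placeholder for your actual trainable
    session.report({"score": 0.0})


config_file = "configs/arch_a.yaml"  # hypothetical path; one per slurm job

# e.g. "arch_a" from configs/arch_a.yaml
experiment_name = os.path.splitext(os.path.basename(config_file))[0]

tune.run(
    my_trainable,
    config={"lr": tune.loguniform(1e-4, 1e-1)},  # illustrative search space
    name=experiment_name,                        # unique name per experiment
    local_dir=os.path.expanduser(f"~/ray_results/{experiment_name}"),
)
```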

Hey, I figured out what the problem was.
There was an issue with my slurm script: it launched the experiments for all of my config files in the same slurm job, instead of one job per config. That’s how everything ended up in the same file.
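
For anyone running into the same thing: the fix was to submit one sbatch job per config file instead of looping over the configs inside a single job script. Roughly something like this (script and folder names are just examples):

```python
# submit_all.py -- sketch: submit one slurm job per config file.
# "run_experiment.slurm" is assumed to take the config path as its only
# argument and run the Tune script for that single config.
import subprocess
from pathlib import Path

for config_path in sorted(Path("configs").glob("*.yaml")):
    subprocess.run(
        ["sbatch", "run_experiment.slurm", str(config_path)],
        check=True,
    )
```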