Run multiple independent experiments (with slurm)

Severity: Medium. My experiments are needlessly repeated many times.

I’m using Ray Tune (v2.1.0; I can’t update to a newer version) to run hyperparameter optimisations. I need to run multiple independent experiments with the same scheduler, search space, and trainable, but slightly different configs (different architectures).
I launch these experiments on slurm: I start a new job/Tune experiment for each config file.
However, it looks like they all get mixed up.
Each of my job logs prints every one of the config names, but I expect only one config per experiment.
It looks like instead of one experiment/config per job, every job runs every experiment/config.
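
Simplified, the script each slurm job runs looks roughly like this (the trainable is a placeholder and I’ve left out the actual scheduler and search space):

```python
# Simplified sketch of what each slurm job is supposed to run.
# The trainable is a placeholder; the real scheduler/search space are omitted.
import argparse

import yaml
from ray import tune
from ray.air import session


def my_trainable(config):
    # placeholder: real training code goes here
    session.report({"score": 0.0})


parser = argparse.ArgumentParser()
parser.add_argument("--config", required=True)  # one config file per slurm job
args = parser.parse_args()

with open(args.config) as f:
    config = yaml.safe_load(f)

print(f"Running experiment for config: {args.config}")

tune.run(
    my_trainable,
    config=config,    # only the architecture settings differ between experiments
    num_samples=10,
)
```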

I think it’s an issue with Ray’s scheduling system. I can’t find anything in the docs about how to fix this; the docs only offer instructions on distributed training, but I don’t want to run a single experiment on multiple nodes. I want multiple experiments, each on its own node and completely independent of each other.

What did I miss? How can I do this?

I have kind of “accepted” that this functionality doesn’t really exist (which is a shame), but I really do want to understand what is happening.

Especially why each job runs all of the experiments instead of just one.

Hey @JuliaWasala, do you have any examples of your code and the output you are seeing? Pseudocode is fine.

While multi-tenancy isn’t an officially supported use case, there are users who are successfully able to do so today. It sounds like what’s happening here is that there is some metadata getting mixed up, which might happen if you are running the different experiments under the same name.
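
For example (just a sketch, assuming you call `tune.run` directly on 2.1.0), you could derive a unique `name` (and ideally `local_dir`) from the config file so the experiments can’t collide:

```python
import os

from ray import tune
from ray.air import session


def my_trainable(config):
    # placeholder for your actual trainable
    session.report({"score": 0.0})


config_file = "configs/arch_a.yaml"  # hypothetical path; one per slurm job

# e.g. "arch_a" from configs/arch_a.yaml
experiment_name = os.path.splitext(os.path.basename(config_file))[0]

tune.run(
    my_trainable,
    config={"lr": tune.loguniform(1e-4, 1e-1)},  # illustrative search space
    name=experiment_name,                        # unique name per experiment
    local_dir=os.path.expanduser(f"~/ray_results/{experiment_name}"),
)
```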

Hey, I figured out what the problem was.
There was an issue with my slurm script: it launched the experiments for all of my config files in the same slurm job, instead of one job per config. That’s how everything ended up in the same file.
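
For anyone running into the same thing: the fix was to submit one sbatch job per config file instead of looping over the configs inside a single job script. Roughly something like this (script and folder names are just examples):

```python
# submit_all.py -- sketch: submit one slurm job per config file.
# "run_experiment.slurm" is assumed to take the config path as its only
# argument and run the Tune script for that single config.
import subprocess
from pathlib import Path

for config_path in sorted(Path("configs").glob("*.yaml")):
    subprocess.run(
        ["sbatch", "run_experiment.slurm", str(config_path)],
        check=True,
    )
```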