Using Ray Tune to optimise a function called with subprocess

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

Hi everyone! I’m trying to run a hyperparameter search for a very complex training script that I have. Since I’m not allowed to modify the training script itself, I’m working around that by calling it with subprocess inside my objective function and reading the performance metrics from our existing reporting framework.

My current simplified script for hyperparameter search looks like this:

import subprocess
import time

from ray import train, tune
from ray.tune import TuneConfig
from ray.tune.schedulers import ASHAScheduler
from ray.tune.search.hyperopt import HyperOptSearch


def objective_function(config):
    args = ["python", "script.py"]

    # Add the hyperparameters as command-line arguments
    for arg_name, arg_value in config.items():
        args.append(f"--{arg_name}")
        args.append(str(arg_value))

    # Run the training script as a child process
    process = subprocess.Popen(
        args, cwd=training_path
    )

    # Poll our reporting framework until the training process exits
    while process.poll() is None:
        metric = read_from_report()
        train.report({"score": metric})
        time.sleep(1)


search_space = {
    "learning-rate": tune.loguniform(1e-5, 1e-1),
    "batch-size": tune.choice([16, 32, 64, 128]),
    "optimizer": tune.choice(['sgd', 'adam']),
}

asha_scheduler = ASHAScheduler(
    time_attr='training_iteration',
    max_t=100,
)

tune_config = TuneConfig(
    max_concurrent_trials=1,
    num_samples=-1,  # keep generating samples until the experiment is stopped
    search_alg=HyperOptSearch(),
    metric="score",  # the key reported via train.report
    mode='max',
    scheduler=asha_scheduler,
)

# Use the objective function as the trainable and reserve resources for each trial
trainable = objective_function

trainable_with_gpu = tune.with_resources(trainable, {"gpu": 1, "cpu": 20})

tuner = tune.Tuner(
    trainable_with_gpu, param_space=search_space, tune_config=tune_config
)
results = tuner.fit()
print(results)

The problem I have has to do with resources and parallelism. My training script is already highly optimized for parallelism and automatically uses all the workers and GPUs available. However, when I run tune.Tuner without the tune.with_resources wrapper, the script doesn’t seem to use my GPU at all. Is this because Ray limits the devices a trial can see (e.g. by setting CUDA_VISIBLE_DEVICES) to the resources it has reserved? If I specify "gpu": 1 it correctly identifies my GPU and uses it, but it then seems to use 0/20 CPUs and runs really slowly (I'm not sure what 0 means in this case!), so I have to manually set "cpu": 20, or whatever number of CPUs I happen to have at the moment.
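To avoid hard-coding the CPU and GPU counts, I’ve been experimenting with something along these lines (just a sketch; I’m assuming the "CPU" and "GPU" keys reported by ray.cluster_resources() cover everything on my single machine):

import ray

ray.init()

# Reserve whatever the cluster reports for the single concurrent trial,
# instead of hard-coding "cpu": 20 and "gpu": 1.
cluster = ray.cluster_resources()
trainable_with_gpu = tune.with_resources(
    trainable,
    {"cpu": int(cluster.get("CPU", 1)), "gpu": int(cluster.get("GPU", 0))},
)

Is that the intended way to hand a trial the whole machine, or is there a cleaner option?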

The other problem I’ve experienced is that my training script seems to hang at random points for no apparent reason. Initially I was using subprocess.Popen with stdout=PIPE and reading the output with .read(), but I realised that the OS pipe buffer can fill up if the script produces too much output, which blocks the child process and looks like a hang, so for now I just let the script print everything to the console.
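As an alternative I’ve been considering (not what I’m doing right now, and the log file name below is just a placeholder), redirecting the child’s output to a file should also keep any pipe buffer from filling up:

# Write the child's output to a log file instead of capturing it with PIPE,
# so a full pipe buffer can never block the training script.
with open("training_stdout.log", "w") as log_file:
    process = subprocess.Popen(
        args,
        cwd=training_path,
        stdout=log_file,
        stderr=subprocess.STDOUT,
    )

Would that be preferable to letting it write straight to the console?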

Is using subprocess the best approach for tuning an existing training script without having to make any changes to it?

Thank you for your help and for the wonderful tool :slight_smile: