Tune terminates after running a single configuration

I have a PyTorch tabular model that I have set up to tune using the following:


    config = {
        "num_layers": tune.choice([1, 2, 3]),
        "num_trees": tune.choice([512, 768, 1024]),
        "depth": tune.choice([2, 4, 6]),
        "batch_size": tune.choice([128, 512, 1024])
    }

    def train_tabular(config, df_train, df_test):
        model = build_model(
            num_trees=config['num_trees'],
            depth=config['depth'],
            num_layers=config['num_layers'],
            batch_size=config['batch_size'],
            use_embedding=True,
            epochs=10,
        )

        model.fit(train=df_train, validation=df_test)
        eval = model.evaluate(df_test)
        tune.report(mse=eval[0]['test_mean_squared_error'])


    analysis = tune.run(
        tune.with_parameters(train_tabular, df_train=df_train, df_test=df_val), 
        resources_per_trial={'gpu': 1},
        mode="min",
        config=config)

However, when this runs it terminates after the first configuration:

| Trial name | status | loc | batch_size | depth | num_layers | num_trees | iter | total time (s) | mse |
|---|---|---|---|---|---|---|---|---|---|
| train_tabular_623df_00000 | TERMINATED | | 128 | 6 | 2 | 1024 | 1 | 4951.7 | 0.0099718 |


...
2021-09-02 01:55:58,998	INFO tune.py:561 -- Total run time: 4954.25 seconds (4954.01 seconds for the tuning loop).
Best config:  {'num_layers': 2, 'num_trees': 1024, 'depth': 6, 'batch_size': 128}

Does anyone see anything obviously wrong with what I have set up here? There is no error message; it just stops running after the first model is trained.

Hey @fonnesbeck,

Can you try setting num_samples?

For more extensive configuration, there are some resources available in the User Guide.


@fonnesbeck,

Unfortunately, Ray is doing exactly what you instructed it to do. `tune.choice` means "draw a random sample from a list of possibilities." Given your configuration, Tune is going to draw one value for each of those config params, run one trial, and then stop.

As @matthewdeng noted, you could pass `num_samples` > 1, which would replicate this setup n times, resampling each of the config values on each replication.

Another approach would be to convert those `tune.choice` calls to `tune.grid_search`; Tune would then enumerate all combinations. In this case that would be 3^4 = 81 trials.
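The count that grid search would enumerate can be illustrated with plain `itertools` (a sketch of the counting only, not of Ray itself):

```python
import itertools

# The same search space as in the question; grid search runs one
# trial for every combination of these values.
space = {
    "num_layers": [1, 2, 3],
    "num_trees": [512, 768, 1024],
    "depth": [2, 4, 6],
    "batch_size": [128, 512, 1024],
}

combos = list(itertools.product(*space.values()))
print(len(combos))  # 3 * 3 * 3 * 3 = 81 trials
```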

If that is too many then you could keep the config you have now but switch to a more sophisticated tune search algorithm.

https://docs.ray.io/en/latest/tune/key-concepts.html#search-algorithms
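As a rough sketch of that last option (assuming the `hyperopt` package is installed and the Ray 1.x API used in this thread; `num_samples=20` is an arbitrary budget):

```python
# Sketch only: keeps the tune.choice search space from the question,
# but lets a smarter searcher pick which configs to sample.
from ray import tune
from ray.tune.suggest.hyperopt import HyperOptSearch

analysis = tune.run(
    tune.with_parameters(train_tabular, df_train=df_train, df_test=df_val),
    resources_per_trial={"gpu": 1},
    metric="mse",  # the value reported via tune.report
    mode="min",
    search_alg=HyperOptSearch(),
    num_samples=20,  # budget: how many configs to try
    config=config,
)
```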