I am running a simple modification of the Ray Tune Quick Start example from the docs (Tune: Scalable Hyperparameter Tuning — Ray v1.2.0), taking a uniform distribution for alpha and letting it run indefinitely via num_samples=-1:
from ray import tune


def objective(step, alpha, beta):
    return (0.1 + alpha * step / 100)**(-1) + beta * 0.1


def training_function(config):
    # Hyperparameters
    alpha, beta = config["alpha"], config["beta"]
    for step in range(10):
        # Iterative training function - can be any arbitrary training procedure.
        intermediate_score = objective(step, alpha, beta)
        # Feed the score back to Tune.
        tune.report(mean_loss=intermediate_score)


analysis = tune.run(
    training_function,
    num_samples=-1,
    config={
        "alpha": tune.uniform(0.001, 0.1),
        "beta": tune.choice([1, 2, 3])
    })

print("Best config: ", analysis.get_best_config(
    metric="mean_loss", mode="min"))

# Get a dataframe for analyzing trial results.
df = analysis.results_df
When I let the tuning run, I notice that the log directory in /tmp/ray/session_latest/logs grows extremely quickly. In particular, the file gcs_server.out reaches roughly 100 MB within one minute and contains nothing but repetitions of the following messages:
[2021-04-30 15:07:58,109 I 16431 16431] gcs_placement_group_manager.cc:292: Registering placement group, placement group id = 8c725267c9a2384dbcd107adc450d63c, name = __tune_a640a6de__6faa9d03, strategy = 0
[2021-04-30 15:07:58,109 I 16431 16431] gcs_placement_group_manager.cc:296: Finished registering placement group, placement group id = 8c725267c9a2384dbcd107adc450d63c, name = __tune_a640a6de__6faa9d03, strategy = 0
[2021-04-30 15:07:58,109 I 16431 16431] gcs_placement_group_scheduler.cc:141: Scheduling placement group __tune_a640a6de__a25b2634, id: 8d3433fc1f7eb1df3fb8cde4757b0e8a, bundles size = 1
[2021-04-30 15:07:58,109 I 16431 16431] gcs_placement_group_scheduler.cc:150: Failed to schedule placement group __tune_a640a6de__a25b2634, id: 8d3433fc1f7eb1df3fb8cde4757b0e8a, because no nodes are available.
[2021-04-30 15:07:58,109 I 16431 16431] gcs_placement_group_manager.cc:215: Failed to create placement group __tune_a640a6de__a25b2634, id: 8d3433fc1f7eb1df3fb8cde4757b0e8a, try again.
After running the tuning for a couple of hours, my disk runs out of space because of this single file.
Is there a way to a) disable or reduce this logging, or b) fix the underlying problem ("no nodes are available") reported in the logs?
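For a), this is what I have looked at so far. I am not sure whether any of it actually applies to gcs_server.out, and the RAY_BACKEND_LOG_LEVEL environment variable in particular is an assumption on my part:

import logging
import os

import ray

# Assumption on my part: RAY_BACKEND_LOG_LEVEL is supposed to control the
# verbosity of the C++ backend processes (raylet, gcs_server). It would have
# to be set before the Ray processes are started, i.e. before ray.init()
# or the first tune.run().
os.environ["RAY_BACKEND_LOG_LEVEL"] = "warning"

# As far as I understand, these two only affect Python-side / driver logging,
# so I suspect they do not touch gcs_server.out at all.
ray.init(logging_level=logging.WARNING, log_to_driver=False)

# ... then tune.run(training_function, ...) as in the script above.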
I’m running the tuning using ray version 1.3.0 from pip on Ubuntu 16.04.
Please let me know what other information I should provide to help reproduce the problem.
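For example, since the log complains that "no nodes are available", I could post the resource view of the running session, collected from a separate Python shell while the tuning is going on:

import ray

# Attach to the session that tune.run() started in the other process.
ray.init(address="auto", ignore_reinit_error=True)

print(ray.cluster_resources())    # total resources of the local cluster
print(ray.available_resources())  # resources currently free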