I do not understand why there are these bottlenecks. What are the usual suspects?
I am using Bayesian Optimization (BO) with ASHA. I am passing the training data through the config (a minimal sketch of my setup is below); is that wrong?
I see that the `ray_results` folder is now 3.5 GB. Is the config stored as part of the experiment state? Is it possible to avoid storing it?
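For context, this is roughly the shape of my setup. It is only a sketch: the trainable body, the `score` metric, and the dummy arrays are placeholders, not my actual code.

```python
import numpy as np
from ray import tune
from ray.tune.schedulers import ASHAScheduler
from ray.tune.suggest.bayesopt import BayesOptSearch  # requires the bayesian-optimization package

def train_model(config):
    # Placeholder objective; the real trainable fits a model on the
    # data it pulls out of the config.
    X, y = config["X_train"], config["y_train"]  # unused in this dummy
    for step in range(10):
        tune.report(score=config["lr"] * step)  # dummy metric

# Stand-ins for the real training set.
X_train = np.random.rand(100_000, 50)
y_train = np.random.rand(100_000)

tune.run(
    train_model,
    metric="score",
    mode="max",
    config={
        "lr": tune.uniform(1e-4, 1e-1),
        # The full training set rides along in every trial's config:
        "X_train": X_train,
        "y_train": y_train,
    },
    search_alg=BayesOptSearch(),  # BO
    scheduler=ASHAScheduler(),    # ASHA
    num_samples=50,
)
```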
My problem seems extremely similar to this issue on the Ray GitHub repository, but it looks like that issue no longer applies to the current version of Ray.
Now if I try to add the parameter `global_checkpoint_period=np.inf`, I get the exception:

```
ValueError: global_checkpoint_period is deprecated. Set env var 'TUNE_GLOBAL_CHECKPOINT_S' instead.
```
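Presumably the replacement is to set that environment variable before calling `tune.run`. This is a guess based purely on the error message (the `_S` suffix suggests seconds; I have not verified whether Ray accepts `"inf"` here):

```python
import os

# Must be set before tune.run() is called; appears to replace the
# deprecated global_checkpoint_period argument.
os.environ["TUNE_GLOBAL_CHECKPOINT_S"] = "600"  # or "inf" to disable?
```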
For reference, these are the warnings I am seeing:

```
2021-01-13 13:08:00,904 WARN util.py:142 -- The `callbacks.on_trial_result` operation took 5.620 s, which may be a performance bottleneck.
2021-01-13 13:08:00,918 WARN util.py:142 -- The `process_trial` operation took 5.644 s, which may be a performance bottleneck.
2021-01-13 13:08:03,662 WARN util.py:142 -- The `experiment_checkpoint` operation took 2.743 s, which may be a performance bottleneck.
```