How to debug performance bottlenecks

I do not understand why these bottlenecks occur (see the warnings at the end of this post). What are the usual suspects?

I am using Bayesian optimization (BO) with ASHA.
I am passing the training data through the config. Is that wrong?
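Roughly speaking, my setup looks something like the sketch below (illustrative only, not my actual code; the trainable and the toy data are placeholders):

```python
from ray import tune

def train_loss_fn(config):
    # Illustrative only: the training data travels inside the trial config,
    # so it is also serialized into every trial and experiment checkpoint.
    data = config["data"]
    tune.report(loss=sum(data) * config["lr"])

data = list(range(1_000_000))  # stand-in for the real training data

tune.run(
    train_loss_fn,
    config={"lr": tune.loguniform(1e-4, 1e-1), "data": data},
    num_samples=10,
)
```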

I see that the Ray results folder is now 3.5 GB. Is the config stored as part of the experiment state? Is it possible to avoid storing it?

The problem seems extremely similar to this issue on the Ray GitHub repository, but it looks like it no longer applies with the current version of Ray.

Now, if I try to pass the parameter `global_checkpoint_period=np.inf`, I get this exception:

ValueError: global_checkpoint_period is deprecated. Set env var 'TUNE_GLOBAL_CHECKPOINT_S' instead.
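I assume the replacement is to set that environment variable before starting the run, something like the snippet below (the 600-second value is just a guess on my part):

```python
import os

# Assumption on my side: the replacement for the deprecated
# global_checkpoint_period argument is this environment variable
# (value in seconds), set before tune.run() is called.
os.environ["TUNE_GLOBAL_CHECKPOINT_S"] = "600"
```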

Thanks!

2021-01-13 13:08:00,904 WARN util.py:142 -- The `callbacks.on_trial_result` operation took 5.620 s, which may be a performance bottleneck.
2021-01-13 13:08:00,918 WARN util.py:142 -- The `process_trial` operation took 5.644 s, which may be a performance bottleneck.
2021-01-13 13:08:03,662 WARN util.py:142 -- The `experiment_checkpoint` operation took 2.743 s, which may be a performance bottleneck.

How many trials are you running? Yes, the config is stored in the experiment state. Do you use it to transfer data to the trainables?

If so, it might be good to look into tune.with_parameters instead: Execution (tune.run, tune.Experiment) — Ray v1.2.0.dev0
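As a rough illustration of that pattern (the trainable and data below are placeholders, not your code), it could look something like this:

```python
from ray import tune

def train_fn(config, data=None):
    # `data` is delivered through the Ray object store by tune.with_parameters,
    # so it is not stored in the trial config or the experiment checkpoints.
    tune.report(loss=sum(data) * config["lr"])

data = list(range(1_000_000))  # placeholder for the actual training data

tune.run(
    tune.with_parameters(train_fn, data=data),
    config={"lr": tune.loguniform(1e-4, 1e-1)},
    num_samples=10,
)
```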

We’re working on improving the error messages for these bottlenecks. The `callbacks.on_trial_result` warning usually means that logging a single result takes a long time; if there is data in the config, that might be the reason.
The `process_trial` operation includes the callback invocation, so if one warns, the other one warns, too.

Experiment checkpointing also writes the trial configs, so again, if there is a lot of data in them, this might be the reason. Another reason for slow experiment checkpointing can be a large number of trials. We’re working on resolving the latter problem.

It is currently not possible to avoid storing the trial config in the checkpoints.

Hello @kai, and thank you for your answer. At this time I am still trying to figure out how to run everything with the proper configuration, so I am using only 10 iterations.

Yes, I am using the config to transfer data to the loss function.

I will look into the method you proposed; it definitely looks like what I need. Right now I am pickling and de-pickling the arguments to avoid the checkpoint issue, roughly along the lines of the sketch below.
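For context, my current workaround is approximately this (a simplified sketch, not the real code; names and data are placeholders):

```python
import os
import pickle
import tempfile

from ray import tune

# Persist the data once and pass only the file path through the config,
# so the experiment checkpoints stay small.
data = list(range(1_000_000))  # stand-in for the real training data
data_path = os.path.join(tempfile.gettempdir(), "train_data.pkl")
with open(data_path, "wb") as f:
    pickle.dump(data, f)

def train_fn(config):
    # De-pickle the arguments inside the trainable.
    with open(config["data_path"], "rb") as f:
        data = pickle.load(f)
    tune.report(loss=sum(data) * config["lr"])

tune.run(
    train_fn,
    config={"lr": tune.loguniform(1e-4, 1e-1), "data_path": data_path},
    num_samples=10,
)
```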

It does the trick! Thanks!

I will mention the solution in the related GitHub issue.

Awesome, glad to hear that!

Now that I have simplified the training process I still get the warnings from above, but now they report time deltas of around 0.8 seconds. What can I do to further reduce the performance bottleneck?

The slowdown is very significant when the Bayesian optimization process moves from the initial random search to the actual BO. For some reason there is almost no GPU usage during the BO itself, after the initial random samples have been computed.

Since this seems to be a problem unrelated to the initial topic, I have opened another question.

By the way, we added a section to the FAQ discussing bottlenecks in the documentation here: Tutorials & FAQ — Ray v2.0.0.dev0