Ray Iteration vs Keras Epoch

F_S · February 22, 2023, 11:36am

What is the difference between a Ray iteration and an epoch in Keras? It appears to me that if I specify a Keras trainable object and set the number of epochs to N, the number of iterations within a given trial will always amount to N+1. Why is that so? I would have thought that the number of iterations should be equal to the number of epochs.

justinvyu · February 22, 2023, 6:59pm

Ray’s training_iteration is incremented on every step for Class Trainables and on every call to session.report for Function Trainables. See here for more details: Training in Tune (tune.Trainable, session.report) — Ray 2.3.0

Setting the number of epochs via Keras model.fit(epochs=...) together with the Ray AIR Keras callback should make the training iteration match the epoch count (using default settings). Could you provide your training script?
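To make the counting rule concrete, here is a toy model in plain Python. It does not use Ray at all: ToyTrial, report and run_epochs are hypothetical names made up for illustration, and the only behaviour taken from the description above is that each report bumps the iteration counter by one. The extra report after the loop is an assumption about how a custom callback could end up with N+1 iterations, not a diagnosis of any particular script.

```python
class ToyTrial:
    """Hypothetical stand-in for a Tune trial; not part of Ray."""

    def __init__(self):
        self.training_iteration = 0
        self.last_result = None

    def report(self, metrics):
        # Mirrors the rule described above: every report counts as one iteration.
        self.training_iteration += 1
        self.last_result = dict(metrics, training_iteration=self.training_iteration)


def run_epochs(trial, epochs, report_after_training=False):
    """Report once per epoch, optionally once more after the loop ends."""
    for epoch in range(epochs):
        trial.report({"epoch": epoch})
    if report_after_training:
        # Assumed behaviour of a custom callback that also reports at the end of training.
        trial.report({"epoch": epochs - 1, "final": True})
    return trial.training_iteration


per_epoch_only = run_epochs(ToyTrial(), epochs=5)
with_extra_report = run_epochs(ToyTrial(), epochs=5, report_after_training=True)
print(per_epoch_only, with_extra_report)  # prints "5 6"
```

So if a trial shows N+1 iterations for N epochs, one thing worth checking is whether something reports one more time than there are epochs.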

F_S · February 23, 2023, 8:28am

Thanks for your response. At first, I had a custom callback function that seemed to be the problem. I am now using TuneReportCallback from ray.tune.keras, as given in the Keras tuning example in the documentation. However, I cannot find any proper documentation for this class outside of that example. You are pointing me to another class that seems to do the same thing (?), i.e. ray.air.integrations.keras.ReportCheckpointCallback. For this one, there is proper documentation. I am a bit confused here: do they differ?

In general, I am a bit confused about the ray documentation (only talking about hyperparameter tuning). Some of the stuff pops up in both the tune library documentation and the AIR library documentation. Some of the stuff only pops up in one of the respective library documentations. Some of the functions used in the examples are not part of the documentation (as shown above).

F_S · February 23, 2023, 8:38am

To add to my point, the class you were pointing me to in the documentation, i.e. ReportCheckpointCallback, does not exist. After checking the directory, I suppose it's ray.air.integrations.keras.KerasCallback, right?

justinvyu · February 24, 2023, 11:26pm

Hey @F_S,

My mistake! The class I linked got renamed on master, but in the latest Ray release it is still ray.air.integrations.keras.KerasCallback, as you point out.

I agree – our integrations should be centralized in one place. The difference between the tune.integration Callback and the air.integrations Callback is that the air one will work both in a regular Tune training function as well as a Ray Train Trainer doing distributed TF training. The functionality is the same though – so we recommend using the AIR integration.

We have plans to deprecate the tune callback in the near future.
