Ray Iteration vs Keras Epoch

What is the difference between a Ray iteration and an epoch in Keras? It appears to me that if I specify a Keras trainable and set the number of epochs to N, the number of iterations within a given trial always amounts to N+1. Why is that? I would have thought that the number of iterations should equal the number of epochs.

Ray’s training_iteration is incremented on every step() call for class Trainables and on every session.report() call for function Trainables. See here for more details: Training in Tune (tune.Trainable, session.report) — Ray 2.3.0
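
For example, with the function Trainable API, each session.report() call bumps training_iteration by one. A minimal sketch (the metric name and config values here are made up for illustration):

```python
from ray import tune
from ray.air import session


def train_fn(config):
    for epoch in range(config["epochs"]):
        # ... train for one epoch ...
        # Every call to session.report() increments training_iteration by 1.
        session.report({"loss": 1.0 / (epoch + 1)})


tuner = tune.Tuner(train_fn, param_space={"epochs": 5})
results = tuner.fit()  # each trial ends with training_iteration == 5
```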

Setting the number of epochs via Keras model.fit(epochs=...) together with the Ray AIR Keras callback should make the training iteration count match the epoch count (with default settings). Could you provide your training script?
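
For reference, a minimal sketch of that setup (not your script – the tiny model and random data are placeholders, and the callback class name follows the Ray version discussed later in this thread; other versions expose it under a different name):

```python
import numpy as np
from tensorflow import keras

from ray import tune
from ray.air.integrations.keras import KerasCallback


def train_fn(config):
    model = keras.Sequential([keras.layers.Dense(1, input_shape=(4,))])
    model.compile(optimizer="adam", loss="mse")
    x, y = np.random.rand(32, 4), np.random.rand(32, 1)
    # With default settings the callback reports once per epoch_end,
    # so training_iteration should equal the number of epochs (here 5).
    model.fit(x, y, epochs=config["epochs"], verbose=0,
              callbacks=[KerasCallback()])


tuner = tune.Tuner(train_fn, param_space={"epochs": 5})
tuner.fit()
```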

Thanks for your response. First, I had a custom callback function that seemed to be the problem. I now use TuneReportCallback from ray.tune.integration.keras, as given in the Keras tuning example in the documentation. I cannot find any proper documentation for this class outside of that example, though. You are pointing me to another class that seems to do the same thing (?), i.e. ray.air.integrations.keras.ReportCheckpointCallback, and for that one there is proper documentation. I am a bit confused here; do they differ?
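
For comparison, the Tune-side callback is wired up the same way. A hedged sketch following the pattern from the Keras tuning example (dummy model and data, metric names chosen for illustration):

```python
import numpy as np
from tensorflow import keras

from ray import tune
from ray.tune.integration.keras import TuneReportCallback


def train_fn(config):
    model = keras.Sequential([keras.layers.Dense(1, input_shape=(4,))])
    model.compile(optimizer="adam", loss="mse")
    x, y = np.random.rand(32, 4), np.random.rand(32, 1)
    # Reports the Keras "loss" log to Tune at the end of every epoch.
    model.fit(x, y, epochs=5, verbose=0,
              callbacks=[TuneReportCallback({"loss": "loss"})])


tune.Tuner(train_fn).fit()
```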

In general, I am a bit confused by the Ray documentation (speaking only about hyperparameter tuning). Some things appear in both the Tune documentation and the AIR documentation, some appear in only one of the two, and some of the functions used in the examples are not documented at all (as shown above).

To add to my point, the class you pointed me to in the documentation, i.e. ReportCheckpointCallback, does not exist. After checking the directory, I suppose it's ray.air.integrations.keras.KerasCallback, right?

Hey @F_S,

My mistake! The class I linked was renamed on master, but in the latest released Ray version it is still ray.air.integrations.keras.KerasCallback, as you point out.

I agree – our integrations should be centralized in one place. The difference between the tune.integration callback and the air.integrations callback is that the AIR one works both in a regular Tune training function and in a Ray Train Trainer doing distributed TF training (see the sketch below). The functionality is the same, though – so we recommend using the AIR integration.
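
To illustrate the second case, here is a rough sketch of the same AIR callback inside a Ray Train TensorflowTrainer (the class and callback names assume the Ray version discussed in this thread, and the model/data are dummies):

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

from ray.air import ScalingConfig
from ray.air.integrations.keras import KerasCallback
from ray.train.tensorflow import TensorflowTrainer


def train_loop_per_worker(config):
    # Each worker builds the model inside a multi-worker strategy scope.
    strategy = tf.distribute.MultiWorkerMirroredStrategy()
    with strategy.scope():
        model = keras.Sequential([keras.layers.Dense(1, input_shape=(4,))])
        model.compile(optimizer="adam", loss="mse")
    x, y = np.random.rand(64, 4), np.random.rand(64, 1)
    # The same callback reports per-epoch metrics back to Ray from the workers.
    model.fit(x, y, epochs=config["epochs"], verbose=0,
              callbacks=[KerasCallback()])


trainer = TensorflowTrainer(
    train_loop_per_worker,
    train_loop_config={"epochs": 3},
    scaling_config=ScalingConfig(num_workers=2),
)
result = trainer.fit()
```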

We plan to deprecate the Tune callback in the near future.