Incorrect steps calculation in GPT-J fine-tuning example

How severely does this issue affect your experience of using Ray?

  • Low: It annoys or frustrates me for a moment.

Hi, I’m running the GPT-J fine-tuning example: GPT-J-6B Fine-Tuning with Ray AIR and DeepSpeed — Ray 2.5.1.

But I’m confused about the math behind the steps per epoch that the example reports.

The example mentions the following:

The preprocessed dataset has 1348 examples. We have set per device batch size to 16.

With 16 g4dn.4xlarge nodes, the effective batch size was 256, which equals to 85 steps per epoch. One epoch took ~2440 seconds (including initialization time).

With 32 g4dn.4xlarge nodes, the effective batch size was 512, which equals to 43 steps per epoch. One epoch took ~1280 seconds (including initialization time).

But if the effective batch size is 256 and the dataset has 1348 examples, shouldn’t the number of steps per epoch be 1348 / 256 ≈ 6, not 85 (and, with the effective batch size of 512, about 3 steps rather than the reported 43)?
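For reference, here is the back-of-envelope calculation I’m doing. This is just a sketch; the variable names are mine, and I’m assuming one training worker per g4dn.4xlarge node and no gradient accumulation:

```python
import math

dataset_size = 1348          # preprocessed examples, per the example text
per_device_batch_size = 16   # per the example text

# Effective batch size = per-device batch size * number of workers
# (assuming one worker per g4dn.4xlarge node).
effective_bs_16_nodes = per_device_batch_size * 16   # 256
effective_bs_32_nodes = per_device_batch_size * 32   # 512

# Steps per epoch = ceil(dataset size / effective batch size).
steps_16_nodes = math.ceil(dataset_size / effective_bs_16_nodes)  # 6, not 85
steps_32_nodes = math.ceil(dataset_size / effective_bs_32_nodes)  # 3, not 43

print(steps_16_nodes, steps_32_nodes)
```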

@Yard1

Yes, you are right. A PR would be appreciated 🙂

Thanks. To clarify, which part is incorrect: the dataset size that is mentioned, or the steps per epoch? From the actual training output, it looks like 43 steps per epoch are indeed being run.
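Working backwards from the reported numbers (a rough check, again assuming steps per epoch = ceil(dataset size / effective batch size) and no gradient accumulation), the preprocessed dataset would have to be much larger than 1348 examples:

```python
# 43 steps at an effective batch size of 512 implies a dataset of
# between (43 - 1) * 512 + 1 and 43 * 512 examples.
lower_bound = (43 - 1) * 512 + 1   # 21,505
upper_bound = 43 * 512             # 22,016

# The 16-node run is consistent with the same dataset:
# 85 steps * 256 = 21,760 falls inside that range.
print(lower_bound, upper_bound, 85 * 256)
```

So if the reported steps are what actually ran, the preprocessed dataset presumably contains on the order of ~21,700 examples rather than 1348.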

I believe the dataset size is not correct