Incorrect steps calculation in GPT-J fine-tuning example

How severely does this issue affect your experience of using Ray?

  • Low: It annoys or frustrates me for a moment.

Hi, I’m running the GPT-J fine-tuning example: GPT-J-6B Fine-Tuning with Ray AIR and DeepSpeed — Ray 2.5.1.

But I’m confused about the math behind the steps per epoch that the example reports.

The example mentions the following:

The preprocessed dataset has 1348 examples. We have set per device batch size to 16.

With 16 g4dn.4xlarge nodes, the effective batch size was 256, which equals to 85 steps per epoch. One epoch took ~2440 seconds (including initialization time).

With 32 g4dn.4xlarge nodes, the effective batch size was 512, which equals to 43 steps per epoch. One epoch took ~1280 seconds (including initialization time).

But if the effective batch size is 256 and the dataset has 1348 examples, shouldn’t the number of steps per epoch be 1348 / 256 ≈ 6, not 85 (and, with the effective batch size of 512, about 3 steps rather than the reported 43)?
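For reference, here is the back-of-envelope calculation I’m doing. This is just a sketch; the variable names are mine, and I’m assuming one training worker per g4dn.4xlarge node and no gradient accumulation:

```python
import math

dataset_size = 1348          # preprocessed examples, per the example text
per_device_batch_size = 16   # per the example text

# Effective batch size = per-device batch size * number of workers
# (assuming one worker per g4dn.4xlarge node).
effective_bs_16_nodes = per_device_batch_size * 16   # 256
effective_bs_32_nodes = per_device_batch_size * 32   # 512

# Steps per epoch = ceil(dataset size / effective batch size).
steps_16_nodes = math.ceil(dataset_size / effective_bs_16_nodes)  # 6, not 85
steps_32_nodes = math.ceil(dataset_size / effective_bs_32_nodes)  # 3, not 43

print(steps_16_nodes, steps_32_nodes)
```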

@Yard1

Yes, you are right. A PR would be appreciated 🙂

Thanks. To clarify, which part is incorrect: the dataset size that is mentioned, or the steps per epoch? From the actual training output, it looks like 43 steps per epoch are indeed being run.
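Working backwards from the reported numbers (a rough check, again assuming steps per epoch = ceil(dataset size / effective batch size) and no gradient accumulation), the preprocessed dataset would have to be much larger than 1348 examples:

```python
# 43 steps at an effective batch size of 512 implies a dataset of
# between (43 - 1) * 512 + 1 and 43 * 512 examples.
lower_bound = (43 - 1) * 512 + 1   # 21,505
upper_bound = 43 * 512             # 22,016

# The 16-node run is consistent with the same dataset:
# 85 steps * 256 = 21,760 falls inside that range.
print(lower_bound, upper_bound, 85 * 256)
```

So if the reported steps are what actually ran, the preprocessed dataset presumably contains on the order of ~21,700 examples rather than 1348.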

I believe the dataset size is not correct