How severely does this issue affect your experience of using Ray?
- None: Just asking a question out of curiosity
- Low: It annoys or frustrates me for a moment.
- Medium: It contributes significant difficulty to completing my task, but I can work around it.
- High: It blocks me from completing my task.
I use ray.data.read_parquet to load training data (mostly tabular data, plus a couple of list features; e.g., x1 for one sample is [1,2,3,4,5]), and TorchTrainer for training on 1 worker with 1 GPU + 8 CPUs (a simplified sketch of my setup is below the list). Several observations I have are:
- When I double my batch size from 4096 to 8192, the training time doesn't change, whereas I expected it to be roughly halved.
- When I use ray.data.read_parquet(filenames).random_sample(0.1), the training time doesn't change, whereas I expected it to drop to roughly 1/10.
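
For reference, here is a simplified sketch of my setup, written against the Ray 2.x `TorchTrainer` + `ray.train.get_dataset_shard` API as I understand it. The file paths, the training-loop body, and the config keys (`batch_size`, `num_epochs`) are placeholders, not my real script:

```python
import ray
from ray import train
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer


def train_loop_per_worker(config):
    # Each worker reads its shard of the "train" dataset; batch_size is set here.
    shard = train.get_dataset_shard("train")
    for _ in range(config["num_epochs"]):
        for batch in shard.iter_torch_batches(batch_size=config["batch_size"]):
            pass  # forward/backward pass on my model (omitted)


# Placeholder paths; the real data is tabular Parquet with a few list columns.
filenames = ["s3://my-bucket/train/part-0.parquet"]
ds = ray.data.read_parquet(filenames)
# Variant from the second observation above:
# ds = ray.data.read_parquet(filenames).random_sample(0.1)

trainer = TorchTrainer(
    train_loop_per_worker,
    train_loop_config={"batch_size": 8192, "num_epochs": 1},
    datasets={"train": ds},
    scaling_config=ScalingConfig(
        num_workers=1,
        use_gpu=True,
        resources_per_worker={"CPU": 8},
    ),
)
result = trainer.fit()
```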
Is there an example or guidance I can look into to understand why this happens and how to improve it?