Hey team, we have a TensorflowTrainer for distributed training that we plug into tune.Tuner. The TensorflowTrainer is given datasets={"train": train_dataset}, where train_dataset is a Ray Dataset. When we run concurrent trials with Tune, we see object spilling. We've noticed the spilling is noticeably lighter when the model is smaller, which makes us suspect it's TF objects being spilled rather than the dataset itself.
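
For reference, here's roughly how we wire things up. This is a minimal sketch assuming a recent Ray 2.x API (where get_dataset_shard/report live under ray.train; older versions used ray.air.session); the S3 path, model, batch size, and search space are placeholders for our real ones:

```python
import ray
from ray import train, tune
from ray.train import ScalingConfig
from ray.train.tensorflow import TensorflowTrainer


def train_loop_per_worker(config):
    import tensorflow as tf

    # Each Train worker pulls its shard of the shared "train" dataset.
    shard = train.get_dataset_shard("train")

    # Placeholder model; in our real code this is much larger.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    model.compile(optimizer=tf.keras.optimizers.Adam(config["lr"]), loss="mse")

    for epoch in range(2):
        for batch in shard.iter_tf_batches(batch_size=1024):
            pass  # train step on the batch goes here
        train.report({"epoch": epoch})


# Placeholder data source; ours is a real Ray Dataset.
train_dataset = ray.data.read_parquet("s3://bucket/train")

trainer = TensorflowTrainer(
    train_loop_per_worker=train_loop_per_worker,
    scaling_config=ScalingConfig(num_workers=2),
    datasets={"train": train_dataset},
)

# Each concurrent trial launches its own set of Train workers, all reading
# blocks of train_dataset out of the shared object store.
tuner = tune.Tuner(
    trainer,
    param_space={"train_loop_config": {"lr": tune.grid_search([1e-3, 1e-4])}},
)
results = tuner.fit()
```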