Ray 2.9.3: map_batches and multi-GPU -- partition blocks not processing / data not sharded evenly

I am running multi-GPU inference with map_batches, but I'm having difficulty understanding why the operation is not processing data evenly or making progress on block completion as inference proceeds. I have tried varying partition sizes, and I always end up with one GPU that is effectively unused; its memory usage is consistent with just loading the model.

import ray

# convert the pandas DataFrame to a Ray Dataset and split it into 396 blocks
ray_ds = ray.data.from_pandas(df).repartition(396)

# row-level preprocessing
ray_ds = ray_ds.map(preprocess_function)

# batched inference: one GPU per actor, args.num_devices actors in the pool
predictions = ray_ds.map_batches(
    HuggingFacePredictor,
    num_gpus=1,
    batch_size=args.batch_size,
    concurrency=args.num_devices,
)
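
For context, HuggingFacePredictor follows the usual stateful-callable pattern for map_batches: __init__ loads the model once per actor and __call__ runs on each batch. A simplified sketch (the model, task, and column names below are placeholders, not my actual code):

# simplified sketch of the predictor class (placeholder model/task/columns)
import numpy as np
from transformers import pipeline

class HuggingFacePredictor:
    def __init__(self):
        # runs once per actor: load the model onto that actor's GPU
        # (with num_gpus=1, Ray pins each actor to a single GPU)
        self.pipe = pipeline(
            "text-classification",
            model="distilbert-base-uncased-finetuned-sst-2-english",
            device=0,
        )

    def __call__(self, batch: dict) -> dict:
        # called once per batch; batches arrive as dicts of numpy arrays by default
        outputs = self.pipe(batch["text"].tolist(), batch_size=len(batch["text"]))
        batch["label"] = np.array([o["label"] for o in outputs])
        return batch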

Here is the output; the object store memory in use slowly decreases over time, but block progress hangs at 1/396:

Running: 0.0/384.0 CPU, 16.0/16.0 GPU, 29.59 GiB/282.72 GiB object_store_memory:   0%|          | 1/396 [1:56:33<752:37:05, 6859.31s/it]
Running: 0.0/384.0 CPU, 16.0/16.0 GPU, 29.52 GiB/282.72 GiB object_store_memory:   0%|          | 1/396 [1:56:33<752:37:05, 6859.31s/it]

I am also unclear why one of my GPUs is so poorly saturated.

I downgraded to Ray 2.7.0 and the issue does not occur there.

Thanks for reporting this, localh; can you create a GitHub issue on the Ray repo and attach a repro script? I'll get our relevant on-calls to take a look.
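
A skeleton along these lines is usually enough for us to reproduce scheduling issues; dummy data and a trivial placeholder callable stand in for your real preprocessing and model, and the row count, batch size, and 16-GPU concurrency below are just placeholders matching your setup:

# hypothetical repro skeleton: dummy data, dummy work, same map_batches shape
import numpy as np
import pandas as pd
import ray

ray.init()

df = pd.DataFrame({"text": ["some example text"] * 100_000})  # placeholder data
ds = ray.data.from_pandas(df).repartition(396)

class DummyPredictor:
    def __call__(self, batch: dict) -> dict:
        # stand-in for real inference so the data-plane behaviour is isolated
        batch["pred"] = np.array([len(t) for t in batch["text"]])
        return batch

preds = ds.map_batches(DummyPredictor, num_gpus=1, batch_size=256, concurrency=16)
preds.materialize()  # force execution so the 1/396 stall can be observed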