Increase efficiency using PyTorch + GPU for inference

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity

Hi guys,

I am new to Ray and am trying to set up a cluster to perform some OCR tasks using easyocr.
I want to speed up my current codebase, which defines multiple actors that sequentially read images from local disk and OCR them remotely in batches of 8 images.

It takes around 30 minutes to complete 1000 images, which is quite slow. Are there any tips for speeding this up, such as using async or multiprocessing locally to start actor pools?

I have the following cluster setup

Hi @Chester_Cheng, I think you're looking for image batch prediction on GPU. This is a typical workload that Ray's AI Runtime (AIR) targets. For your case in particular, it focuses on:

  1. Fast, parallelized IO from storage such as local disk, S3, etc.
  2. Scalable data pre-processing and augmentation
  3. Easy integration with training/prediction in the DL framework of your choice

You can try a script like our benchmark (ray-project/ray on Sourcegraph) on the Ray nightly wheels. As of today, our benchmark numbers ([AIR][CUJ] Make distributing training benchmark at silver tier by jiaodong · Pull Request #26640 · ray-project/ray · GitHub) are:

| Cluster Setup | Data Size | Performance | Command |
|---|---|---|---|
| 1 g3.8xlarge node | 1 GB (1623 images) | 72.59 s (22.3 images/sec) | `python --data-size-gb=1` |
| 1 g3.8xlarge node | 20 GB (32460 images) | 1213.48 s (26.76 images/sec) | `python --data-size-gb=20` |
| 4 g3.16xlarge nodes | 100 GB (162300 images) | 885.98 s (183.19 images/sec) | `python --data-size-gb=100` |