Increase efficiency using PyTorch + GPU for inference

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity

Hi guys,

I am new to Ray and am trying to set up a cluster to perform some OCR tasks using easyocr.
I want to speed up my current codebase, which defines multiple actors that sequentially read images from local disk and OCR them remotely in batches of 8 images.

It takes around 30 minutes to complete 1000 images, which is quite slow. Are there any tips for speeding this up, such as using async or multiprocessing locally to start actor pools?

I have the following cluster setup

Hi @Chester_Cheng, I think you're looking for image batch prediction on GPU. This is a typical workload that Ray's AI Runtime (AIR) targets. For your case in particular, it focuses on:

  1. Fast, parallelized IO from storage such as local disk, S3, etc.
  2. Scalable data pre-processing and augmentation
  3. Easy integration with training/prediction in the DL framework of your choice

You can try a script like our benchmark (ray-project/ray on Sourcegraph) on the Ray nightly wheels. As of today, our benchmark numbers ([AIR][CUJ] Make distributing training benchmark at silver tier by jiaodong · Pull Request #26640 · ray-project/ray · GitHub) are:

| Cluster Setup | Data Size | Performance | Command |
|---|---|---|---|
| 1 g3.8xlarge node | 1 GB (1623 images) | 72.59 s (22.3 images/sec) | `python --data-size-gb=1` |
| 1 g3.8xlarge node | 20 GB (32460 images) | 1213.48 s (26.76 images/sec) | `python --data-size-gb=20` |
| 4 g3.16xlarge nodes | 100 GB (162300 images) | 885.98 s (183.19 images/sec) | `python --data-size-gb=100` |