Seeking feedback on this Ray loop

Hi all.

I am trying to apply CRFs in parallel to a large set of images.

Here’s how I have coded it:

from pydensecrf.utils import unary_from_labels, create_pairwise_bilateral
import pydensecrf.densecrf as dcrf
from skimage.color import gray2rgb

from tqdm import tqdm_notebook
import numpy as np
import psutil
import ray

num_cpus = psutil.cpu_count(logical=False)
ray.init(num_cpus=num_cpus)

# Reference:
# https://www.kaggle.com/meaninglesslives/apply-crf
@ray.remote
def custom_crf(mask_img, shape=(256, 256)):
    
    # Converting annotated image to RGB if it is Gray scale
    if(len(mask_img.shape)<3):
        mask_img = gray2rgb(mask_img)
        
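    # Pack the RGB channels into a single 32-bit integer per pixel.
    # Cast first: bit shifts raise on float arrays and lose the high bits on uint8.
    mask_img = mask_img.astype(np.uint32)
    annotated_label = mask_img[:,:,0] + (mask_img[:,:,1]<<8) + (mask_img[:,:,2]<<16)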
    
    # Convert the 32bit integer color to 0,1, 2, ... labels.
    colors, labels = np.unique(annotated_label, return_inverse=True)

    n_labels = 2
    
    # Setting up the CRF model
    d = dcrf.DenseCRF2D(shape[1], shape[0], n_labels)

    # get unary potentials (neg log probability)
    U = unary_from_labels(labels, n_labels, gt_prob=0.7, zero_unsure=False)
    d.setUnaryEnergy(U)

    # This adds the color-independent term, features are the locations only.
    d.addPairwiseGaussian(sxy=(3, 3), compat=3, kernel=dcrf.DIAG_KERNEL,
                      normalization=dcrf.NORMALIZE_SYMMETRIC)
        
    # Run Inference for 10 steps 
    Q = d.inference(10)

    # Find out the most probable class for each pixel.
    MAP = np.argmax(Q, axis=0)

    return MAP.reshape((shape[0], shape[1]))

dummy_images = np.zeros((12348, 256, 256))

crfs = []
for image in tqdm_notebook(dummy_images):
    image_id = ray.put(image)
    crfs.append(ray.get(custom_crf.remote(image_id)))

I am doubtful about this implementation since I haven’t noticed any considerable speedup compared to sequential, single-threaded Python. Am I missing something?

Hi @sayakpaul,
The issue is in this line: crfs.append(ray.get(custom_crf.remote(image_id)))

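You are basically telling Ray to process one image at a time.
The key code smell here is the remote() call followed immediately by a get. ray.get blocks until that task has finished, so the next task is not submitted until the previous one is done. This pattern gives you synchronous behavior.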

This should give you more parallelism:

crf_ids = [] 
for image in tqdm_notebook(dummy_images):
    image_id = ray.put(image)
    crf_ids.append(custom_crf.remote(image_id))
crfs = ray.get(crf_ids)
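The difference between the two loops can be reproduced without Ray. Below is a minimal sketch that uses the standard library's concurrent.futures as a stand-in (it is not Ray's API): calling .result() right after .submit() plays the role of ray.get right after .remote(). The names slow_square, run_blocking, run_deferred and timed are made up for illustration, and the sleep stands in for the CRF work.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x):
    # Stand-in for an expensive per-image task such as custom_crf.
    time.sleep(0.05)
    return x * x


def run_blocking(pool, inputs):
    # Anti-pattern: wait on each result right after submitting it,
    # so only one task is ever in flight.
    return [pool.submit(slow_square, x).result() for x in inputs]


def run_deferred(pool, inputs):
    # Submit everything first, then collect; tasks overlap across workers.
    futures = [pool.submit(slow_square, x) for x in inputs]
    return [f.result() for f in futures]


def timed(fn, *args):
    # Return the function's result together with its wall-clock duration.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


with ThreadPoolExecutor(max_workers=4) as pool:
    blocking_result, blocking_time = timed(run_blocking, pool, range(8))
    deferred_result, deferred_time = timed(run_deferred, pool, range(8))
```

Both versions return the same values in the same order, but the blocking one takes roughly as long as running the tasks one after another, while the deferred one overlaps them across the four workers.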

Thank you!

This part is now a bit time-consuming. Is there anything I could improve further?

The three obvious ones are to make custom_crf faster, to add a batch dimension to it so each task processes several images, or to add more workers/nodes to the Ray cluster. Maybe someone else has another suggestion.
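The batching suggestion could look roughly like the sketch below. make_batches and run_batched are hypothetical helpers, not part of Ray or of the code posted above, and the batch size of 64 is an arbitrary starting point. per_image_fn is assumed to be a plain Python function, for example custom_crf without the @ray.remote decorator, since a remote function cannot be called directly inside another task.

```python
def make_batches(items, batch_size):
    """Split a sequence into consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def run_batched(images, per_image_fn, batch_size=64):
    """Submit one Ray task per batch instead of one task per image."""
    # Imported here so make_batches stays usable without Ray installed.
    import ray

    @ray.remote
    def process_batch(batch):
        # Each task handles a whole chunk, which spreads the scheduling
        # and serialization overhead over many images.
        return [per_image_fn(image) for image in batch]

    futures = [process_batch.remote(batch)
               for batch in make_batches(images, batch_size)]
    # Flatten the per-batch result lists back into one list, in input order.
    return [result for batch_result in ray.get(futures)
            for result in batch_result]
```

With fewer, larger tasks the per-task overhead matters less, but very large batches can leave some workers idle near the end, so the batch size is worth tuning.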

Thank you.

I can confirm that increasing the number of workers was definitely helpful.