Using Ray with a transformers pipeline for inference

Hi

I am trying to use Ray to speed up inference. Could you please help me with an example?

I am using the zero-shot classification pipeline. My dataset has 400K sentences and 52 labels.

```python
%%time
import ray
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="typeform/distilbert-base-uncased-mnli")

ray.init()
# Put the sentence list and the classifier in the object store once
test_name_lst_ray = ray.put(test_name_lst)
classifier_ref = ray.put(classifier)

@ray.remote
def get_label_score(classifier, sequence, i, labels):
    seq = sequence[i]
    return classifier(seq, labels)

# One remote task per sentence
res_list = ray.get([get_label_score.remote(classifier_ref, test_name_lst_ray, i, tag_values)
                    for i in range(100)])
```

I am running the above code for 100 sentences, but the timing is the same as when I run a plain loop; in fact Ray adds about 15 seconds.
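My guess is that per-sentence tasks are too fine-grained, so the per-task overhead swamps the actual work. I am considering batching sentences so that each remote task classifies a whole chunk. Here is a minimal sketch of the chunking I have in mind (pure Python; `make_chunks` is my own helper name):

```python
def make_chunks(items, n_chunks):
    """Split items into roughly n_chunks contiguous chunks of near-equal size."""
    chunk_size = -(-len(items) // n_chunks)  # ceiling division
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

# Example: 100 sentences split for ~16 parallel tasks
sentences = [f"sentence {i}" for i in range(100)]
chunks = make_chunks(sentences, 16)
print(len(chunks), len(chunks[0]))  # prints: 15 7
```

Each chunk would then go to a single remote task instead of one task per sentence (e.g. a variant of `get_label_score` that takes a list and returns a list of results). Would batching like this help, or is the bottleneck somewhere else?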

When I run on a GPU, it also takes the same time. Below is the GPU code.

```python
%%time
import ray
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="typeform/distilbert-base-uncased-mnli", device=0)

ray.init()
test_name_lst_ray = ray.put(test_name_lst)
classifier_ref = ray.put(classifier)

# Each task reserves a whole GPU
@ray.remote(num_gpus=1)
def get_label_score(classifier, sequence, i, labels):
    seq = sequence[i]
    return classifier(seq, labels)

res_list = ray.get([get_label_score.remote(classifier_ref, test_name_lst_ray, i, tag_values)
                    for i in range(100)])
```

I have access to a CPU machine with 16 vCPUs and 32 GB RAM, as well as a machine with 2 GPUs, each with 12 GB of memory.
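With the 2 GPUs, I was thinking of switching from per-sentence tasks to long-lived workers that load the model once and then classify many chunks, instead of shipping the classifier with every task. Below is a plain-Python stand-in for the actor pattern I have in mind (no Ray here; `DummyClassifier` and the worker class are my own placeholders, untested as an actual Ray actor):

```python
class DummyClassifier:
    """My placeholder for pipeline('zero-shot-classification', ...)."""
    def __call__(self, sequence, labels):
        # Returns a dict shaped like the real pipeline output
        return {"sequence": sequence, "labels": list(labels),
                "scores": [1.0 / len(labels)] * len(labels)}

class ZeroShotWorker:
    """In the real version this would be a Ray actor,
    e.g. decorated with @ray.remote(num_gpus=1)."""
    def __init__(self):
        # Real version: self.classifier = pipeline(..., device=0),
        # so the model loads once per worker, not once per task.
        self.classifier = DummyClassifier()

    def classify_chunk(self, sentences, labels):
        return [self.classifier(s, labels) for s in sentences]

# Round-robin chunks over 2 workers (one per GPU in the real version)
workers = [ZeroShotWorker() for _ in range(2)]
sentences = [f"sentence {i}" for i in range(6)]
labels = ["a", "b"]
results = []
for i, chunk in enumerate([sentences[0:3], sentences[3:6]]):
    # Real version: futures.append(workers[i % 2].classify_chunk.remote(chunk, labels))
    results.extend(workers[i % 2].classify_chunk(chunk, labels))
print(len(results))  # prints: 6
```

Does this actor-style pattern make sense for my setup, or is there a better approach?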

Please advise on how I can use Ray to parallelize and speed up inference on my large dataset.

Regards,
Subham