Serve huggingface transformer on GPU with batching

Hi, sorry for the confusion here! You don't need to use @ray.remote with Serve deployments; the @serve.deployment decorator handles creating the actor for you.

Check here for how to specify GPUs: Core API: Deployments — Ray 1.12.0

from ray import serve

@serve.deployment(name="deployment1", ray_actor_options={"num_gpus": 0.5})
def func(*args):
    # Serve reserves half a GPU for each replica of this deployment.
    return do_something_with_my_gpu()
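
For completeness, here's a minimal sketch of deploying and querying it with the Ray 1.12 Serve API (serve.start / .deploy); the route comes from the deployment name "deployment1" in the snippet above:

import requests
from ray import serve

serve.start()
func.deploy()  # starts a replica with 0.5 GPU reserved

# Query over HTTP; the default route prefix is the deployment name.
print(requests.get("http://127.0.0.1:8000/deployment1").text)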

And does the following batching tutorial help with the batching question? Batching Tutorial — Ray 1.12.0
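
As a rough sketch of how batching composes with a GPU deployment for a HuggingFace pipeline (the model task, batch size, and timeout here are just placeholders, and the exact serve.batch arguments may differ slightly between versions):

from ray import serve
from transformers import pipeline

@serve.deployment(ray_actor_options={"num_gpus": 0.5})
class Transformer:
    def __init__(self):
        # Load the model onto the GPU once per replica.
        self.pipe = pipeline("sentiment-analysis", device=0)

    @serve.batch(max_batch_size=8, batch_wait_timeout_s=0.1)
    async def handle_batch(self, texts):
        # `texts` is the list of inputs Serve accumulated into one batch;
        # the pipeline returns one result per input, which Serve maps back
        # to the individual requests.
        return self.pipe(texts)

    async def __call__(self, request):
        return await self.handle_batch(await request.json())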
