Is it possible to run inference on local GPU as well as rollout CPU workers?

For my RL task, inference is much more efficient on the GPU. However, I still have unused CPU cores that I could put towards additional inference for the envs. Is this possible? Does anyone have an example?

Yes, it is possible to utilize both CPU and GPU for inference in Ray. Ray is designed to simplify the development of distributed applications, making it easier to scale your code from a single machine to a cluster of machines seamlessly.

In Ray, you can specify the number of CPUs and GPUs each task or actor requires. Ray then schedules tasks on nodes in the cluster that have enough of the requested resources available. This allows you to utilize both CPU and GPU resources for your tasks.

Here’s an example of how you can specify resource requirements in Ray:

import ray

# Each call to f reserves 2 CPUs and 1 GPU while it runs.
@ray.remote(num_cpus=2, num_gpus=1)
def f():
    return 1

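In this example, the function f is decorated with @ray.remote(num_cpus=2, num_gpus=1), so each invocation of f reserves 2 CPUs and 1 GPU. These are logical resources used for scheduling: Ray will not start the task until that many are free, but it does not pin the task to specific cores.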

However, please note that while you can use both CPU and GPU for inference, the actual implementation would depend on your specific use case and the ML framework you’re using. Some ML frameworks might not support running on both CPU and GPU simultaneously.

For more information on how to use resources in Ray, you can refer to the Ray documentation.