How to auto-assign actors to different GPUs in ray.data.map_batches

If I have two GPUs on a machine, I expect the following code to run on both GPUs, but only one GPU is actually used. How can I automatically assign actors to multiple GPUs?

    from typing import Dict

    import numpy as np
    import ray
    import torch

    ray.init()

    class TorchPredictor:

        def __init__(self):
            self.model = torch.nn.Identity().cuda()
            self.model.eval()

        def __call__(self, batch: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
            inputs = torch.as_tensor(batch["data"], dtype=torch.float32).cuda()
            with torch.inference_mode():
                batch["output"] = self.model(inputs).detach().cpu().numpy()
            return batch

    ds = ray.data.from_numpy(np.ones((32000, 10000))).map_batches(
        TorchPredictor,
        concurrency=2,
        batch_size=4,
        num_gpus=1,
    )

Hey @Cathy0908,

What happens if you add a repartition(2) after from_numpy? I suspect there aren’t enough partitions (“blocks”) to use all the GPUs.


It works, thanks very much!