Ray normal DAG vs Compiled DAG

RunLLM · October 18, 2025, 1:51am

In the setup you describe, the output of the PreprocessingActor (on node 1, possibly using a GPU) is passed to the InferenceActor, which may be on the same node or a different one. By default, Ray serializes that output into CPU memory and stores it in the object store, even when it is a torch.Tensor on a GPU. When the InferenceActor receives the data, it is deserialized from CPU memory and, if needed, copied back to a GPU. Actor-to-actor communication is therefore CPU-to-CPU by default, not direct GPU-to-GPU, even if both actors are on the same node and both use GPUs (Ray discuss: ray.get on torch cuda tensors).

Direct GPU-to-GPU communication between actors (zero-copy, NCCL, etc.) is only available through the Ray Compiled Graph or Ray Direct Transport features, and it requires explicit configuration. Without one of them, all inter-actor data transfer goes through CPU memory and the object store.

Would you like to see how to enable direct GPU-GPU communication between actors?

