[Roadmap] Ray Q3 2025

Hello everyone! :waving_hand: I’m excited to share what we have planned for Q3 2025 for Ray. I’ll keep this post updated as features get merged and rolled out.

Goal: Deliver foundational reliability, performance, and DX improvements across Ray Core, Data, Train, LLM, Serve, RL, Observability, Technical Content, and KubeRay.

Ray Core

Reliability & Fault Tolerance

  • Improve system stability under node and network failures, including making RPCs tolerant to transient errors

  • Add robust support for preemptible instances

Scheduling & Performance

  • Introduce label-based scheduling for finer-grained resource control

  • Implement GPU objects with RDMA transfer support for high-performance GPU data handling

Developer Experience

  • Introduce ActorMesh for simplified interaction with groups of actors

  • Improve static typing across the codebase to enhance developer productivity

  • Address outstanding technical debt in core worker components

Ecosystem Integrations

  • Provide official support for reinforcement learning libraries like veRL, OpenRLHF, and ROLL

Ray Data

Reliability

  • Ensure workloads complete successfully despite cluster failures

Performance

  • Enhance training ingest pipelines with advanced sampling and caching support

Connectors

Usability

  • Schema UDFs

  • Enhanced internal query planning

Ray Train

API

  • Finalize Train v2 API

Performance

  • Implement asynchronous checkpointing

LLM

Goal: Run large models (e.g., DeepSeek) at scale via vLLM on Ray Serve:

  • Prefill disaggregation

  • Large-scale data parallelism (DP)

  • Custom request routing

  • Elastic expert parallelism

Performance & Efficiency

  • Implement prefill disaggregation to optimize performance for large-context models

  • Develop an intelligent, KV cache-aware router with a pluggable architecture

  • Implement data-parallel (DP) attention within Ray Serve

Operations

  • Publish updated performance benchmarks

Ecosystem

  • Support SkyRL for reinforcement learning from human feedback (RLHF) workloads

Ray Serve

Serving Flexibility

  • Custom auto‑scaling and routing patterns

  • Async inference support

  • MCP server patterns

  • Integrate label-based scheduling

Observability

  • Enhanced tracing support

RLlib

  • Ray RL V2 stack general availability (GA)

  • Algorithm composability enhancements

Observability

API Release

  • Public launch of unified event export API

Optimization

  • Refactor internals to leverage new export API

Technical Content

  • New technical templates

  • More examples & deep‑dives

KubeRay

Upgrades

  • Productionize the incremental upgrade feature for seamless cluster updates

Hardware Support

  • Streamline support for diverse accelerators, including multiple GPU types, Dynamic Resource Allocation (DRA), and MIG

Autoscaling

  • Continue to improve the functionality and reliability of Autoscaler V2

We love hearing from the community! If there’s a feature you’d like to see in Ray, let us know by filing a feature request or commenting here. We also have a roadmap discussion on GitHub if you prefer to chat there.
Thank you for supporting Ray!
