Hi everyone,
I’m setting up a distributed ML pipeline on Ray and could use some guidance on structuring it efficiently with Ray Workflows.
The use case involves multiple stages: data preprocessing, model training (potentially across several model variants), evaluation, and then deploying the best model. Each stage can be broken down into smaller, reusable components that should ideally run asynchronously and in parallel wherever possible. I’ve used Ray’s actors and remote functions for distributed tasks before, but I’m now exploring Ray Workflows to get better orchestration, checkpointing, and DAG visibility.
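For context, here is roughly the shape of the DAG I have in mind. This is only a minimal sketch, assuming the Ray 2.x Workflows API where tasks are plain `@ray.remote` functions composed with `.bind()` and executed via `workflow.run()`; the task bodies, the variant names, the S3 path, and the storage path are placeholders rather than my real pipeline:

```python
import ray
from ray import workflow


@ray.remote
def preprocess(raw_path: str) -> dict:
    # Placeholder: load and clean the raw data, return the processed dataset.
    return {"dataset": f"processed::{raw_path}"}


@ray.remote
def train(data: dict, variant: str) -> dict:
    # Placeholder: train one model variant on the preprocessed data.
    return {"variant": variant, "model": f"model::{variant}"}


@ray.remote
def evaluate(trained: dict) -> dict:
    # Placeholder: score the trained model (dummy metric here).
    return {**trained, "score": float(len(trained["variant"]))}


@ray.remote
def pick_best(*results: dict) -> dict:
    # Fan-in step: choose the best-scoring variant for deployment.
    return max(results, key=lambda r: r["score"])


if __name__ == "__main__":
    # Local placeholder storage path; workflow checkpoints are persisted here.
    ray.init(storage="/tmp/ray-workflow-storage")
    data = preprocess.bind("s3://my-bucket/raw.parquet")  # hypothetical input path
    evals = [evaluate.bind(train.bind(data, v)) for v in ("xgb", "mlp", "linear")]
    dag = pick_best.bind(*evals)
    best = workflow.run(dag, workflow_id="ml-pipeline-demo")
    print("best variant:", best)
```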
Here are a few questions I have:
- What are best practices when building large-scale DAGs with nested dependencies in Ray Workflows?
- How should I handle intermediate data between steps (e.g., large preprocessed datasets)? Is it advisable to persist it in external storage, or to rely on Ray’s object store? (I’ve sketched the two options I’m weighing right after this list.)
- For versioning and tracking outputs (like models or metrics), do you integrate with external tools or stick with Ray’s metadata APIs?
- Also, are there examples of modular or templated workflows that support reusability across different ML experiments? (The second sketch below shows the pattern I’m imagining.)
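To make the intermediate-data question concrete, these are the two patterns I’m weighing: (a) return the dataset from the step, so it flows through Ray’s object store and (as I understand it) gets checkpointed by the workflow layer, versus (b) write it to external storage and only pass a URI between steps. The paths are hypothetical and pandas/parquet is used purely for illustration:

```python
import pandas as pd
import ray


@ray.remote
def preprocess_via_object_store(raw_path: str) -> pd.DataFrame:
    # (a) Return the DataFrame itself; it moves between steps via the object store.
    df = pd.read_parquet(raw_path)
    return df.dropna()


@ray.remote
def preprocess_to_external_storage(raw_path: str, out_uri: str) -> str:
    # (b) Persist the processed data externally and hand downstream steps only a URI.
    df = pd.read_parquet(raw_path).dropna()
    df.to_parquet(out_uri)
    return out_uri
```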
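And for the reusability question, this is the kind of template I’m imagining: a plain Python builder function that assembles the same DAG shape from experiment parameters. It reuses the `preprocess`/`train`/`evaluate`/`pick_best` tasks from the first sketch, and the configs and workflow IDs are made up:

```python
from ray import workflow

# Assumes the preprocess / train / evaluate / pick_best tasks from the first sketch.


def build_experiment_dag(raw_path: str, variants: list[str]):
    # One reusable "template": same DAG shape, parameterized per experiment.
    data = preprocess.bind(raw_path)
    evals = [evaluate.bind(train.bind(data, v)) for v in variants]
    return pick_best.bind(*evals)


# Same template, different experiment configurations and workflow IDs.
for i, variants in enumerate([["xgb", "mlp"], ["xgb", "mlp", "linear"]]):
    dag = build_experiment_dag("s3://my-bucket/raw.parquet", variants)
    workflow.run(dag, workflow_id=f"experiment-{i}")
```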
If anyone has experience running production-grade ML pipelines with Ray Workflows, I’d love to hear about the challenges you faced and how you structured your workflows.
Thanks in advance!