Explicit caching/memoization between runs

Migwell · September 29, 2021, 1:44am

From what I can tell, Ray doesn’t make any attempt to persist results in the object store between workflow runs. If I run a workflow once and then run it again later with the code and config unchanged, the whole thing runs from scratch. Is there any way to get Ray to persist job outputs (and/or the entire object store) and to check the store for already-completed tasks? I have some long-running processes I want to run, and re-running every task whenever I tweak a function is not viable for me.

It seems that Ray already has a caching mechanism, which is what drives the “fault tolerance” stuff, but it just chooses to drop all this data after Python exits instead of keeping it for later use.
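For readers with the same need, here is a minimal sketch of doing the caching explicitly, outside Ray. `disk_memoize` is a hypothetical helper, not a Ray API. It assumes results and arguments can be pickled, that arguments pickle deterministically, and that you bump the `version` string by hand after editing a function. Each call is keyed on the function's name, that version, and its arguments; a finished result is written to disk and reused by any later Python process pointed at the same directory.

```python
import functools
import hashlib
import os
import pickle


def disk_memoize(cache_dir, version="1"):
    """Hypothetical helper (not part of Ray): cache a function's return values
    as pickle files under cache_dir so later Python processes can reuse them.

    Bump `version` after editing the function to invalidate its old entries.
    """
    os.makedirs(cache_dir, exist_ok=True)

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # The key covers the function's name, the user-chosen version and the
            # arguments. This assumes the arguments pickle deterministically
            # (true for ints, strings, and tuples or lists of those).
            key_material = pickle.dumps(
                (fn.__module__, fn.__qualname__, version, args, sorted(kwargs.items()))
            )
            key = hashlib.sha256(key_material).hexdigest()
            path = os.path.join(cache_dir, key + ".pkl")

            if os.path.exists(path):
                # Finished in an earlier run: load the stored result.
                with open(path, "rb") as f:
                    return pickle.load(f)

            result = fn(*args, **kwargs)
            # Write to a temporary file and rename it, so an interrupted run
            # never leaves a half-written entry that looks complete.
            tmp_path = path + ".tmp"
            with open(tmp_path, "wb") as f:
                pickle.dump(result, f)
            os.replace(tmp_path, path)
            return result

        return wrapper

    return decorator
```

For example, decorating a hypothetical `preprocess(path)` with `@disk_memoize("/shared/cache", version="2")` would skip the work on later runs that pass the same `path`. With Ray, one could place `@ray.remote` above this decorator so that each worker checks the cache before computing. This assumes the cache directory lives on storage that every node can reach, and it does not make Ray's object store itself persistent.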

Migwell · September 29, 2021, 2:01am

Apparently this is not supported or planned: Don't rerun same task with same arguments if result is already available · Issue #2620 · ray-project/ray · GitHub

ericl · September 29, 2021, 3:24am

Object persistence is not available for normal Ray tasks, but you can try using Ray workflows: Workflows: Fast, Durable Application Flows — Ray v2.0.0.dev0

which does offer durable logging of intermediate workflow objects across different Ray instances.

Migwell · September 29, 2021, 4:14am

Great answer, thank you! I guess it’s this exactly-once feature that is relevant here.
