Explicit caching/memoization between runs

From what I can tell, Ray doesn’t make any attempt to persist results in the object store between workflow runs, so if I run a workflow once and then run it again later with the code and config unchanged, the whole thing executes from scratch. Is there any way to get Ray to persist job outputs and/or the entire object store, and to check the store for already-completed tasks? I have some long-running processes I want to run, and re-executing every task whenever I tweak a single function is not viable for me.
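To make this concrete, the behaviour I’m after looks roughly like the sketch below: a hypothetical `disk_memo` wrapper that pickles results to a local directory, keyed by the function’s source and arguments. The name and cache path are made up, and Ray provides nothing like this out of the box.

```python
import hashlib
import inspect
import pickle
from pathlib import Path

import ray

CACHE_DIR = Path("/tmp/ray_task_cache")
CACHE_DIR.mkdir(parents=True, exist_ok=True)

def disk_memo(func):
    """Persist results on disk, keyed by the function's source and arguments."""
    # Hashing the source means editing the function invalidates its cache.
    source = inspect.getsource(func)

    def wrapper(*args, **kwargs):
        key = hashlib.sha256(
            pickle.dumps((source, args, sorted(kwargs.items())))
        ).hexdigest()
        path = CACHE_DIR / key
        if path.exists():
            return pickle.loads(path.read_bytes())
        result = func(*args, **kwargs)
        path.write_bytes(pickle.dumps(result))
        return result

    return wrapper

@ray.remote
@disk_memo
def slow_square(x: int) -> int:
    return x * x  # stand-in for a long-running computation

# Usage: after ray.init(), ray.get(slow_square.remote(3)) hits the cache on re-runs.
```

(The cache directory here is node-local, so on a multi-node cluster it would need to live on shared storage.)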

It seems that Ray already has a caching mechanism (the one that drives its fault-tolerance features), but it drops all of that data when the Python process exits instead of keeping it available for later use.

Apparently this is not supported or planned: Don't rerun same task with same arguments if result is already available · Issue #2620 · ray-project/ray · GitHub

Object persistence is not available for normal Ray tasks, but you can try using Ray workflows: Workflows: Fast, Durable Application Flows — Ray v2.0.0.dev0

Workflows offer durable logging of intermediate workflow objects across different Ray instances.
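For illustration, here is a minimal sketch following the alpha API in the docs linked above (the workflows API has changed across Ray releases, so exact names may differ in your version):

```python
import ray
from ray import workflow

@workflow.step
def preprocess(path: str) -> int:
    # Stand-in for an expensive step. When it completes, its output is
    # checkpointed to the workflow storage, not just the in-memory object store.
    return len(path)

@workflow.step
def train(n: int) -> int:
    return n * 2

# Point workflow storage at a durable location (a local path or an s3:// URI).
workflow.init(storage="/tmp/ray_workflow_demo")

# The first run executes both steps and durably logs each step's result.
print(train.step(preprocess.step("data.csv")).run(workflow_id="demo"))

# After a crash, or from a fresh Python session against the same storage,
# the workflow can be resumed; steps whose outputs were already logged
# are skipped rather than re-executed:
#   workflow.init(storage="/tmp/ray_workflow_demo")
#   print(ray.get(workflow.resume("demo")))
```

The `workflow_id` is what ties runs together: resuming with the same id reuses every step result that has already been persisted.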

Great answer, thank you! I guess it’s this “exactly-once” feature that is relevant here.