I have a custom graph with a lot of nodes to compute.

But memory is limited, e.g. 1 TB or 2 TB.

Every time I run the tasks, it fails with an out-of-memory error.

Is there a common way to solve this problem, i.e. to compute a big graph quickly with limited memory?

I have tried partitioning the big graph into smaller subgraphs, but many nodes get computed multiple times if the graph is not divided well.

If partitioning is the right solution, how should I divide the graph?
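For illustration, here is a minimal sketch of the kind of split I mean, on a toy edge list (not my real graph), assuming networkx is available. Every cut edge corresponds to a result that crosses partitions and would have to be recomputed or transferred, so a good split keeps the cut small:

```python
import networkx as nx
from networkx.algorithms.community import kernighan_lin_bisection

# Toy dependency graph: each edge points from a task to a task that consumes it.
deps = [('a1', 'a2'), ('a2', 'a3'), ('a2', 'a4'), ('a3', 'a4'),
        ('a4', 'a5'), ('a5', 'a6'), ('a6', 'a7'), ('a6', 'a8')]
G = nx.Graph(deps)  # an undirected view is enough for a balanced bisection

# Kernighan-Lin heuristic: two roughly equal halves with few edges cut.
part1, part2 = kernighan_lin_bisection(G, seed=0)

# Cut edges = results that would be recomputed or moved between partitions.
cut = [(u, v) for u, v in G.edges() if (u in part1) != (v in part1)]
print(sorted(part1), sorted(part2), len(cut))
```

This only bisects once; for many subgraphs it would be applied recursively, or replaced by a dedicated partitioner such as METIS.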

If not, how else could I solve this problem?

Are there any papers about this?

My code is too long, so I will make up a small example.

Here is a dask graph:

dsk = {
    'a1': pd.DataFrame,
    'a2': (add, 'a1', pd.DataFrame),
    'a3': (add, 'a2', pd.DataFrame),
    'a4': (add, 'a3', 'a2'),
    ...  # large tasks up to 'ai'
}
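To make the sketch concrete, here is a runnable toy version, with a small made-up DataFrame and a plain add function; dask.get (the synchronous scheduler) is used here only so the example is self-contained:

```python
import pandas as pd
import dask

def add(x, y):
    return x + y

df = pd.DataFrame({'x': [1, 2, 3]})

# Same shape as the graph above, with concrete data.
dsk = {
    'a1': df,
    'a2': (add, 'a1', df),
    'a3': (add, 'a2', df),
    'a4': (add, 'a3', 'a2'),
}

# Request every key, exactly as in my real workload.
a1, a2, a3, a4 = dask.get(dsk, ['a1', 'a2', 'a3', 'a4'])
print(a4['x'].tolist())  # a4 = a3 + a2
```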

I initialize Ray with 1 TB of memory on a single machine.

I need all of a1, a2, …, ai.

So I run the graph with ray_dask_get(dsk, ['a1', 'a2', …, 'ai']).

When Dask schedules the graph, the following happens:

a1 is done, but a1 has downstream tasks, so a1 stays in Ray memory and is not freed.

a2 is done, but a1 and a2 both have downstream tasks, so a1 and a2 both stay in Ray memory.

a3 is done and has no downstream tasks, so a3 is freed; a1 and a2 are still in Ray memory.

So eventually {ai, …, aj} are all held in Ray memory, and when a new result needs to be put into Ray memory, it returns an error: no space left on device.

I want to know how to solve this problem in a general way.
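For reference, one pattern I am considering is a minimal sketch of spilling results to disk: make the requested outputs small write tasks instead of the DataFrames themselves, so the scheduler can release each DataFrame once its consumers have run. The save helper is hypothetical, and to_pickle just stands in for whatever on-disk format fits:

```python
import os
import tempfile

import pandas as pd
import dask

def add(x, y):
    return x + y

def save(df, path):
    # Hypothetical helper: write the result to disk and return only the
    # (small) path string, so the large DataFrame itself can be released.
    df.to_pickle(path)
    return path

outdir = tempfile.mkdtemp()
df = pd.DataFrame({'x': [1, 2, 3]})

dsk = {
    'a1': df,
    'a2': (add, 'a1', df),
    'a3': (add, 'a2', df),
    'save-a1': (save, 'a1', os.path.join(outdir, 'a1.pkl')),
    'save-a2': (save, 'a2', os.path.join(outdir, 'a2.pkl')),
    'save-a3': (save, 'a3', os.path.join(outdir, 'a3.pkl')),
}

# Only the save-* keys are requested, so a1/a2/a3 are no longer terminal
# outputs and can be freed as soon as their downstream tasks finish.
paths = dask.get(dsk, ['save-a1', 'save-a2', 'save-a3'])
print(paths)
```

I would then read the results back lazily (e.g. pd.read_pickle) as they are needed, instead of holding all of them in Ray memory at once.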