Memory sharing with nested list

In our ML project, we pass a large non-standard Python object (a nested list) to processes running in parallel. The problem is that this nested list (e.g. 3 GB) is first loaded into the Ray object store and then copied into each child process, i.e. memory is not shared between the object store and the child processes. Below you find a toy example. Our users work on Windows machines. Can you please help me find a solution to this problem? It often causes the program to abort because we run out of memory.

# -*- coding: utf-8 -*-
"""
Created on Wed Mar 10 15:25:36 2021.

@author: HBodory
"""

import numpy as np
import ray
import time
from scipy.sparse import lil_matrix


@ray.remote
def toy_function(x, nested_list, sparse_matrix):
    """Test."""
    idx = np.array(list(range(30)))
    data = np.array([i / 1 for i in range(30)], dtype="float32")
    sparse_matrix[0, idx] = data
    sparse_matrix[1, 30:60] = sparse_matrix[0, :30]
    for i in range(1 * 10**2):
        m1 = np.random.random((100, 100))
        matrix = m1 @ np.transpose(m1)
        sol = np.linalg.inv(matrix)
        c_sparse = sparse_matrix[0, 2]
        result = x + nested_list[0][0]
    return result, sol, c_sparse


if __name__ == "__main__":

    N = 10**7

    ray.init(num_cpus=4, include_dashboard=False)

    # Create a nested list
    nested_list = [list(range(N)), list(range(2*N))]
    nested_list_id = ray.put(nested_list)

    sparse_matrix = lil_matrix((N, N), dtype=np.float32)
    sparse_matrix[0, :30] = range(30)
    sparse_matrix[1, 30:60] = sparse_matrix[0, :30]
    sparse_matrix = ray.put(sparse_matrix)

    t0 = time.time()
    result = ray.get([toy_function.remote(i, nested_list_id, sparse_matrix)
                      for i in range(10)])
    t1 = time.time() - t0
    print(f"Execution time: {t1} sec.")
    ray.shutdown()

Hi @hbodory, thanks for the question! I’m not sure what’s causing this issue, as your code sample seems to be following the recommendation at Tips for first-time users — Ray v2.0.0.dev0 to the letter. Is it possible for you to use a numpy array or collection of numpy arrays instead of a nested list? That may save on deserialization costs in the workers: Ray Core Walkthrough — Ray v2.0.0.dev0
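For example, if one of the inner lists were replaced by a single numpy array, it could be put into the object store roughly like this (a minimal sketch based on the toy example above; your real data may not map onto arrays this directly):

import numpy as np
import ray

ray.init(num_cpus=4, include_dashboard=False)

N = 10**7
# One contiguous int64 buffer instead of a list of N boxed Python ints;
# Ray can keep this buffer in the object store and let workers read it zero-copy.
big_array_id = ray.put(np.arange(N, dtype=np.int64))


@ray.remote
def read_first(arr):
    """Touch the shared array without copying it."""
    return arr[0]


print(ray.get(read_first.remote(big_array_id)))
ray.shutdown()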

@Stephanie_Wang any ideas about this?

If you are talking about zero-copy reads, they only work on data structures that support pickle protocol 5 (e.g., numpy arrays). I believe the scipy matrix probably uses numpy under the hood, but for a Python list it doesn't work (the list is copied into the process's memory when it is used).
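A minimal sketch of the difference, assuming a recent Ray version (sizes are made up for illustration):

import numpy as np
import ray

ray.init(num_cpus=2, include_dashboard=False)

# A numpy array of primitives is serialized with pickle protocol 5
# out-of-band buffers, so workers can map it straight from shared memory.
array_ref = ray.put(np.arange(10**6, dtype=np.int64))

# A plain Python list of ints has no such buffer; every worker that
# uses it deserializes its own private copy.
list_ref = ray.put(list(range(10**6)))


@ray.remote
def first_element(obj):
    return obj[0]


print(ray.get([first_element.remote(array_ref), first_element.remote(list_ref)]))

# The zero-copy array comes back read-only; writing to it would raise an error.
shared_array = ray.get(array_ref)
print(shared_array.flags.writeable)  # False for the shared-memory view
ray.shutdown()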

Dear @architkulkarni, thanks for your recommendation regarding the use of numpy arrays. We thought about using objects that support pickle protocol 5 for zero-copy reads. But if we used, for example, numpy arrays instead of the single nested list object, we would have to generate tens of thousands or even hundreds of thousands of numpy arrays, depending on the data size, the number of trees, the leaf sizes, and so on. Is there a way to handle such a large collection of numpy arrays in a similarly simple fashion to a single nested list?

Dear @sangcho, thank you very much for your reply. Please see my answer sent to @architkulkarni.

I think if you store the numpy arrays in a nested list, only the list part is copied (and the numpy buffers themselves would be read zero-copy). @suquark can you confirm this?

Yes, only the list part is copied
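Applied to the toy example above, that change would amount to something like the following sketch (assuming the inner lists can be represented as integer arrays):

import numpy as np
import ray

ray.init(num_cpus=4, include_dashboard=False)

N = 10**7
# The outer list is still pickled and copied into each worker, but it now
# holds only two array references; the large buffers stay in shared memory
# and are read zero-copy.
nested_list = [np.arange(N, dtype=np.int64), np.arange(2 * N, dtype=np.int64)]
nested_list_id = ray.put(nested_list)


@ray.remote
def toy_function(x, nested_list):
    # nested_list[0] is a read-only view into the object store, not a copy
    return x + nested_list[0][0]


print(ray.get([toy_function.remote(i, nested_list_id) for i in range(10)]))
ray.shutdown()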

Dear @sangcho, I ran some small tests, and you are right: storing numpy arrays in a nested list can lead to improvements in several dimensions, e.g. (i) requiring less memory for the nested list itself, (ii) saving RAM in the child processes because of zero-copy reads, and (iii) reducing the run time. Thank you very much for your valuable and helpful support.

Glad it helped!! And thanks for the verification :)!!