Simultaneous numpy matrix vector multiplication

xzf0kgb0bqr.cev2RWU · November 24, 2022, 12:18pm

- Low: It annoys or frustrates me for a moment.

What I want to do

I have an m x n numpy array A, where m << n, that I want to load on a node whose 20 CPUs all share memory. On each CPU, I want to multiply A by an n x 1 vector v, where v differs on each CPU but A stays the same.

Constraint

Matrix A is large enough that I cannot load a separate copy of it for each CPU, so I would like to put A in shared node memory. And since A*v is just m x 1, I think I never need to store an m x n matrix on each CPU (just one copy of A in shared memory).
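The shape argument can be checked directly in NumPy. This is an illustrative sketch with made-up sizes: the matrix product `A @ v` contracts over n and yields only m values, whereas the elementwise product `A * v` broadcasts v across every row and materializes a full m x n array, which is exactly the per-CPU allocation the constraint wants to avoid.

```python
import numpy as np

# Hypothetical sizes for illustration: m << n, as in the question.
m, n = 4, 1_000
A = np.ones((m, n), dtype=np.float64)
v = np.arange(n, dtype=np.float64)

# Matrix-vector product: contracts over n, so the result holds only m values.
y = A @ v
# Elementwise product: broadcasts v across every row, producing a full m x n array.
z = A * v

print(y.shape)  # (4,)
print(z.shape)  # (4, 1000)
```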

My question

If I have 1 worker per CPU, can each worker simultaneously compute A x v (where v is different for each worker) using Ray?

I am concerned that, because every worker accesses the same shared memory at the same time, Ray would copy matrix A to each CPU, which would cause memory problems for me.

Note
I had previously asked this question on StackOverflow. I have since amended that question, removing the part about Ray, because I just discovered that Ray has its own forum here.

sangcho · November 24, 2022, 1:03pm

If A is an integer-based numpy array, it is zero-copied to each worker (see the "Serialization" page of the Ray 3.0.0.dev0 docs).

Also note that A is immutable once you put it into shared memory using ray.put!
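Zero-copy here means a worker's array is a view onto an existing buffer rather than a fresh copy. Ray's object store is not used below; as a rough stand-in (an assumption, not Ray's mechanism), NumPy's `frombuffer` over an immutable `bytes` object shows both properties mentioned above: two readers wrap the same memory without copying, and the resulting arrays are read-only.

```python
import numpy as np

# Stand-in for a sealed shared-memory object (hypothetical; no Ray involved).
buf = np.arange(12, dtype=np.float64).tobytes()  # immutable bytes

# frombuffer wraps the existing bytes without copying them.
view = np.frombuffer(buf, dtype=np.float64).reshape(3, 4)

# A second "reader" wraps the same bytes again; still no copy.
view2 = np.frombuffer(buf, dtype=np.float64).reshape(3, 4)

print(np.shares_memory(view, view2))  # True: both views point at one buffer
print(view.flags.writeable)           # False: the underlying bytes are immutable
```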

xzf0kgb0bqr.cev2RWU · November 27, 2022, 12:27am

Thanks!

i. Just to be clear, zero-copy means that I don’t need to store n_CPU x (size of A) in memory across the whole node (I just need to store 1 copy of A)?

ii. Could you explain the integer part a bit more? I did not see a reference to integers in the link you mentioned. Additionally, the numpy array I will be using consists of floats (not just integers) – would that work as well?

iii. Could you explain the part about serialization? My goal is to simultaneously access A across all workers (at the same time); serialization makes me think that one worker uses A for matrix vector multiplication, then the next worker uses A, etc., which is what I am hoping to avoid with Ray. Can I access a numpy float array A and use it for matrix vector multiplication simultaneously across all workers?

Thanks a lot for all the developments on Ray! It seems like an awesome package.

Stephanie_Wang · November 28, 2022, 9:02pm

That’s right! If you are finding that this is not the case, i.e. you’re running out of memory or the memory usage is much higher than size of A + (number of workers * size of v), please report back here with your code as this is not expected.
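The expected budget above is simple arithmetic. The sketch below compares it with the cost of giving every worker its own copy of A. The helper `footprint_bytes` and the sizes are hypothetical, and the estimate ignores per-process overhead and the small m x 1 results.

```python
def footprint_bytes(a_nbytes, v_nbytes, num_workers, zero_copy=True):
    """Rough node-wide memory for A plus one v per worker (hypothetical helper)."""
    copies_of_a = 1 if zero_copy else num_workers
    return copies_of_a * a_nbytes + num_workers * v_nbytes

# Hypothetical sizes: float64 entries, m << n, 20 workers as in the question.
m, n, workers = 1_000, 1_000_000, 20
itemsize = 8                   # bytes per float64
a_nbytes = m * n * itemsize    # 8_000_000_000 bytes (8 GB)
v_nbytes = n * itemsize        # 8_000_000 bytes

shared = footprint_bytes(a_nbytes, v_nbytes, workers, zero_copy=True)
copied = footprint_bytes(a_nbytes, v_nbytes, workers, zero_copy=False)
print(shared)  # 8160000000
print(copied)  # 160160000000
```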

Yes, to clarify, zero-copy serialization should work for numpy in general as long as you are using primitive dtypes (ints, floats, bytes, etc.). It does not work for the 'O' (object) dtype, since its elements are arbitrary Python objects.
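A quick way to check an array against the rule just stated is NumPy's `dtype.hasobject` flag, which is also true for structured dtypes that contain object fields. `zero_copy_eligible` is a hypothetical helper encoding only the rule from this post; it is not a Ray API and does not reproduce every detail of Ray's serializer.

```python
import numpy as np

def zero_copy_eligible(arr):
    """Hypothetical check: primitive dtypes qualify; object ('O') dtypes do not."""
    return not arr.dtype.hasobject

print(zero_copy_eligible(np.zeros(3, dtype=np.float32)))     # True
print(zero_copy_eligible(np.zeros(3, dtype=np.int64)))       # True
print(zero_copy_eligible(np.array([1, "a"], dtype=object)))  # False
```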

Yes, in the following code, the workers will not need to copy A when they receive it as a task argument because they receive a pointer to the numpy array stored in shared memory.

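import numpy as np
import ray

ray.init()

@ray.remote
def multiply(A, v):
    # A arrives as a read-only numpy array backed by the shared-memory object store.
    # Use @ (matrix product), not * (elementwise): A * v would broadcast to an m x n array.
    return A @ v

A = np.random.rand(100, 100_000)                   # example m x n matrix, m << n
vs = [np.random.rand(100_000) for _ in range(20)]  # example: one vector per task
A_ref = ray.put(A)  # Put A in Ray's shared-memory object store.
refs = [multiply.remote(A_ref, v) for v in vs]
results = ray.get(refs)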
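On question iii: in Ray's documentation, "serialization" means turning an object into bytes so it can live in the object store; it does not mean workers take turns using A. Read-only data can be read by many workers at once. The sketch below illustrates that concurrent-read pattern with a swapped-in mechanism (Python threads in one process sharing one NumPy array) rather than Ray; the sizes and the `matvec` helper are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

# Swapped-in illustration (threads, not Ray): many readers share one copy of A.
rng = np.random.default_rng(0)
m, n, num_tasks = 8, 5_000, 20
A = rng.standard_normal((m, n))
A.flags.writeable = False  # mimic the read-only array a Ray task receives
vs = [rng.standard_normal(n) for _ in range(num_tasks)]

def matvec(v):
    # Every thread reads the same A; no per-thread copy of A is made.
    return A @ v

with ThreadPoolExecutor(max_workers=num_tasks) as pool:
    results = list(pool.map(matvec, vs))

print(len(results), results[0].shape)  # 20 (8,)
```

Concurrent reads need no locking because nothing writes to A; NumPy's matrix product typically releases the GIL while it runs, so the threads can overlap.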

The one caveat is that the copy of A is immutable, so if you need to make a fine-grained update, this will produce multiple distinct copies of A, one per task:

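@ray.remote
def update(A, i, x):
    # The A a task receives is read-only, so assigning into it directly raises
    # "ValueError: assignment destination is read-only"; copy it first.
    A = A.copy()
    A[i] = x
    return A  # This is a distinct object from the original A.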
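Because the array a task receives is read-only, an in-place write fails and every update must go through a copy that costs another A.nbytes. The sketch below mimics this without Ray by marking a NumPy array read-only (an assumption standing in for what the task receives); `update_copy` is a hypothetical helper.

```python
import numpy as np

A = np.arange(6, dtype=np.float64)
A.flags.writeable = False  # stand-in for the read-only array a Ray task receives

try:
    A[0] = 100.0
except ValueError as err:
    print("in-place write rejected:", err)

def update_copy(A, i, x):
    # Copy first (a full A.nbytes allocation), then write into the copy.
    B = A.copy()
    B[i] = x
    return B

B = update_copy(A, 0, 100.0)
print(A[0], B[0])  # 0.0 100.0
```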

xzf0kgb0bqr.cev2RWU · November 28, 2022, 11:28pm

Thank you! And thank you all for the awesome functionalities of Ray!
