Context
This is a continuation of Simultaneous numpy matrix vector multiplication.
Basically, I have one node with a large matrix stored in shared memory. This node has 20 CPUs, each with a different set of vectors, and each CPU multiplies its vectors by the large matrix.
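For concreteness, here is a minimal sketch of the access pattern I mean (using Python threads instead of Ray workers, and made-up sizes): every worker reads the same in-memory matrix while multiplying it by its own vector, with no copying or locking.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(0)
matrix = rng.random((1000, 1000))                 # the one large shared matrix
vectors = [rng.random(1000) for _ in range(20)]   # one vector per worker

def matvec(v):
    # every worker reads the same `matrix` object concurrently
    return matrix @ v

with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(matvec, vectors))

# each concurrent result matches the serial computation
assert all(np.allclose(r, matrix @ v) for r, v in zip(results, vectors))
```

This is only an analogy for the Ray setup in the linked question, but it is the same situation I am asking about: many readers of one matrix at the same time.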
Question
The answer to the linked question mentioned that all the CPUs can access the large matrix simultaneously (rather than one CPU using the matrix, then the next one, and so on). Out of curiosity, how does this happen?
More context
For example, does Ray tell each CPU to work on a different part of the matrix, so that each part of the large matrix is accessed by only one CPU at any point in time? (E.g., CPU 1 first multiplies the first row by CPU 1's vector while CPU 2 multiplies the second row by CPU 2's vector; then CPU 1 multiplies the second row by its vector, CPU 2 multiplies the third row by its vector, etc.)
Or do all CPUs access the same element at the same time (i.e., every CPU multiplies the first row by its own vector, then the second row by its own vector, etc.)?