PS>> I just started reading about HDF5, which seems to answer most of the questions …
How do I create a LARGE 2D numpy array that has the following specs (sketched in code after the list):
1. Can do a DOT product, e.g. np.dot(vector, ary2d)
2. CAN use ary2d[rows, cols] syntax to update values
3. CAN resize the array
4. CAN be accessed by multiple Actors/tasks
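Concretely, this is what I want to be able to do, written against a plain in-memory array (purely illustrative; the whole question is how to get the same behaviour on a huge, shared, growable one):

```python
import numpy as np

ary2d = np.zeros((1000, 500))
vector = np.random.rand(1000)

result = np.dot(vector, ary2d)                          # 1. dot product
ary2d[45, :] = np.ones(500)                             # 2. in-place update
ary2d = np.append(ary2d, np.zeros((500, 500)), axis=0)  # 3. resize (grow rows)
# 4. ...and all of the above from multiple Actors/tasks at once
```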
My idea so far is:
Have some sort of server/daemon app that forks multiple processes.
Split the array into chunks, so that resizing the array is simply adding a new chunk.
Applying a DOT product, for example, then means applying it to every chunk and combining the results (see the sketch below).
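A minimal single-process sketch of that idea (sizes and names are made up; the chunks are row-blocks, so the full array would be np.vstack(chunks)):

```python
import numpy as np

chunks = [np.random.rand(1000, 500) for _ in range(4)]  # 4 row-blocks
vector = np.random.rand(4 * 1000)

def chunked_dot(vector, chunks):
    """np.dot(vector, np.vstack(chunks)) without materializing the big array."""
    out = np.zeros(chunks[0].shape[1])
    row = 0
    for chunk in chunks:
        out += np.dot(vector[row:row + chunk.shape[0]], chunk)
        row += chunk.shape[0]
    return out

def set_row(chunks, i, vec):
    """ary2d[i, :] = vec, routed to the chunk that owns row i."""
    for chunk in chunks:
        if i < chunk.shape[0]:
            chunk[i, :] = vec
            return
        i -= chunk.shape[0]
    raise IndexError(i)

chunks.append(np.zeros((1000, 500)))  # resizing = just adding a chunk
```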
I couldn't find a way in the Ray Datasets or Apache Arrow docs to UPDATE the numpy array, e.g. chunk3[45,:] = vec
Does Ray handle locking of access to the chunks, or do I have to do it manually?
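On the locking question: from what I've read in the Actor docs, a Ray actor executes its method calls one at a time by default, so giving each chunk its own actor would serialize updates to that chunk without manual locks. An untested sketch (the ChunkActor class and its methods are mine, not a Ray API):

```python
import numpy as np
import ray

ray.init()

@ray.remote
class ChunkActor:
    """Owns one row-block; calls on a single actor run serially."""
    def __init__(self, rows, cols):
        self.chunk = np.zeros((rows, cols))

    def set_row(self, i, vec):
        self.chunk[i, :] = vec      # the chunk3[45, :] = vec case

    def dot(self, vec_part):
        return np.dot(vec_part, self.chunk)

actors = [ChunkActor.remote(1000, 500) for _ in range(4)]

actors[3].set_row.remote(45, np.ones(500))   # update goes to chunk 3's owner

vector = np.random.rand(4000)
parts = [a.dot.remote(vector[i * 1000:(i + 1) * 1000])
         for i, a in enumerate(actors)]
result = sum(ray.get(parts))                 # combine the per-chunk results
```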
Should I use something like HDF5 instead of Arrow (Arrow has its own array type, not np.array … I need np.dot/fft and Cython)?
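For reference, this is what the HDF5 route looks like with h5py, going by its docs (a sketch I haven't benchmarked): datasets created with maxshape=(None, ...) support in-place slice assignment and resizing.

```python
import numpy as np
import h5py

with h5py.File("big.h5", "w") as f:
    dset = f.create_dataset("ary2d", shape=(1000, 500),
                            maxshape=(None, 500),   # rows are growable
                            chunks=(100, 500), dtype="f8")
    dset[45, :] = np.ones(500)        # in-place row update
    dset.resize((1500, 500))          # grow by 500 rows, no np.append copy
    vector = np.random.rand(1500)
    result = np.dot(vector, dset[:])  # dset[:] reads back a real np.ndarray
```

The caveat, as far as I understand, is that plain HDF5 allows only one writer at a time (SWMR mode is single-writer/multiple-reader), so the multi-Actor requirement would still need coordination on top.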
Sorry for the multi-directional questions; to put it succinctly, what I need is a NUMPY DATABASE.
All projects I've checked so far (Dask, Vaex, PyTables, Arrow, and possibly Ray Datasets) seem to be NON-UPDATABLE, NON-RESIZABLE, SINGLE-CLIENT-ACCESS projects.
If you can comment on any of the topics with an example or a link to docs to read, that would help.
I've read most of the Ray Core and Datasets docs and have done some non-trivial experiments, but the bottleneck is the serial access to a numpy array (and a Python dict).
The multi-Actor app was ~3 times slower.
My hope is that by chunking the array I can allow multi-access and implement resizing (I currently use np.append(); see the note below).
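For context on why I want to drop np.append(): it returns a freshly allocated copy on every call, so growing row-by-row costs O(n) per append, while appending a block to a list of chunks copies nothing:

```python
import numpy as np

ary2d = np.zeros((1000, 500))
ary2d = np.append(ary2d, np.zeros((1, 500)), axis=0)  # copies all 1001 rows

chunks = [np.zeros((1000, 500))]
chunks.append(np.zeros((1000, 500)))  # O(1): no existing data is copied
```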