How can I keep data in GPU memory between remote calls?

I have a large data array in GPU memory, and on each call of a remote function I would like to replace one row of this array while keeping all the other rows unchanged. The GPU kernel is launched from a Ray remote function. However, I find that after calling the remote function, the data array on GPU memory is reset to zero, so the rows replaced by previous calls are all zero again.

```python
import ray
from numba import cuda
import numpy as np

ray.init()


@cuda.jit
def copy_(length, new, temp, ii):
    # copy newly calculated `new` into row ii of `temp` (on GPU memory)
    i = cuda.grid(1)

    # loop through each spatial grid point
    if i < length:
        temp[ii, i] = new[i]


@ray.remote
def solver(length, new, ii):
    temp = cuda.device_array([5, length])
    new = cuda.to_device(new)

    # configure the blocks
    threadsperblock = 32

    # configure the grid
    blockspergrid = (length + (threadsperblock - 1)) // threadsperblock

    copy_[blockspergrid, threadsperblock](length, new, temp, ii)

    return temp.copy_to_host()


length = 10
new = np.arange(length)

# specify the row index to replace; if ii = 1 on the next call,
# the previously copied ii = 0 row will be zero again
ii = 0
re = solver.remote(length, new, ii)
```

The above code resets the data array on every call. I want the array `temp` to stay in GPU memory across calls.

If I instead pass the device array `temp` as an input to the remote function, Ray fails to serialize (pickle) it.

Is there a way to achieve this?

cc @Clark_Zinzow @ericl