Here, "grids" refers to each location in my data. I have a 3D array with dimensions (time, lat, lon). I get this error only when I run the code with Ray, specifically when I select an array[time, lat, lon] and process it in chunks of lat=2, lon=2. For example, I ran the code on array[:,4,4] with chunks of lat=2 and lon=2.
My code has the following structure:
ray.init(ignore_reinit_error=True)
RAY_memory_monitor_refresh_ms = 0
# processing steps
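A note on the setup lines above: as far as I understand, `RAY_memory_monitor_refresh_ms` is read by Ray as an environment variable when Ray starts, so a plain Python assignment made after `ray.init` probably has no effect. Below is a minimal sketch of what I think the intended setup looks like, assuming the goal is to disable the memory monitor. The helper names `disable_memory_monitor` and `start_ray` are hypothetical and only there for illustration.

```python
import os


def disable_memory_monitor():
    """Set the environment variable before Ray starts (hypothetical helper).

    Assumption: a refresh interval of 0 disables Ray's memory monitor.
    """
    os.environ["RAY_memory_monitor_refresh_ms"] = "0"


def start_ray():
    """Start a local Ray instance after the environment is configured (hypothetical helper)."""
    disable_memory_monitor()
    import ray  # imported here so the environment variable is already set

    ray.init(ignore_reinit_error=True)
    return ray


# Only the environment step runs here; call start_ray() in the real script.
disable_memory_monitor()
```

Disabling the monitor only stops Ray from killing tasks under memory pressure; it does not lower the memory each task uses.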
def chunk_indices(array_length, chunk_size):
    return [range(i, min(i + chunk_size, array_length)) for i in range(0, array_length, chunk_size)]

def process_chunks(lat_chunks, lon_chunks, sst_train, sst_test, alpha, tau, K, DC, init, tol, look_back, lead, l, spi):
    futures = []
    for lat_chunk in lat_chunks:
        for lon_chunk in lon_chunks:
            futures.append(process_grid_cell.remote(
                lat_chunk[0], lat_chunk[-1] + 1,
                lon_chunk[0], lon_chunk[-1] + 1,
                sst_train, sst_test, alpha, tau, K, DC, init, tol, look_back, lead, l, spi
            ))
    results = ray.get(futures)
    return results
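To make the chunking concrete, here is a small sketch, without Ray, of the slice bounds that would be passed to each `process_grid_cell` task. The grid size (5 latitudes, 4 longitudes) is made up for illustration, and `chunk_bounds` is a hypothetical helper that mirrors the nested loops in `process_chunks`.

```python
def chunk_indices(array_length, chunk_size):
    # Split range(array_length) into consecutive ranges of at most chunk_size.
    return [range(i, min(i + chunk_size, array_length))
            for i in range(0, array_length, chunk_size)]


def chunk_bounds(lat_chunks, lon_chunks):
    # Hypothetical helper: (lat_start, lat_end, lon_start, lon_end) per task,
    # with exclusive end indices, in the same order as process_chunks.
    return [(lat[0], lat[-1] + 1, lon[0], lon[-1] + 1)
            for lat in lat_chunks
            for lon in lon_chunks]


# Assumed illustrative grid: 5 latitudes and 4 longitudes, chunk size 2.
lat_chunks = chunk_indices(5, 2)
lon_chunks = chunk_indices(4, 2)
bounds = chunk_bounds(lat_chunks, lon_chunks)

for b in bounds:
    print(b)
```

With these sizes, the last latitude chunk holds a single row, so the tasks do not all receive blocks of the same shape. That may be relevant to the retracing warning in the output, since passing tensors with different shapes is one of the causes it lists.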
Here is the output from the repro:
(process_grid_cell pid=187040) WARNING:tensorflow:5 out of the last 3655 calls to <function TensorFlowTrainer.make_predict_function..one_step_on_data_distributed at 0x7f3204481940> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to Better performance with tf.function | TensorFlow Core and tf.function | TensorFlow v2.16.1 for more details. [repeated 3x across cluster]
(process_grid_cell pid=187034) 2024-08-29 07:04:38.792387: W tensorflow/core/data/root_dataset.cc:362] Optimization loop failed: CANCELLED: Operation was cancelled
(process_grid_cell pid=187075) 2024-08-29 07:07:09.375367: W tensorflow/core/data/root_dataset.cc:362] Optimization loop failed: CANCELLED: Operation was cancelled
(process_grid_cell pid=190332) 15 14
(process_grid_cell pid=190504) 13 14
(process_grid_cell pid=187082) 17 14
(process_grid_cell pid=186968) 15 10
(process_grid_cell pid=191814) 3 2
(process_grid_cell pid=187060) 17 12
(process_grid_cell pid=187044) 19 14
(process_grid_cell pid=191003) 9 16
(process_grid_cell pid=190477) 15 6
(process_grid_cell pid=187015) 19 16
(process_grid_cell pid=187080) 17 18
(process_grid_cell pid=191528) 7 0