I have a function that I want to apply with map_batches to create a weight vector:
import numpy as np

def class_weight(batch):
    # per-class weights: index 0 and 1 correspond to the two class labels
    classes = np.array([0.06159456, 4.55880019])
    labels = batch['label'].values
    labels = labels.astype(np.int8)
    # fancy indexing maps each label to its class weight
    samples_weight = classes[labels]
    return {"weight": samples_weight}
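For context, the indexing itself behaves as expected outside of Ray; here is a minimal standalone reproduction with toy labels standing in for `batch['label'].values` (the label values are made up for illustration):

import numpy as np

# Per-class weights: index 0 and 1 map to the two class labels
classes = np.array([0.06159456, 4.55880019])

# Toy labels in place of batch['label'].values (hypothetical data)
labels = np.array([0, 1, 1, 0], dtype=np.int8)

# Fancy indexing turns each label into its class weight
samples_weight = classes[labels]
print(samples_weight.tolist())

So the function logic seems fine; the problem appears to be in how Ray schedules the tasks.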
Initially, map_batches uses a fair number of CPUs, but eventually CPU usage drops to zero, with the progress bar reporting statistics like:
Running: 0.0/160.0 CPU, 0.0/32.0 GPU, 685.19 MiB/279.4 GiB object_store_memory: 30%|███ | 192/640 [12:24<13:58, 1.87s/it]
Running: 0.0/160.0 CPU, 0.0/32.0 GPU, 685.19 MiB/279.4 GiB object_store_memory: 30%|███ | 193/640 [12:26<13:55, 1.87s/it]
Running: 0.0/160.0 CPU, 0.0/32.0 GPU, 685.19 MiB/279.4 GiB object_store_memory: 30%|███ | 194/640 [12:28<13:47, 1.86s/it]
Running: 0.0/160.0 CPU, 0.0/32.0 GPU, 685.19 MiB/279.4 GiB object_store_memory: 30%|███ | 195/640 [12:33<19:39, 2.65s/it]
Running: 0.0/160.0 CPU, 0.0/32.0 GPU, 685.19 MiB/279.4 GiB object_store_memory: 31%|███ | 196/640 [12:37<23:55, 3.23s/it]
Running: 0.0/160.0 CPU, 0.0/32.0 GPU, 685.19 MiB/279.4 GiB object_store_memory: 31%|███ | 197/640 [12:42<26:41, 3.61s/it]
Running: 0.0/160.0 CPU, 0.0/32.0 GPU, 685.19 MiB/279.4 GiB object_store_memory: 31%|███ | 198/640 [12:46<28:32, 3.88s/it]
Running: 0.0/160.0 CPU, 0.0/32.0 GPU, 685.19 MiB/279.4 GiB object_store_memory: 100%|██████████| 198/198 [12:51<00:00, 3.88s/it]
My interpretation of the logs is that Ray considers the operation trivial, since it uses only 685 MiB of the 279.4 GiB of available object store memory, and therefore doesn't schedule it across many CPUs. How can I convince Ray otherwise?