I’ve been experimenting with Ray for a program that generates solutions using an NSGA-II algorithm. I wrote a custom evaluator for my use case and parallelized it with Ray. The run is sequential: I iterate over a number of categories, and the GA runs at a lower level inside each category, with 20 generations per run. I can get through a few iterations but never the full run; it seems the objects are never fully evicted after an iteration finishes.
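Roughly, the run is structured like this (a simplified, runnable sketch; `evaluate_one`, `categories`, and the loop bounds are placeholders standing in for my real evaluator and population):

```python
import ray

ray.init()

categories = ["cat_a", "cat_b", "cat_c"]   # placeholder category list

@ray.remote
def evaluate_one(item, n_gen):
    return item, n_gen                     # my real evaluator does heavy work here

for category_id in categories:                            # sequential outer loop
    for n_gen in range(20):                               # 20 generations per run
        refs = [evaluate_one.remote(i, n_gen) for i in range(50)]
        results = ray.get(refs)                           # fan out, then gather
    # I expect object-store memory to be released between categories,
    # but it keeps growing instead
```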
```python
results = []
for i in range(len(chromosomes)):
    trial_idx = int(n_gen * population_size) + i
    offspring[i].trial_idx = trial_idx
    results.append(
        promos_in_chromosome.remote(
            self.predictor, chromosomes[i], business_logic_dict_id,
            pickle_manager_id, trial_idx, cached_id, delimiter_id,
            category_id, n_gen,
        )
    )

chromosomes, evaluation, updates_pgScanCaseRate, cache = zip(*ray.get([result for result in results]))
# do updates
del results
del cache
del cached_id
del chromosomes, evaluation, updates_pgScanCaseRate
```
I think I’m removing most of the object refs, but my memory still grows consistently.
Moreover, is there another way to share memory between workers without using actors? I’m trying to collect and reuse a cache across the Ray tasks.
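For context, this is roughly the pattern I’m after (a simplified sketch, not my actual code; `evaluate` and `compute` are placeholders):

```python
import ray

ray.init()

def compute(item):
    return item * item                 # stand-in for the expensive evaluation

@ray.remote
def evaluate(cache, item):
    # Each task gets a read-only copy of the cache from the object store;
    # it returns new entries instead of mutating the cache in place.
    if item in cache:
        return cache[item], {}
    value = compute(item)
    return value, {item: value}

cache = {}
items = list(range(10))
for gen in range(3):
    cache_ref = ray.put(cache)                       # share one copy per generation
    outs = ray.get([evaluate.remote(cache_ref, i) for i in items])
    for _, update in outs:
        cache.update(update)                         # merge updates on the driver
    del cache_ref                                    # drop the ref to the old copy
```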
File "/media/root/prophet/DevOps/bnlwe-da-p-80200-prophetball/prophetball/CalendarOptimizerv2/Optimizer/components.py", line 511, in evaluate_all chromosomes,evaluation,updates_pgScanCaseRate,cache = zip(*ray.get([result for result in results])) File "/root/anaconda3/envs/LarusTF/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 62, in wrapper return func(*args, **kwargs) File "/root/anaconda3/envs/LarusTF/lib/python3.8/site-packages/ray/worker.py", line 1448, in get raise value.as_instanceof_cause() ray.exceptions.RayTaskError: ray::promos_in_chromosome() (pid=31734, ip=172.16.69.158) File "python/ray/_raylet.pyx", line 534, in ray._raylet.execute_task File "python/ray/_raylet.pyx", line 535, in ray._raylet.execute_task File "python/ray/_raylet.pyx", line 1600, in ray._raylet.CoreWorker.store_task_outputs File "python/ray/_raylet.pyx", line 151, in ray._raylet.check_status ray.exceptions.ObjectStoreFullError: Failed to put object 581d4440283ca102ffffffffffffffffffffffff0100000001000000 in object store because it is full. Object size is 30187139 bytes. The local object store is full of objects that are still in scope and cannot be evicted. Tip: Use the `ray memory` command to list active objects in the cluster.
OS: Linux (Ubuntu 20.04)
One more question about `ray.put`: does the object get overwritten? Let’s say, for example, I have a line like
```python
cache_id = ray.put(cache)
```
and that line gets called several times. Will memory increase even if it’s the same object and the same size each time? And if it’s not exactly the same object, will the old one be replaced?
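A minimal sketch of what I mean (the array is just a stand-in for my cache object; the sizes are made up):

```python
import ray
import numpy as np

ray.init()

cache = np.zeros(10_000_000)       # ~80 MB stand-in for my cache object

for i in range(5):
    cache_id = ray.put(cache)      # rebinding cache_id each iteration:
                                   # does the previous copy get evicted,
                                   # or do five copies pile up in the store?
```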