You are quicker than I can refine my post!
I will provide more tomorrow if I can. I am 14 hours deep into Ray today, and about 100 hours over the past 2 weeks into Ray and Modin.
There is a more-than-zero chance that I am just overlooking something or mixing things up, with multiprocessing here, multithreading there, trying this and that, and waiting unreasonably long in between because I work with large data and did not take the time to build a repro.
For now, it seems ray.shutdown() does indeed free most of my memory if I call it at the right time, and it also kills the dashboard, as I expected. But when I call ray.init() again in the same process, strange issues arise.
Before ray.shutdown(), basically all workers appear as idle, and even when all references to objects are gone, nothing is garbage collected. But maybe Modin is incorrectly keeping some references that I have no control over.
So right now, when a task is done, I call ray.shutdown() inside the task and kill the thread.
When a new task comes in (probably immediately), a new thread is started in which ray.init() is called.
But this leads to:
2024-08-29T01:14:48.547596851Z 01:14:48 | 243 | ..intern_depend.flasker.src.task_flasker | [CRITICAL] | segmentor: An uncaught exception was raised: - An application is trying to access a Ray object whose owner is unknown(00ffffffffffffffffffffffffffffffffffffff0100000005e1f505). Please make sure that all Ray objects you are trying to access are part of the current Ray session. Note that object IDs generated randomly (ObjectID.from_random()) or out-of-band (ObjectID.from_binary(...)) cannot be passed as a task argument because Ray does not know which task created them. If this was not how your object ID was generated, please file an issue at https://github.com/ray-project/ray/issues/
Which I don't understand…
The object is definitely a new one, as it is loaded from a .parquet file, and a different .parquet file than in the previous task.
Maybe, because of some race condition, it creates the object in the previous Ray session, then destroys it and… well, I don't know.
Actually, maybe the object in question is some function (read_parquet in this case)?
Anyway, it sure does not seem to like it when I shut down and re-init in the same process.
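To make the shutdown/re-init pattern concrete, here is a simplified sketch of what I mean. It is not my actual code: `run_task_in_fresh_session` is a made-up name, and the `init`/`shutdown` parameters only exist so the control flow can be tried with stand-ins. When they are left out, it falls back to the real `ray.init()` and `ray.shutdown()`.

```python
import threading


def run_task_in_fresh_session(task, init=None, shutdown=None):
    """Run `task` in its own thread, bracketed by init() and shutdown().

    Hypothetical helper for illustration only. `task` is any zero-argument
    callable; `init` and `shutdown` default to ray.init / ray.shutdown.
    """
    if init is None or shutdown is None:
        import ray  # imported lazily so stand-ins can be used without Ray

        init = init or ray.init
        shutdown = shutdown or ray.shutdown

    result = {}

    def worker():
        init()  # start a new session inside the task thread
        try:
            result["value"] = task()
        except Exception as exc:  # hand the error back to the caller
            result["error"] = exc
        finally:
            shutdown()  # always tear the session down when the task ends

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()

    if "error" in result:
        raise result["error"]
    return result["value"]
```

With the real Ray functions, the second call to this helper in the same process is where the "owner is unknown" error shows up for me.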
Sadly, I can't use multiprocessing because CUDA doesn't like it. So what can I do?
I wish I could just stick with the same session without shutting it down at all, and just free the memory somehow. Nothing from the previous task is needed anymore.
But I haven't found a way to do that.
Anyway, I need sleep. I will provide more tomorrow.
Thank you
PS: Could it be that Modin somehow… registers functions with ray.remote, and when the Ray instance changes it does not re-register them?
That kind of makes sense to me now. Although it sucks…