[Core] many GC requests from the node manager

I am running some RLlib experiments on a distributed cluster of ~20 machines. I am using the nightly.

I have some trainers that never manage to start rolling out (I do some data prefetch/loading during env init).

I was looking through session_latest/logs/raylet.out and noticed that roughly every 500ms the node_manager was issuing a GC request to some of my workers. The log lines say:
sending local GC request to N workers. it is due to local memory pressure on the local worker.

If I check htop on my machines and the dashboard, I see that my memory usage is < 50% everywhere.

A) Is this normal?
B) any recommendations on how to debug further?
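One way to confirm the cadence is to pull the timestamps out of raylet.out and look at the gaps between matching lines. This is only a rough sketch: the timestamp prefix in the regex and the sample lines are assumptions for illustration, and the function name `gc_request_intervals` is made up, so adjust the pattern to whatever your log actually prints.

```python
import re
from datetime import datetime

# Assumed log prefix, e.g. "[2020-12-10 14:03:21,512 I 4242 4242] node_manager.cc:530: ..."
TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def gc_request_intervals(lines, needle="GC request"):
    """Return the seconds between consecutive log lines containing `needle`."""
    stamps = []
    for line in lines:
        if needle not in line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f"))
    return [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]


# Made-up sample lines, not copied from a real raylet.out.
sample = [
    "[2020-12-10 14:03:21,000 I 4242 4242] node_manager.cc:530: Sending local GC request to 3 workers.",
    "[2020-12-10 14:03:21,400 I 4242 4242] node_manager.cc:100: some unrelated line",
    "[2020-12-10 14:03:21,750 I 4242 4242] node_manager.cc:530: Sending local GC request to 3 workers.",
    "[2020-12-10 14:03:22,500 I 4242 4242] node_manager.cc:530: Sending local GC request to 3 workers.",
]
print(gc_request_intervals(sample))
```

To run it on a real log, pass an open file instead of the sample list, e.g. `with open(path) as f: gc_request_intervals(f)`, where `path` points at raylet.out in your session's logs directory.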

You’re using the nightly wheels right, @raoul-khour-ts?

cc @sangcho @ericl

Yeah, I am using the nightly from yesterday.

Oh, we don’t actually trigger GC even though that log line is printed. We always throttle global GC (I think to at most once per minute), so it is just log spam. We will remove that log in https://github.com/ray-project/ray/pull/12773/files
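To illustrate why the log can fire far more often than the GC itself, here is a minimal throttle sketch. It is not Ray's actual implementation; the `Throttle` class, its method names, and the 60-second interval (taken from the "once per minute" guess above) are all assumptions for illustration.

```python
import time


class Throttle:
    """Run an action at most once per `min_interval_s` seconds (hypothetical helper)."""

    def __init__(self, min_interval_s, clock=time.monotonic):
        self.min_interval_s = min_interval_s
        self.clock = clock
        self.last_run = None

    def try_run(self, action):
        """Call `action` if enough time has passed; return whether it ran."""
        now = self.clock()
        if self.last_run is not None and now - self.last_run < self.min_interval_s:
            return False  # request arrived too soon, so it is dropped
        self.last_run = now
        action()
        return True


# Fake clock so the example is deterministic: four requests at t = 0, 0.5, 1 and 60 s.
fake_times = iter([0.0, 0.5, 1.0, 60.0])
throttle = Throttle(min_interval_s=60.0, clock=lambda: next(fake_times))
runs = []
results = [throttle.try_run(lambda: runs.append("gc")) for _ in range(4)]
print(results, len(runs))
```

If the log statement sits before the throttle check, every request gets logged even though only the first and last of these four would actually run.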

That makes sense. But I am also curious why it thinks there is memory pressure. It should not be doing any GC; my memory usage should be relatively low.

I might be wrong but it seems like this is causing my dataloader to load forever :worried:

Are you seeing messages like this every 500ms?

Sending Python GC request to " << all_workers.size()
                   << " workers. It is due to memory pressure on the local node.";	
                   << " local workers to clean up Python cyclic references.";

If so, that’s actually pretty weird.

It is actually saying:
"node_manager.cc:530: Sending local GC request to n workers. It is due to memory pressure on the local node."

and it might be closer to 750ms

Hmm, actually I cannot see those log messages on the latest master. Are you really using the nightly? Can you check

import ray
print(ray.__commit__)

And lmk what’s the commit of ray?
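For anyone else checking which build they really have installed, a small sketch along these lines reports both the version and the commit. The helper name `describe_install` is made up for illustration; it reads the `__version__` and `__commit__` attributes and falls back to "unknown" when one is missing.

```python
import importlib


def describe_install(module_name="ray"):
    """Return a one-line summary of an installed package's version and commit."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return f"{module_name} is not installed in this environment"
    version = getattr(module, "__version__", "unknown")
    commit = getattr(module, "__commit__", "unknown")
    return f"{module_name} version={version} commit={commit}"


print(describe_install())
```

Running this on every node (not just the head) helps catch the case where machines in the cluster ended up with different wheels.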

:man_facepalming: I have not been using the nightly…

I was still using pip install -U with the 1.1.0.dev wheels.

So yeah, a bit outdated. I'll try the new nightly to see if this is still happening there.

Thanks @sangcho

Yeah, try 1.2.0.dev0 :slight_smile:

It seems to work there, thanks @rliaw