Collating custom results from different workers

Hi
I am using PPO with num_workers=4. During training, an event happens for some specific state-action combinations. I have a counter that tracks the total number of times that event happens during training. With this setup I end up with 4 counter values, one for each of the 4 RL workers/cores. Is there a way in RLlib to combine these 4 counter values at the end of training?

There might be another way to go about it, but the first thing I thought of is to create a Ray actor. See the global coordination section here: RLlib Training APIs — Ray v2.0.0.dev0. As I understand it, you can access a named actor from anywhere.

Thanks @RickDW. For further clarification, let's say the training part of my code looks like the following. Where and how do I fit in the counter suggested in RLlib Training APIs — Ray v2.0.0.dev0?

config = {'num_workers': 4, 'horizon': …}
trainer_ppo = ppo.PPOTrainer(config=config, env=…)
while True:
    results = trainer_ppo.train()
    if stopping condition satisfied:
        break

You would create the counter before you enter the while loop:

counter = Counter.options(name="global_counter").remote()

The code to increment the counter should go wherever your counter logic is located in your implementation, for example:

counter = ray.get_actor("global_counter")
counter.inc.remote(1)  # async call to increment the global count

Hope that helps!

By the way, if you want to read more about remote actors, you can find them in the Ray Core section of the docs.

@RickDW As per your suggestion, I am looking into the documentation. It seems like Counter.options(…) instantiates an actor, which creates a worker dedicated to Counter, in addition to the 4 workers used for PPO training. Does that mean the training workers will independently call the Counter worker to increment the counter?

Yes, that sounds right to me. The Counter actor keeps track of the number of events and handles the synchronization between the workers: by default an actor executes its method calls one at a time, so increments arriving from different workers cannot overwrite each other. If you're familiar with multi-threaded programming, it basically makes sure the equivalent of a race condition doesn't occur.
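To make the multi-threading analogy concrete, here is a small sketch in plain Python with no Ray involved. It is an analogy only, not how Ray implements actors: LockedCounter and simulate_workers are made-up names, and the lock plays the role that the actor's one-call-at-a-time execution plays in Ray.

```python
import threading


class LockedCounter:
    """Thread-safe counter: the lock serializes every update."""

    def __init__(self):
        self._count = 0
        self._lock = threading.Lock()

    def inc(self, n=1):
        # Only one thread at a time can run this read-modify-write.
        with self._lock:
            self._count += n

    def get(self):
        with self._lock:
            return self._count


def simulate_workers(num_workers=4, events_per_worker=1000):
    """Run num_workers threads that each report events_per_worker events."""
    counter = LockedCounter()

    def work():
        for _ in range(events_per_worker):
            counter.inc(1)

    threads = [threading.Thread(target=work) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.get()


total = simulate_workers()
```

Without the lock, two threads could read the same old value and both write back old + 1, losing an increment. Routing every update through a single actor avoids the same problem across processes.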

Thanks for that. I will verify whether it works as expected.