Collating custom results from different workers

Saurabh_Arora · August 11, 2021, 2:14pm

Hi
I am using PPO with num_workers=4. During training, an event happens for some specific state-action combinations. I have a counter that tracks the total number of times that event happens during training. This setup gives me 4 counter values, one for each of the 4 RL workers/cores. Is there a way in RLlib to combine these 4 counter values at the end of training?

RickDW · August 11, 2021, 2:21pm

There might be another way to go about it, but the first thing I thought of is to create a Ray actor. See the global coordination section here: RLlib Training APIs — Ray v2.0.0.dev0. As I understand it, you can access a named actor from anywhere in your program.

Saurabh_Arora · August 11, 2021, 2:30pm

Thanks @RickDW. For further clarification, let's say the training part of my code looks like the following. Where and how do I fit in the counter suggested in RLlib Training APIs — Ray v2.0.0.dev0?

config = {'num_workers': 4, 'horizon': … }
trainer_ppo = ppo.PPOTrainer(config=config, env= … )
while True:
    results = trainer_ppo.train()
    if stopping condition satisfied:
        break

RickDW · August 11, 2021, 2:39pm

You would create the counter before you enter the while loop:

counter = Counter.options(name="global_counter").remote()

The code to increment the counter should go wherever your counter logic lives in your implementation, for example:

counter = ray.get_actor("global_counter")
counter.inc.remote(1)  # async call to increment the global count

Hope that helps

RickDW · August 11, 2021, 2:46pm

By the way, if you want to read more about remote actors, you can find it in the Ray Core section of the docs.

Saurabh_Arora · August 11, 2021, 3:08pm

@RickDW As per your suggestion, I am looking into the documentation. It seems like Counter.options(…) instantiates an actor, which creates a worker dedicated to Counter. There are 4 other workers used for PPO training. Does that mean the training workers will each independently call the Counter worker to increment the counter?

RickDW · August 11, 2021, 3:13pm

Yes, that sounds right to me. The Counter actor keeps track of the number of events and handles the synchronization between multiple workers. By default an actor executes its method calls one at a time, so if you're familiar with multi-threaded programming, it basically makes sure the equivalent of a race condition doesn't occur.
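
To make the multi-threading analogy concrete, here is a toy sketch in plain Python with no Ray involved. The SerialCounter class and the worker function are made up for illustration: a single thread owns the count and applies requests from a queue one at a time, which is roughly the role the actor plays for the 4 training workers.

```python
import queue
import threading


class SerialCounter:
    """Toy stand-in for an actor: one thread owns the state."""

    def __init__(self):
        self.count = 0
        self._inbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Requests are applied strictly one after another, in arrival order.
        while True:
            n = self._inbox.get()
            if n is None:  # shutdown sentinel
                break
            self.count += n

    def inc(self, n):
        # Returns immediately, similar in spirit to an async actor call.
        self._inbox.put(n)

    def close(self):
        # Everything queued before the sentinel is applied first.
        self._inbox.put(None)
        self._thread.join()
        return self.count


def worker(counter, num_events):
    for _ in range(num_events):
        counter.inc(1)


counter = SerialCounter()
threads = [threading.Thread(target=worker, args=(counter, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = counter.close()
print(total)
```

Because only the owning thread ever touches count, no increments are lost even though 4 threads submit them concurrently.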

Saurabh_Arora · August 11, 2021, 3:15pm

Thanks for that. I will verify if it works as expected
