Non acting agents in APPO

Sertingolix · January 20, 2022, 8:00am

Hi there,

When checking that everything works with my environment, I noticed that the training batch in APPO (torch) also contains observations for non-acting agents (all zeros), so actions and vf_pred are computed for them as well.

Now my question:

Should I make sure that no optimization is done with those samples, e.g. by detaching the gradients for those actions?

Is this even intended behavior? Do the reported stats take those “fake” trajectories into account? I totally get that this is easier for implementation reasons, because the shapes are always the same.
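For illustration, one way to keep such rows out of the optimization is to mask the per-sample loss before averaging, instead of detaching individual actions. This is only a minimal sketch in plain PyTorch, not APPO's actual loss: `masked_mean` is a hypothetical helper, and it assumes that an all-zero observation row always marks a non-acting agent (which fails if a real observation can legitimately be all zeros).

```python
import torch


def masked_mean(per_sample_loss: torch.Tensor, obs: torch.Tensor) -> torch.Tensor:
    """Average per-sample losses over rows whose observation is not all zeros.

    Hypothetical helper, not part of RLlib. Assumes all-zero rows are padding.
    """
    # A row counts as "real" if any of its observation entries is non-zero.
    valid = (obs.reshape(obs.shape[0], -1) != 0).any(dim=1)
    mask = valid.to(per_sample_loss.dtype)
    # Clamp the denominator so a batch of only padding rows yields 0, not NaN.
    return (per_sample_loss * mask).sum() / mask.sum().clamp(min=1.0)


# Toy batch: the second row is an all-zero (non-acting) observation.
obs = torch.tensor([[0.5, -1.0], [0.0, 0.0], [2.0, 0.0]])
pred = torch.tensor([1.0, 4.0, 3.0], requires_grad=True)
target = torch.tensor([0.0, 0.0, 1.0])

per_sample_loss = (pred - target) ** 2
loss = masked_mean(per_sample_loss, obs)
loss.backward()

print(loss.item())  # mean over the two real rows only
print(pred.grad)    # the gradient for the masked row is zero
```

Because the masked row is multiplied by zero before the reduction, it contributes neither to the loss value nor to the gradient, and the mean is taken over real rows only.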

I thought I'd ask before going through the APPO code.

Any help appreciated.
Thank you

sven1977 · January 26, 2022, 3:15pm

Hey @Sertingolix, not sure, but these zeros you are seeing could simply be the initial dummy batch that RLlib passes through your loss function.
Can you confirm that you are only seeing them in the first loss pass of each of your policies? Note that you should see this once for each remote worker plus the local worker, as each of them has a copy of the policy.
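
A rough way to check this is to log, for every batch that reaches the loss, the fraction of rows whose observation is entirely zero. The sketch below uses only NumPy; `zero_row_fraction` is a hypothetical helper and not an RLlib function, and where to call it (for example inside a custom loss) is left as an assumption.

```python
import numpy as np


def zero_row_fraction(obs_batch) -> float:
    """Return the fraction of rows in a batch whose observation is all zeros.

    Hypothetical helper for debugging, not part of RLlib.
    """
    obs = np.asarray(obs_batch)
    if obs.shape[0] == 0:
        return 0.0
    # Flatten everything except the batch dimension, then test each row.
    flat = obs.reshape(obs.shape[0], -1)
    return float((~flat.any(axis=1)).mean())


# A dummy batch is expected to be all zeros.
dummy_batch = np.zeros((4, 3))
# A later batch mixing real rows with all-zero rows.
mixed_batch = np.array([[1.0, 0.0, 0.0],
                        [0.0, 0.0, 0.0],
                        [0.0, 2.0, 0.0],
                        [0.0, 0.0, 0.0]])

print(zero_row_fraction(dummy_batch))
print(zero_row_fraction(mixed_batch))
```

If the fraction is 1.0 only on the first pass per policy copy and 0.0 afterwards, the zeros are the dummy batch; a value strictly between 0 and 1 on later passes points to zero rows mixed into real training data.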

Sertingolix · January 26, 2022, 5:19pm

This also persists after the initial dummy batches are processed. I actually get correct/real experience from the environment in proportion to the number of acting agents, and zero samples otherwise. Also, I do not use replay, which could otherwise keep samples in the training loop for longer.

Although I think this should not matter, I use a repeated space in the observation. Just thinking of it now.
I haven't gone through the code yet, because I was able to reduce the environment to an equivalent one in which all agents act, and it trains successfully.
