What algorithm is implemented by the A3C agent?

The A3C paper presented multi-threaded asynchronous versions of four algorithms, namely, one-step Sarsa, one-step Q-learning, n-step Q-learning, and advantage actor-critic.

My question is: when I train an A3C agent (e.g., by calling A3CTrainer.train()), which of these algorithms does the agent actually use?

The first three you list were implemented as baselines for comparison with the main method the paper introduced, Asynchronous Advantage Actor-Critic (A3C). That is the algorithm an A3C agent trains with.
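As a rough illustration of what "advantage actor-critic" means here, the sketch below computes the loss a single A3C worker would minimize on one rollout segment, using the n-step returns described in the A3C paper. It is a minimal plain-Python sketch, not RLlib's implementation: the function names, the `value_coeff`/`entropy_coeff` weights and the toy numbers are assumptions for illustration. A real implementation would build these terms as tensors and differentiate them with an autodiff framework. In A3C, each worker computes gradients of this loss on its own rollout and applies them to the shared parameters asynchronously.

```python
from typing import List, Tuple


def n_step_returns(rewards: List[float], bootstrap_value: float, gamma: float) -> List[float]:
    """Compute R_t = r_t + gamma * R_{t+1}, starting from R_T = bootstrap_value.

    bootstrap_value is the critic's V(s_T) for a truncated rollout,
    or 0.0 if the episode terminated at s_T.
    """
    returns: List[float] = []
    running = bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def actor_critic_loss(
    rewards: List[float],
    values: List[float],
    log_probs: List[float],
    entropies: List[float],
    bootstrap_value: float,
    gamma: float = 0.99,
    value_coeff: float = 0.5,     # illustrative weight, not a library default
    entropy_coeff: float = 0.01,  # illustrative weight, not a library default
) -> Tuple[List[float], float]:
    """Return (advantages, total_loss) for one worker's rollout segment."""
    returns = n_step_returns(rewards, bootstrap_value, gamma)
    # Advantage estimate A_t = R_t - V(s_t).
    advantages = [ret - v for ret, v in zip(returns, values)]
    # Actor: policy-gradient loss -log pi(a_t|s_t) * A_t. With autodiff,
    # A_t would be treated as a constant inside this term.
    policy_loss = -sum(lp * adv for lp, adv in zip(log_probs, advantages))
    # Critic: squared error between the n-step return and V(s_t).
    value_loss = 0.5 * sum(adv * adv for adv in advantages)
    # Entropy bonus encourages exploration, so it is subtracted from the loss.
    entropy_bonus = sum(entropies)
    total = policy_loss + value_coeff * value_loss - entropy_coeff * entropy_bonus
    return advantages, total


if __name__ == "__main__":
    adv, loss = actor_critic_loss(
        rewards=[1.0, 0.0, 1.0],
        values=[1.0, 1.0, 1.0],
        log_probs=[-1.0, -2.0, -0.5],
        entropies=[0.0, 0.0, 0.0],
        bootstrap_value=2.0,
        gamma=0.5,
    )
    print(adv, loss)  # prints [0.5, 0.0, 1.0] 1.3125
```

The returns are computed backwards in a single pass, which is why each worker only needs one bootstrap value from the critic at the end of its segment.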


OpenAI has put together a nice introductory guide to RL. Keep in mind that it is pretty myopically focused on the algorithms that were developed there.


There is some evidence that A3C is actually not that efficient compared to other approaches. The guidance I usually see is that you are better off using either A2C for synchronous training or IMPALA for asynchronous training with large numbers of workers.
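To make the synchronous/asynchronous distinction concrete, here is a toy sketch on a single scalar parameter, assuming each worker has its own quadratic objective 0.5 * (theta - c)**2 (a setup chosen only for illustration; `sync_update` and `async_round` are hypothetical names). The A2C-style step averages gradients that were all computed at the same parameters and applies one update. The A3C-style round lets each worker write its gradient as soon as it finishes, even if other workers have changed the parameters since that worker read them. This does not reproduce any efficiency comparison; it only shows the mechanical difference.

```python
from typing import List


def sync_update(theta: float, targets: List[float], lr: float) -> float:
    """A2C-style step: all gradients are computed at the same theta,
    averaged, and applied once."""
    grads = [theta - c for c in targets]  # gradient of 0.5 * (theta - c)**2
    return theta - lr * sum(grads) / len(grads)


def async_round(theta: float, targets: List[float], lr: float) -> float:
    """A3C-style round (simplified): every worker copies theta first, then
    each applies its own gradient to the shared value as soon as it is done."""
    snapshot = theta  # assumption: all workers synced before any update landed
    for c in targets:
        grad = snapshot - c        # computed on parameters that may now be stale
        theta = theta - lr * grad  # written immediately, no averaging
    return theta


if __name__ == "__main__":
    targets = [1.0, 3.0]  # joint optimum of the summed objectives is 2.0
    print(sync_update(0.0, targets, lr=0.9))  # 1.8
    # About 3.6: the second gradient was computed at 0.0 but applied at 0.9.
    print(async_round(0.0, targets, lr=0.9))
```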


Got it. Thanks very much.