Hi, I had a question regarding the results (mainly the performance and efficiency) of the various algorithms available. I’m only a hobbyist programmer who has a great interest in machine learning but lacks formal education on the subject. Most of my experience is just from trial and error. When testing various algorithms on environments I’ve created, I’ve noticed that PPO seems to outperform IMPALA, A3C, etc. by far. However, based on the benchmarks provided, I expected different results. Can someone shed some light on this? I suspect that these algorithms are meant to be used with many more workers than I have available, or that my hardware is a bottleneck.
For reference, I’m using an i9-9900KF and an RTX 3080, which leaves me with 15 workers. Also, I’m using WSL, which could be another bottleneck. Any help would be appreciated.
Hi!
Congrats on your system!
When you talk about performance, do you measure it per wall-clock time or in terms of sample efficiency (how much experience the algorithm needs)?
This is where many algorithms greatly differ, depending on how you compare them.
Have you had a look at the tuned examples?
Furthermore, an algorithm’s performance depends on the environment.
Have a look at how A3C and IMPALA (your examples) compare.
What did you expect to be different? There is no algorithm that beats all other algorithms on all environments on all benchmarks.
To find out whether the algorithms really perform differently or whether it’s just your hardware, try comparing them at the same number of digested samples. If they reach similar results in the end, then one algorithm is just less resource-efficient than the other.
That being said, I have also had the experience of PPO outperforming IMPALA on a similar setup. IMPALA seems to really shine when you can provide it with lots of workers.
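In case it helps, here is a minimal sketch of what I mean by comparing at the same number of digested samples. It assumes you have logged, for each run, the total sampled timesteps and the mean episode reward after every iteration. The function name `reward_at` and the two curves are made up purely for illustration, they are not real results and not part of any library.

```python
from bisect import bisect_left


def reward_at(timesteps, rewards, budget):
    """Linearly interpolate the logged mean reward at `budget` sampled timesteps.

    `timesteps` must be strictly increasing and the same length as `rewards`.
    Outside the logged range the nearest logged value is returned, so a run
    that stopped early is treated as flat from there on (an assumption).
    """
    if not timesteps or len(timesteps) != len(rewards):
        raise ValueError("need two non-empty lists of equal length")
    if budget <= timesteps[0]:
        return rewards[0]
    if budget >= timesteps[-1]:
        return rewards[-1]
    i = bisect_left(timesteps, budget)  # first logged point at or after budget
    t0, t1 = timesteps[i - 1], timesteps[i]
    r0, r1 = rewards[i - 1], rewards[i]
    return r0 + (r1 - r0) * (budget - t0) / (t1 - t0)


# Hypothetical placeholder curves (NOT measured data): (timesteps, mean reward).
curve_a = ([4000, 8000, 16000, 32000], [20.0, 60.0, 120.0, 200.0])
curve_b = ([25000, 50000, 100000], [10.0, 50.0, 150.0])

budget = 30000  # compare both runs after the same amount of experience
score_a = reward_at(*curve_a, budget)
score_b = reward_at(*curve_b, budget)
print(f"at {budget} timesteps: A={score_a:.1f}  B={score_b:.1f}")
```

Doing the same lookup with elapsed seconds instead of timesteps on the x-axis gives you the wall-clock comparison, and the two rankings can easily disagree.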
PPO 8 iter 24.99s 32000ts
IMPALA (2 workers) 14 iter 137.01s 352550ts
A3C 27 iter 129.92s 115920ts
APEX (5 workers) 13 iter 62.51s 141750ts
DQN 17 iter 41.74s 17000ts
SAC 49 iter 57.02s 11100ts
To me, it seems that no matter what environment I throw at PPO, it outperforms all the other algorithms. But I suppose it depends on the environment, as you put it:
There is no algorithm that beats all other algorithms on all environments on all benchmarks.
So, how can I figure out which algorithms would best suit the environment I’m working with? Is there a general rule, or should I get a baseline with all of them and go from there?
Hi again! My usage of these algorithms is not exhaustive, so I think I will keep quiet on your last question. @sven1977
But still: the data that you posted shows that SAC needs fewer timesteps. So SAC may show better sample efficiency on some environments. The number of iterations of the algorithm is also a poor measure of performance, since it does not tell you anything about how long you need to optimize or how many resources you need.
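To make that concrete, here is a small sketch that ranks the numbers you posted by three different measures. The rows are copied from your post. I am assuming each row is what that algorithm needed to reach the same target on the same environment, which your post does not state explicitly, so take the rankings with that caveat.

```python
# (name, iterations, wall-clock seconds, sampled timesteps), copied from the post above.
# Assumption: every row is the cost of reaching the same target reward.
runs = [
    ("PPO", 8, 24.99, 32000),
    ("IMPALA (2 workers)", 14, 137.01, 352550),
    ("A3C", 27, 129.92, 115920),
    ("APEX (5 workers)", 13, 62.51, 141750),
    ("DQN", 17, 41.74, 17000),
    ("SAC", 49, 57.02, 11100),
]


def throughput(run):
    """Sampled timesteps per wall-clock second for one run."""
    _, _, seconds, timesteps = run
    return timesteps / seconds


by_time = sorted(runs, key=lambda r: r[2])        # least wall-clock time first
by_samples = sorted(runs, key=lambda r: r[3])     # fewest timesteps first
by_throughput = sorted(runs, key=throughput, reverse=True)  # most samples/s first

for title, ranking in [
    ("wall-clock time", by_time),
    ("sample efficiency", by_samples),
    ("throughput", by_throughput),
]:
    print(f"{title}: " + " > ".join(r[0] for r in ranking))
```

The three orderings come out different, which is the point: "which algorithm is best" depends on whether time, experience, or raw sampling speed is the thing you are short of.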