Tf2 slower 6-8 times than pytorch

looks like TF1 is much faster, about 50% faster than torch, RAM usage is way better than TF2 (about 4 times lower) and I also see an increase of GPU usage from 35% with torch to 60% tf1, maybe because of TF internal parallelization?

I prefer to use torch where possible because it is written in C++ instead of java which does not waste RAM, CPU and GPU resources and torch is much faster overall, but with torch it crashes after a few millions of steps, I have posted this problem here RLlib crashes with more workers and envs - #2 by christy