I am using DDPPO and ran the experiment in three phases: first with 1 GPU & 10 CPUs per worker, then with double the resources, and then with 4x the resources.
The environment is CartPole-v1,
but the training speed does not get any faster.
The training curves are attached below.
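For reference, here is a minimal sketch of the kind of setup I am running (assuming the classic RLlib config-dict API and the built-in "DDPPO" trainable; exact keys may differ across Ray versions):

```python
import ray
from ray import tune

ray.init()

tune.run(
    "DDPPO",
    stop={"timesteps_total": 200_000},
    config={
        "env": "CartPole-v1",
        "framework": "torch",      # DD-PPO in RLlib is PyTorch-only
        "num_gpus": 0,             # DD-PPO has no central learner GPU
        "num_workers": 2,          # rollout workers; learning happens on them
        "num_gpus_per_worker": 1,  # baseline phase: 1 GPU per worker
        "num_cpus_per_worker": 10, # baseline phase: 10 CPUs per worker
        # The "2x" and "4x" phases scale these per-worker resources accordingly.
    },
)
```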
Generally speaking, with DDPPO the training speedup will not necessarily be linear.
But I also think CartPole is a bad example for demonstrating the speedups you can get with DDPPO, since the environment and model are so small that sampling and learning are not the real bottleneck.
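One rough way to check whether the extra resources help at all is to compare raw sampling throughput (env steps per wall-clock second) across the three runs, rather than only looking at the reward curves. A minimal sketch, assuming the classic `tune.run` API and the standard `timesteps_total` / `time_total_s` result fields RLlib reports:

```python
from ray import tune

# Same kind of call as the config sketch above; the stop condition is illustrative.
analysis = tune.run(
    "DDPPO",
    stop={"timesteps_total": 100_000},
    config={"env": "CartPole-v1", "framework": "torch",
            "num_gpus": 0, "num_workers": 2},
)

# Compare env steps per second at the end of each trial.
for trial, df in analysis.trial_dataframes.items():
    last = df.iloc[-1]
    print(f"{trial}: {last['timesteps_total'] / last['time_total_s']:.0f} env steps/s")
```

If that number barely changes between the 1x, 2x, and 4x phases, the extra GPUs/CPUs are simply idle for an environment as cheap as CartPole.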