This is more of an RL question, but I thought I might be able to get some help here.
I have a manual implementation of PPO and a custom environment. I previously tuned the PPO hyperparameters for my problem using random search and obtained very good performance. In particular, the best-performing number of parallel actors I found was 4.
Now I am trying to speed up my algorithm using Ray and Ray Clusters. As I will have access to more cores, can I expect the same kind of performance if I use a different (higher) number of parallel actors, keeping all the other hyperparameters the same?
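To make the question concrete, here is a rough sketch of how I understand the number of actors to interact with the other hyperparameters. It assumes a typical PPO setup in which each actor collects a fixed-length rollout per iteration and the combined batch is split into a fixed number of minibatches. The class, the field names, and all values other than the 4 actors are hypothetical placeholders, not my actual settings.

```python
from dataclasses import dataclass


@dataclass
class PPOConfig:
    """Hypothetical PPO settings; only num_actors=4 comes from my tuning."""
    num_actors: int        # parallel actors collecting rollouts
    rollout_length: int    # env steps per actor per iteration (placeholder)
    num_minibatches: int   # minibatches per epoch (placeholder)
    num_epochs: int        # optimisation epochs per iteration (placeholder)

    @property
    def batch_size(self) -> int:
        # Transitions gathered before each policy update.
        return self.num_actors * self.rollout_length

    @property
    def minibatch_size(self) -> int:
        return self.batch_size // self.num_minibatches

    @property
    def gradient_steps_per_iteration(self) -> int:
        return self.num_epochs * self.num_minibatches


def gradient_steps_for_budget(cfg: PPOConfig, total_env_steps: int) -> int:
    """Total gradient steps taken for a fixed environment-step budget."""
    iterations = total_env_steps // cfg.batch_size
    return iterations * cfg.gradient_steps_per_iteration


tuned = PPOConfig(num_actors=4, rollout_length=128, num_minibatches=4, num_epochs=4)
scaled = PPOConfig(num_actors=16, rollout_length=128, num_minibatches=4, num_epochs=4)

budget = 2 ** 20  # same number of environment steps for both runs
for name, cfg in [("tuned", tuned), ("scaled", scaled)]:
    print(name, cfg.batch_size, cfg.minibatch_size,
          gradient_steps_for_budget(cfg, budget))
```

Under these assumptions, going from 4 to 16 actors quadruples both the batch size and the minibatch size, and for the same environment-step budget the policy receives a quarter as many gradient updates. That is why I am unsure whether the learning rate, rollout length, and number of minibatches tuned for 4 actors would still be appropriate.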
I am aware that I could use RLlib if I want faster training, but I thought it would be a good exercise to speed up my own PPO implementation using Ray and Ray Clusters.