How severe does this issue affect your experience of using Ray?
- High: It blocks me to complete my task.
I’m running a MARL experiment with RLlib. My environment is a subclass of
MultiAgentEnv. I’ve got six agents/policies that I’m training with PPO.
I want some of the agents to have a learning rate
x, while other agents will have a learning rate
y. Is that possible?
Alternatively, is it possible to have some agents not learn at all, just continue playing with their existing PPO policy, while other agents play and learn?
Thanks for your help,