RL problem
I have a single agent with a custom reward function, custom actions, a custom environment, and custom observations; it uses a generic SAC policy with a neural network. The agent iterates through 1e6 episodes serially to update the SAC policy, which is time-consuming.
Want to parallelize
To speed up learning, I want multiple agents (say, 10) stepping in parallel. At the end of each episode, the pooled experience batch-updates the same single SAC policy, and all agents use the updated policy in the next episode. This should reduce the learning time. Alternatively, any other Ray methods to speed up the RL training problem stated above are appreciated.
I am new to both reinforcement learning and Ray. Could you provide a simple, step-by-step, complete working Ray code example with an explanation of how to achieve the above? I have already scoured the Ray documentation but am still unsure how to do this exactly, so any help specific to my problem is much appreciated.
Thank you.