@Blubberblub Thanks for your patience and detailed help. I finally solved this problem by changing how the environment is registered.
However, I have another question:
I want to apply a trained policy obtained from a single-agent scenario to a multi-agent scenario, where every agent uses this same trained policy. Could you please give me some tips on how to implement this in RLlib? Thank you!