How to train multiple policies in one environment?

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

I am modifying my code by referring to this example.
This example shows how to bring in pre-trained policies and train other policies.

However, my problem is different: I want to checkpoint a single policy that controls nine agents, restore it, and then train another policy that controls the one remaining agent in a 10-agent environment, while the nine agents' policy stays fixed.
Since my source code was written so that multiple agents follow one policy,
config['multiagent']['policy_mapping_fn'] looked like this:
"policy_mapping_fn = lambda x: original_policy"
Now I want one of these agents to follow a policy other than the original policy.


I wrote the code as shown in the example above.
When I ran it, the result seemed to assign a separate policy to every agent except 'agent0' (one policy per agent).
Can you tell me how to modify the code so that multiple agents (all except 'agent0') refer to only one policy?
Thank you. Have a nice day.

Hi @coco
I’m sorry but I don’t understand the problem. The policy mapping function you posted already maps only one agent to one policy while all other agents will be mapped to another policy.
How is that different from the following?

Can you tell me how to modify the code to have 'multiple agents (except 'agent0')' refer to 'only one policy'?
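
To make the comparison concrete, here is a minimal sketch of a mapping function that sends 'agent0' to one policy and every other agent to a single shared policy. It is only an illustration under some assumptions: the policy IDs "original_policy" and "new_policy" are hypothetical names, the agent IDs are assumed to be "agent0" through "agent9", and the "policy_mapping_fn" / "policies_to_train" keys follow the older dict-style config['multiagent'] layout, so check them against your Ray version. The function accepts and ignores extra arguments because the call signature differs between RLlib versions.

```python
# Hypothetical policy IDs; "agent0" is the agent ID mentioned in the question.
FIXED_POLICY_ID = "original_policy"  # restored policy shared by the nine agents
NEW_POLICY_ID = "new_policy"         # policy for agent0, the only one trained


def policy_mapping_fn(agent_id, *args, **kwargs):
    """Map agent0 to the new policy and every other agent to the shared one.

    Extra positional/keyword arguments are accepted and ignored because the
    exact call signature differs between RLlib versions.
    """
    return NEW_POLICY_ID if agent_id == "agent0" else FIXED_POLICY_ID


# Sketch of the relevant part of a dict-style multi-agent config.
# The "policies" entry is omitted: it depends on the environment's spaces.
multiagent_config = {
    "policy_mapping_fn": policy_mapping_fn,
    # Only the new policy is updated; the restored one stays fixed.
    "policies_to_train": [NEW_POLICY_ID],
}

if __name__ == "__main__":
    # Assumed agent ID format: "agent0" ... "agent9".
    for agent_id in (f"agent{i}" for i in range(10)):
        print(agent_id, "->", policy_mapping_fn(agent_id))
```

With this mapping, nine agents resolve to the same policy ID, so they share one policy rather than getting one policy each. Listing only the new policy under "policies_to_train" is what keeps the restored policy's weights unchanged during training.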

Also, you can post code here as “Preformatted Text” so you don’t have to make screenshots and others can properly copy your code.

Also, this is a duplicate of "How can I train multiple 'trainer' in same environment?(or embed trained trainer in environment?) - #3 by coco", to which you still have to reply.
Please don’t open a new thread if you have opened another one for the same problem. :slightly_smiling_face:

Sorry, I’ll be more careful when I open the thread. :sweat_smile: