I have a multi-agent setup with two agents (two policies). I trained only one of the agents: the second agent's action was discarded in my environment and replaced with the value from my controller. I saved a checkpoint once that one agent reached my desired average reward.
My plan was then to load the checkpoint, take the policy of the trained agent and keep it fixed (explore=False), and train the other agent so that it learns a new policy. However, from the very first iteration I get an error saying the observation is outside the observation space, and the observation contains "nan" values. I have no idea why this is happening. Can anyone suggest what is going on or give me a clue? That would be appreciated.
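
One way to narrow this down is to check each observation dict that the environment returns before it reaches the trainer, and report which agent and which entry is NaN or out of bounds. The following is only a minimal sketch in plain Python: the function name `find_bad_entries`, the agent ids, the scalar bounds, and the assumption that each observation is a flat list of numbers are all placeholders for illustration, not part of any library.

```python
import math


def find_bad_entries(obs_by_agent, low, high):
    """Return (agent_id, index, value) for every NaN or out-of-bounds entry.

    obs_by_agent: dict mapping agent id -> flat sequence of numbers (assumed layout).
    low, high: scalar bounds of the observation space (assumed to be uniform).
    """
    bad = []
    for agent_id, obs in obs_by_agent.items():
        for i, value in enumerate(obs):
            v = float(value)
            # NaN fails every comparison, so it has to be tested explicitly.
            if math.isnan(v) or v < low or v > high:
                bad.append((agent_id, i, v))
    return bad


# Hypothetical observations, as a multi-agent env might return them.
obs = {"agent_0": [0.1, -0.5], "agent_1": [float("nan"), 2.0]}
bad = find_bad_entries(obs, low=-1.0, high=1.0)
for agent_id, index, value in bad:
    print(f"{agent_id}: entry {index} is invalid ({value})")
```

Calling a check like this on the result of both reset() and step() would show whether the NaN is already present in the first observation or only appears after an action has been applied, for example after the action of the fixed policy or the controller value is fed into the environment.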