How to deploy a trained Ray RLlib PPO policy/model in multi-agent-case?

klausk55 · November 9, 2021, 2:14pm

Hello,

How can I deploy a trained Ray RLlib PPO policy/model in the multi-agent case when using an RNN-based policy?

I guess the first step is to load/restore the PPO Trainer (i.e. trainer.restore(checkpoint)).
Then there are the functions trainer.compute_single_action and trainer.compute_actions. The latter seems to compute actions for a batch of observations under one specific policy.

What I want is to compute a single action for one of the agents using its RNN-based policy.
Do I have to use trainer.compute_single_action and pass observation, RNN-state and policy ID to it?

amogkam · November 10, 2021, 1:02am

@gjoliver any ideas here?

mannyv · November 10, 2021, 1:17pm

Hi @klausk55,

Have a look at this documentation: RLlib Training APIs — Ray v1.8.0

In the multiagent case the obs should be a dictionary with the agent(s) you want to compute the actions for.

Also don’t forget you need to chain the output state of one call as the input state of the next call to compute actions for that same agent.

Here is an example from serve:
RLlib Tutorial — Ray v1.8.0
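The state chaining mentioned above can be sketched without Ray at all. This is only an illustration under stated assumptions: `ToyRecurrentPolicy` and `step_agents` are hypothetical names, not RLlib classes or functions, and the toy policy merely mimics the (action, state_out, info) return shape. The point is that each agent keeps its own state, and the state returned by one call is fed into the next call for that same agent.

```python
from typing import Dict, List, Tuple


class ToyRecurrentPolicy:
    """Hypothetical stand-in for an RNN-based policy (NOT an RLlib class)."""

    def get_initial_state(self) -> List[float]:
        # One scalar of "memory", starting at zero.
        return [0.0]

    def compute_single_action(
        self, obs: float, state: List[float]
    ) -> Tuple[int, List[float], dict]:
        # The new state mixes the previous state with the current observation.
        new_state = [0.5 * state[0] + obs]
        action = 1 if new_state[0] > 1.0 else 0
        return action, new_state, {}


def step_agents(
    policies: Dict[str, ToyRecurrentPolicy],
    states: Dict[str, List[float]],
    obs_dict: Dict[str, float],
) -> Dict[str, int]:
    """Compute one action per agent, chaining each agent's own state."""
    actions = {}
    for agent_id, obs in obs_dict.items():
        policy = policies[agent_id]
        if agent_id not in states:
            # First call for this agent: start from the initial state.
            states[agent_id] = policy.get_initial_state()
        # The output state of this call becomes the input state of the next.
        actions[agent_id], states[agent_id], _ = policy.compute_single_action(
            obs, states[agent_id]
        )
    return actions


policies = {"agent_0": ToyRecurrentPolicy(), "agent_1": ToyRecurrentPolicy()}
states: Dict[str, List[float]] = {}
obs = {"agent_0": 0.8, "agent_1": 0.2}
first = step_agents(policies, states, obs)
second = step_agents(policies, states, obs)
print(first, second)
```

With identical observations on both steps, `agent_0` picks a different action the second time only because its state was carried over. If the state were reset on every call, the recurrent part of the policy would have no effect.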

klausk55 · November 10, 2021, 3:59pm

Thanks @mannyv!

I guess in the multi-agent case where the obs is a MultiAgentDict, the invoking method should be compute_actionS since it accepts a dict as an obs.

By the state, you mean the internal state of the RNN-based policy, right? If so, what would you say is an appropriate initial state for the first call to compute an action? Simply zero arrays?

Yes, that’s a great example for an online serving use case! You’ve already made me aware of this in a previous post. I appreciate your help, thanks!

mannyv · November 10, 2021, 4:33pm

@klausk55,

compute_actions should also accept a dictionary observation. You can use either.

The trainer has a get_initial_state method you can use.
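On the "simply zero arrays?" question, here is a rough sketch of what such an initial state might look like, assuming the policy uses RLlib's built-in LSTM wrapper with the default `lstm_cell_size` of 256. To my understanding that wrapper starts from two zero vectors (hidden state and cell state). `zero_lstm_state` is a hypothetical helper, not an RLlib function, and a custom model may define its initial state differently, so querying `get_initial_state` is the safer route than hardcoding zeros.

```python
from typing import List


def zero_lstm_state(cell_size: int = 256) -> List[List[float]]:
    """Hypothetical helper: zero-filled [h, c] pair for one agent (no batch dim).

    Assumes an LSTM with `cell_size` units; other RNN types or custom models
    may need a different number or shape of state tensors.
    """
    # Build two separate lists so h and c do not share storage.
    return [[0.0] * cell_size, [0.0] * cell_size]


state = zero_lstm_state(4)
print(len(state), len(state[0]))
```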

klausk55 · November 10, 2021, 4:40pm

What do you think @mannyv? Could it look like this?

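    # Assumes `trainer` was restored from a checkpoint, `env` is a MultiAgentEnv,
    # and each agent maps to a policy named "policy_<agent_id>".
    state = {}
    done = False

    obs_dict = env.reset()

    while not done:
        action_dict = {}
        for agent_id, obs in obs_dict.items():
            policy_id = "policy_{}".format(agent_id)
            # First step for this agent: start from the policy's initial RNN state.
            if agent_id not in state:
                state[agent_id] = trainer.get_policy(policy_id).get_initial_state()
            # Feed the previous state in and keep the new state for the next call.
            action_dict[agent_id], state[agent_id], _ = trainer.compute_single_action(
                observation=obs, state=state[agent_id], policy_id=policy_id)
        obs_dict, rewards, dones, infos = env.step(action_dict)
        # A MultiAgentEnv returns per-agent done flags; "__all__" ends the episode.
        done = dones["__all__"]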
