Okay, somehow I've managed to work this out myself. The second approach looks like this:
import torch
from torch.distributions import Categorical

for step in range(1000):
    actions = {}
    # Build one action per agent from the current multi-agent observation dict.
    for agent_id, agent_obs in obs.items():
        agent_obs_vec = agent_obs['observations']      # the agent's actual observation
        agent_action_mask = agent_obs['action_mask']   # the agent's action mask
        # Add a batch dimension so the RLModule sees shape (1, obs_dim).
        obs_tensor = torch.tensor(agent_obs_vec, dtype=torch.float32).unsqueeze(0)
        action_mask_tensor = torch.tensor(agent_action_mask, dtype=torch.float32)
        with torch.no_grad():
            action_out = rl_module.forward_inference(
                {"obs": {'observations': obs_tensor,
                         'action_mask': action_mask_tensor}}
            )
        # Sample an action from the logits returned by the module.
        logits = action_out["action_dist_inputs"]
        dist = Categorical(logits=logits)
        action = dist.sample().item()
        actions[agent_id] = action

    obs, rewards, terminateds, truncateds, infos = env.step(actions)
    all_rewards.append(rewards)
    print(f"[STEP {step}] Rewards: {rewards}, Actions: {actions}")
    sum_rewards += rewards['gruz_1']
    if terminateds.get("__all__", True) or truncateds.get("__all__", True):
        # print(all_rewards)
        print(f'Sum_rewards={sum_rewards}')
        break
and it actually works for me.
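For reference, the loop assumes that env, rl_module, obs, all_rewards, and sum_rewards already exist. A minimal setup sketch (how you build the env and load the trained RLModule is specific to your project, so those two are only placeholders here):

# Sketch of the setup the loop above relies on.
# env       = <your MultiAgentEnv instance>
# rl_module = <the trained RLModule, obtained from your Algorithm or its checkpoint>
all_rewards = []
sum_rewards = 0.0
obs, infos = env.reset()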
PS: I'm still not sure whether it works correctly. "Not crashing with errors" doesn't always mean everything is perfect, tbh.
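One cheap sanity check I'd consider, assuming the mask convention is 1 = allowed and 0 = forbidden (that's an assumption about your env): right after sampling inside the per-agent loop, verify the sampled action is actually allowed, which tells you whether the module really respects the mask.

# Sketch: place this right after `action = dist.sample().item()` in the loop above.
# Assumes mask convention: 1 = allowed, 0 = forbidden.
if agent_action_mask[action] == 0:
    print(f"[WARN] {agent_id} sampled a masked-out action {action} at step {step}")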