I trained a multi-agent policy with the R2D2 algorithm on a custom action-masked environment, and I'm now trying to roll it out with local_policy_inference() to perform single inference steps. I'm getting an error that I'm having trouble making sense of.
Relevant rollout code:
## Run simulation
# Perform actions and grab data
timesteps = 40
for i in range(1, timesteps):
    ## Run policy inference
    ## Connector version
    action_dict = {}
    print("obs", obs)
    for agent_id in obs:
        # add fake reward at first step?
        policy_outputs = local_policy_inference(
            policy=my_agents[agent_id],
            env_id="ActionMaskEnv",
            agent_id=agent_id,
            obs=obs[agent_id],
            explore=False,
            reward=0,
        )
        action, state_out, info = policy_outputs[0]
        action_dict[agent_id] = action

    ## step environment
    obs, reward, terminated, truncated, info = env.step(action_dict)
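
For reference, this is roughly the connector-free fallback I'd use to cross-check the policies themselves, carrying the recurrent state by hand via Policy.get_initial_state() / compute_single_action(). It's only a sketch (untested, and my dict/action-mask observations might still need extra preprocessing), reusing the same my_agents / env / obs names as above:

# Connector-free cross-check (sketch): step each recurrent policy directly
# and carry the RNN state manually instead of relying on agent connectors.
states = {agent_id: my_agents[agent_id].get_initial_state() for agent_id in obs}
for i in range(1, timesteps):
    action_dict = {}
    for agent_id in obs:
        action, state_out, _ = my_agents[agent_id].compute_single_action(
            obs=obs[agent_id],
            state=states[agent_id],
            explore=False,
        )
        action_dict[agent_id] = action
        states[agent_id] = state_out
    obs, reward, terminated, truncated, info = env.step(action_dict)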
Traceback:
Traceback (most recent call last):
  File "C:\Users\henry.lei\Documents\Projects\Inspect-A-M4-RL\HL-Inspection-Env\Scripts\rollout.py", line 479, in <module>
    output_dict = runSingleRollout(env_config_path, policy_path)
  File "C:\Users\henry.lei\Documents\Projects\Inspect-A-M4-RL\HL-Inspection-Env\Scripts\rollout.py", line 217, in runSingleRollout
    policy_outputs = local_policy_inference(policy=my_agents[agent_id],
  File "C:\Users\henry.lei\Miniconda3\envs\m4rl-insp-ray-2.7.1\lib\site-packages\ray\rllib\utils\policy.py", line 254, in local_policy_inference
    ac_outputs: List[AgentConnectorsOutput] = policy.agent_connectors(acd_list)
  File "C:\Users\henry.lei\Miniconda3\envs\m4rl-insp-ray-2.7.1\lib\site-packages\ray\rllib\connectors\agent\pipeline.py", line 41, in __call__
    ret = c(ret)
  File "C:\Users\henry.lei\Miniconda3\envs\m4rl-insp-ray-2.7.1\lib\site-packages\ray\rllib\connectors\connector.py", line 265, in __call__
    return [self.transform(d) for d in acd_list]
  File "C:\Users\henry.lei\Miniconda3\envs\m4rl-insp-ray-2.7.1\lib\site-packages\ray\rllib\connectors\connector.py", line 265, in <listcomp>
    return [self.transform(d) for d in acd_list]
  File "C:\Users\henry.lei\Miniconda3\envs\m4rl-insp-ray-2.7.1\lib\site-packages\ray\rllib\connectors\agent\view_requirement.py", line 118, in transform
    sample_batch = agent_collector.build_for_inference()
  File "C:\Users\henry.lei\Miniconda3\envs\m4rl-insp-ray-2.7.1\lib\site-packages\ray\rllib\evaluation\collectors\agent_collector.py", line 383, in build_for_inference
    element_at_t = d[view_req.shift_arr + len(d) - 1]
TypeError: only integer scalar arrays can be converted to a scalar index
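
From what I can tell, the TypeError itself is just what you get when an empty plain-Python-list buffer is indexed with a NumPy shift array. Standalone illustration only (not RLlib code; the multi-element shift array is made up):

import numpy as np

# Minimal reproduction of the same failure mode: indexing a plain Python
# list with a NumPy integer array raises this exact TypeError.
d = []                          # an empty collector buffer
shift_arr = np.array([-1, 0])   # made-up multi-element shift array
d[shift_arr + len(d) - 1]       # TypeError: only integer scalar arrays can
                                # be converted to a scalar index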
I dug into agent_collector.py a bit, and it looks like some buffers are empty when they probably shouldn't be. Output from print statements I added inside the agent_collector module:

    agent_collector _cache_in_np
    key agent_index
    []
    agent_collector _to_float_np_array
    data_col agent_index
    d

But I'm not sure where agent_index is supposed to come from. Has anyone seen this before?
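
In case it's relevant, this is the kind of check I was going to try next: whether the restored policies even declare agent_index (SampleBatch.AGENT_INDEX) as a view requirement used at compute-action time. Sketch only, using the same my_agents dict as above:

from ray.rllib.policy.sample_batch import SampleBatch

# Inspect whether each policy's view requirements include "agent_index"
# and, if so, how it is expected to be shifted at inference time.
for agent_id, policy in my_agents.items():
    vr = policy.view_requirements.get(SampleBatch.AGENT_INDEX)
    if vr is not None:
        print(agent_id, "agent_index:",
              "used_for_compute_actions =", vr.used_for_compute_actions,
              "shift =", vr.shift)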
Ray details:
python 3.9.19
ray 2.7.1
R2D2 algorithm
custom recurrent network
custom action-masked environment
Thanks!