I trained a multi-agent policy with the R2D2 algorithm on a custom action-masked environment, and I'm now trying to roll it out with local_policy_inference() to perform single inference steps. I'm getting an error that I'm having trouble making sense of.
Relevant rollout code:
## Run simulation
# Perform actions and grab data
timesteps = 40
for i in range(1, timesteps):
    ## Run policy inference
    ## Connector version
    action_dict = {}
    print("obs", obs)
    for agent_id in obs:
        # add fake reward at first step?
        policy_outputs = local_policy_inference(
            policy=my_agents[agent_id],
            env_id="ActionMaskEnv",
            agent_id=agent_id,
            obs=obs[agent_id],
            explore=False,
            reward=0,
        )
        action, state_out, info = policy_outputs[0]
        action_dict[agent_id] = action

    ## step environment
    obs, reward, terminated, truncated, info = env.step(action_dict)
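
For reference, this is roughly the connector-free fallback I'd use to cross-check the policies themselves, carrying the recurrent state by hand via Policy.get_initial_state() / compute_single_action(). It's only a sketch (untested, and my dict/action-mask observations might still need extra preprocessing), reusing the same my_agents / env / obs names as above:

# Connector-free cross-check (sketch): step each recurrent policy directly
# and carry the RNN state manually instead of relying on agent connectors.
states = {agent_id: my_agents[agent_id].get_initial_state() for agent_id in obs}
for i in range(1, timesteps):
    action_dict = {}
    for agent_id in obs:
        action, state_out, _ = my_agents[agent_id].compute_single_action(
            obs=obs[agent_id],
            state=states[agent_id],
            explore=False,
        )
        action_dict[agent_id] = action
        states[agent_id] = state_out
    obs, reward, terminated, truncated, info = env.step(action_dict)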
Traceback:
Traceback (most recent call last):
  File "C:\Users\henry.lei\Documents\Projects\Inspect-A-M4-RL\HL-Inspection-Env\Scripts\rollout.py", line 479, in <module>
    output_dict = runSingleRollout(env_config_path, policy_path)
  File "C:\Users\henry.lei\Documents\Projects\Inspect-A-M4-RL\HL-Inspection-Env\Scripts\rollout.py", line 217, in runSingleRollout
    policy_outputs = local_policy_inference(policy=my_agents[agent_id],
  File "C:\Users\henry.lei\Miniconda3\envs\m4rl-insp-ray-2.7.1\lib\site-packages\ray\rllib\utils\policy.py", line 254, in local_policy_inference
    ac_outputs: List[AgentConnectorsOutput] = policy.agent_connectors(acd_list)
  File "C:\Users\henry.lei\Miniconda3\envs\m4rl-insp-ray-2.7.1\lib\site-packages\ray\rllib\connectors\agent\pipeline.py", line 41, in __call__
    ret = c(ret)
  File "C:\Users\henry.lei\Miniconda3\envs\m4rl-insp-ray-2.7.1\lib\site-packages\ray\rllib\connectors\connector.py", line 265, in __call__
    return [self.transform(d) for d in acd_list]
  File "C:\Users\henry.lei\Miniconda3\envs\m4rl-insp-ray-2.7.1\lib\site-packages\ray\rllib\connectors\connector.py", line 265, in <listcomp>
    return [self.transform(d) for d in acd_list]
  File "C:\Users\henry.lei\Miniconda3\envs\m4rl-insp-ray-2.7.1\lib\site-packages\ray\rllib\connectors\agent\view_requirement.py", line 118, in transform
    sample_batch = agent_collector.build_for_inference()
  File "C:\Users\henry.lei\Miniconda3\envs\m4rl-insp-ray-2.7.1\lib\site-packages\ray\rllib\evaluation\collectors\agent_collector.py", line 383, in build_for_inference
    element_at_t = d[view_req.shift_arr + len(d) - 1]
TypeError: only integer scalar arrays can be converted to a scalar index
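
From what I can tell, the TypeError itself is just what you get when an empty plain-Python-list buffer is indexed with a NumPy shift array. Standalone illustration only (not RLlib code; the multi-element shift array is made up):

import numpy as np

# Minimal reproduction of the same failure mode: indexing a plain Python
# list with a NumPy integer array raises this exact TypeError.
d = []                          # an empty collector buffer
shift_arr = np.array([-1, 0])   # made-up multi-element shift array
d[shift_arr + len(d) - 1]       # TypeError: only integer scalar arrays can
                                # be converted to a scalar index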
I dug into agent_collector.py a bit, and it looks like some buffers are empty when they probably shouldn't be. Output from print statements I added inside the agent_collector module:

    agent_collector _cache_in_np
    key agent_index
    []
    agent_collector _to_float_np_array
    data_col agent_index
    d

But I'm not sure where agent_index is supposed to come from. Has anyone seen this before?
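
In case it's relevant, this is the kind of check I was going to try next: whether the restored policies even declare agent_index (SampleBatch.AGENT_INDEX) as a view requirement used at compute-action time. Sketch only, using the same my_agents dict as above:

from ray.rllib.policy.sample_batch import SampleBatch

# Inspect whether each policy's view requirements include "agent_index"
# and, if so, how it is expected to be shifted at inference time.
for agent_id, policy in my_agents.items():
    vr = policy.view_requirements.get(SampleBatch.AGENT_INDEX)
    if vr is not None:
        print(agent_id, "agent_index:",
              "used_for_compute_actions =", vr.used_for_compute_actions,
              "shift =", vr.shift)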
Ray details:
python 3.9.19
ray 2.7.1
R2D2 algorithm
custom recurrent network
custom action-masked environment
Thanks!