jgonik
June 29, 2021, 11:53pm
1
Hi! I’m trying to compute custom metrics based on values in the info dictionary returned on every step of my custom MultiAgentEnv. The info dictionary that I’m returning looks something like this:
```python
info_dict = {
    "0": {"reward": 500},
    "1": {"reward": 400},
    ...
}
```
Within the dictionary, “0”, “1”, etc. are my agent IDs.
In my on_episode_step callback, I’m trying to retrieve the info dictionary as follows:
```python
for i in range(4):
    agent_id = str(i)
    info_dict = episode.last_info_for(agent_id)
    if info_dict:
        rewards_dict[agent_id] = rewards_dict.get(agent_id, 0) + info_dict["reward"]
```
However, info_dict always ends up being empty, and I’m not sure how to go about debugging this. Any help would be greatly appreciated!
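For anyone following along, here is a minimal, self-contained mock of the loop above (no Ray required). `FakeEpisode` is a hypothetical stand-in for RLlib’s episode object, used only to show why the `if info_dict:` guard makes the failure silent: an empty dict is falsy, so nothing is ever added and no error is raised.

```python
class FakeEpisode:
    """Stand-in for RLlib's episode object (illustrative only)."""

    def __init__(self, infos):
        self._infos = infos  # maps agent ID -> last info dict

    def last_info_for(self, agent_id):
        # RLlib returns an empty dict when no info was recorded for the agent.
        return self._infos.get(agent_id, {})


def accumulate_rewards(episode, rewards_dict):
    # Same accumulation logic as in the callback snippet above.
    for i in range(4):
        agent_id = str(i)
        info_dict = episode.last_info_for(agent_id)
        if info_dict:  # empty dict is falsy -> silently skipped
            rewards_dict[agent_id] = rewards_dict.get(agent_id, 0) + info_dict["reward"]
    return rewards_dict


# With populated infos, the metric accumulates as intended:
ok = accumulate_rewards(FakeEpisode({"0": {"reward": 500}, "1": {"reward": 400}}), {})
print(ok)  # {'0': 500, '1': 400}

# With empty infos (the observed behavior), nothing is added and no error appears:
broken = accumulate_rewards(FakeEpisode({}), {})
print(broken)  # {}
```

This is why the bug is hard to spot: the callback runs cleanly either way, and the only symptom is a metrics dict that never fills in.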
mannyv
June 30, 2021, 3:24am
2
Hi @jgonik ,
Which version of ray are you using?
There is a recent github issue similar to this:
(opened 08:11PM, 25 Jun 21 UTC; labels: P2, bug)
### What is the problem?
With a MultiAgentEnv and an on_episode_end custom callback (at least, I have not tested other callbacks or Envs), I am expecting to see the last info dictionary for each agent, but I am seeing only empty dictionaries for each agent.
I think I have tracked down the issue to [this line](https://github.com/ray-project/ray/blob/9b17c35bee6a28b4b09ede36fdb9fdced3929f55/rllib/env/base_env.py#L473) which I think should be:
`infos = self.last_infos`
but I am not 100% sure of that.
*Ray version and other system information (Python version, TensorFlow version, OS):*
Python 3.8.3, Windows, TF 2.5, Numpy 1.19.5
### Reproduction (REQUIRED)
A simple reproduction script: I edited the BasicMultiAgent example to add content to the info dictionaries, and added a print statement showing what we expect to see in the on_episode_end callback for validation.
```python
import gym
from ray.rllib.env.multi_agent_env import MultiAgentEnv
from ray.rllib.examples.env.mock_env import MockEnv
import ray
from ray import tune
from ray.tune.registry import register_env
from typing import Dict
from ray.rllib.env import BaseEnv
from ray.rllib.policy import Policy
from ray.rllib.evaluation import MultiAgentEpisode, RolloutWorker
from ray.rllib.agents.callbacks import DefaultCallbacks


class MyCallback(DefaultCallbacks):
    def on_episode_end(self, worker: RolloutWorker, base_env: BaseEnv,
                       policies: Dict[str, Policy], episode: MultiAgentEpisode,
                       **kwargs):
        print("INSIDE CALLBACK: ", episode._agent_to_last_info)


class MyBasic(MultiAgentEnv):
    def __init__(self, num):
        self.agents = [MockEnv(25) for _ in range(num)]
        self.dones = set()
        self.observation_space = gym.spaces.Discrete(2)
        self.action_space = gym.spaces.Discrete(2)
        self.resetted = False

    def reset(self):
        self.resetted = True
        self.dones = set()
        return {i: a.reset() for i, a in enumerate(self.agents)}

    def step(self, action_dict):
        obs, rew, done, info = {}, {}, {}, {}
        for i, action in action_dict.items():
            obs[i], rew[i], done[i], info[i] = self.agents[i].step(action)
            info[i] = {"MY TESTER": 1.0}
            if done[i]:
                self.dones.add(i)
        done["__all__"] = len(self.dones) == len(self.agents)
        if done["__all__"]:
            print("INSIDE ENV: ", info)
        return obs, rew, done, info


register_env('basic', lambda env_config: MyBasic(4))
config = {
    "env": 'basic',
    'callbacks': MyCallback
}
ray.init(address='auto', _redis_password='5241590000000000')
results = tune.run("PPO", config=config, stop={"training_iteration": 2})
```
- [x] I have verified my script runs in a clean environment and reproduces the issue.
- [x] I have verified the issue also occurs with the [latest wheels](https://docs.ray.io/en/master/installation.html).
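To make the suspected bug concrete, here is a toy sketch (the class and attribute names besides `last_infos` are illustrative, not Ray’s real internals): if the polling code returns a per-step buffer that has already been cleared instead of the cached last-seen infos, every callback sees empty dicts.

```python
class BuggyPoller:
    """Toy model of an env poller caching per-agent infos (illustrative only)."""

    def __init__(self):
        self.last_infos = {}  # most recent infos, kept until the next step
        self.new_infos = {}   # per-poll buffer, cleared after each poll

    def record_step(self, infos):
        self.last_infos = infos
        self.new_infos = {}  # already drained by the time callbacks run

    def poll(self):
        return self.new_infos  # bug: returns the cleared buffer

    def poll_fixed(self):
        return self.last_infos  # the one-line fix proposed in the issue


poller = BuggyPoller()
poller.record_step({"0": {"reward": 500}})
print(poller.poll())        # {} -> callbacks see empty info dicts
print(poller.poll_fixed())  # {'0': {'reward': 500}}
```

Under this reading, `infos = self.last_infos` on the linked line would restore the expected behavior, which matches the symptom in both your post and the issue.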
jgonik
June 30, 2021, 5:19pm
3
I’m also using the nightly version, so that GitHub issue addresses my problem. Thanks!