How severely does this issue affect your experience of using Ray?
High: It blocks me from completing my task.
Hi, as said in this post: "in a multi-agent RL configuration, the reported episode_reward_mean in json_object is the sum of the episode_reward_mean obtained by each RL agent." How can I log the mean reward per agent separately?
Hi @Username1, custom metrics have been discussed here a couple of times. We usually refer to the very good example here.
In there you find a callback named on_episode_end(). This is the callback you want to use, as the episode has ended at that point and mean rewards can be properly computed. The episode object in the arguments contains all the data you are looking for. You have to loop over the agents in the rewards therein and then add the mean reward for each agent to the episode's custom_metrics attribute, as shown in this line. The hist_data attribute will create histograms and distributions for you in TensorBoard, whereas custom_metrics creates scalars.
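A minimal sketch of that pattern (this is not the linked example verbatim; the class name PerAgentReturnCallback and the metric key return_<agent_id> are just illustrative, and it assumes episode.agent_rewards holds the summed reward per (agent_id, policy_id) for the finished episode):

from ray.rllib.algorithms.callbacks import DefaultCallbacks

class PerAgentReturnCallback(DefaultCallbacks):
    def on_episode_end(self, *, worker, base_env, policies, episode, **kwargs):
        # episode.agent_rewards: {(agent_id, policy_id): summed episode reward}
        for (agent_id, _policy_id), ret in episode.agent_rewards.items():
            # One scalar per agent -> RLlib reports it with _mean/_min/_max suffixes.
            episode.custom_metrics[f"return_{agent_id}"] = ret
            # Lists in hist_data become histograms/distributions in TensorBoard.
            episode.hist_data.setdefault(f"return_{agent_id}", []).append(ret)

Because the scalar custom metrics are averaged over the episodes of each training iteration, return_<agent_id>_mean in the results is effectively the mean return per agent.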
Where should I see the custom metrics? Are they supposed to be printed to the console after training? I can’t see them printed or displayed on TB. Should I set ‘verbose = 3’?
This is my code:
class MyCallback(Callback):
    def on_episode_end(self, worker: RolloutWorker, base_env: BaseEnv,
                       policies: Dict[str, Policy], episode: MultiAgentEpisode,
                       **kwargs):
        episode.custom_metrics['agents_lst'] = episode.agent_rewards.keys()
        episode.custom_metrics['mean_return_per_agent'] = list(episode.agent_rewards.keys())
        # Graphs of Hist over time.
        episode.custom_metrics["return_hist"] = episode.hist_data["mean_return_per_agent"]
@Username1, this won't work, as custom_metrics["my_metric"] needs a scalar value, so you need to create a custom metric for each of your agents. You then want to define a summarization for the rewards a single agent collected.
Take a look at the example - it shows you, in a simple setting, how to do it. A sketch of one possible summarization follows below.
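For instance, one possible summarization is the mean per-step reward of each agent, collected via user_data during the episode (the class name MeanStepRewardCallback and the metric key mean_step_reward_<agent_id> are made up for illustration and are not taken from the linked example):

from collections import defaultdict
from ray.rllib.algorithms.callbacks import DefaultCallbacks

class MeanStepRewardCallback(DefaultCallbacks):
    def on_episode_start(self, *, worker, base_env, policies, episode, **kwargs):
        # Free-form scratch space to accumulate per-step data.
        episode.user_data["step_rewards"] = defaultdict(list)

    def on_episode_step(self, *, worker, base_env, episode, **kwargs):
        for agent_id in episode.get_agents():
            episode.user_data["step_rewards"][agent_id].append(
                episode.last_reward_for(agent_id))

    def on_episode_end(self, *, worker, base_env, policies, episode, **kwargs):
        # Summarize: one scalar (mean step reward) per agent.
        for agent_id, rewards in episode.user_data["step_rewards"].items():
            episode.custom_metrics[f"mean_step_reward_{agent_id}"] = (
                sum(rewards) / len(rewards))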
@Username1, you do not need to use MultiAgentEpisode specifically, as the callback here is simply inherited, and that one uses Episode, which is also the base class for MultiAgentEpisode. Don't worry, the episode that is passed in at runtime is a MultiAgentEpisode.
To your problem: have you followed the workflow in the example? Did you add the callback to your config's callbacks? Do you see it in TensorBoard?
Hello @Lars_Simon_Zehnder, this is my training module. I can't see the custom metrics in the console or in TensorBoard.
I also don't know how to get the rewards per agent. I guess episode.agent_rewards returns a dictionary with the rewards per agent, but I'm not sure how to extract the info I want. Thanks
(PPO pid=4329) File "/opt/anaconda3/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 650, in __init__
(PPO pid=4329) self.callbacks.on_sub_environment_created(
(PPO pid=4329) AttributeError: 'MyCallback' object has no attribute 'on_sub_environment_created'
Hey @Username1, thanks for raising this issue. Not sure what the issue is exactly, but it seems like you are subclassing your custom MyCallback from an older DefaultCallbacks class? The current master one has this method here. As a hack, you might just want to add it as-is to your MyCallback class:
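Something like this stub should do (it deliberately accepts anything, since the exact signature may differ between versions):

# Added inside MyCallback:
def on_sub_environment_created(self, *args, **kwargs):
    # No-op stub so that newer RLlib code calling this hook
    # doesn't crash an older/custom callbacks class.
    pass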
Oh, I see, you are subclassing from a Callback class (maybe Tune's callbacks?). Could you subclass from the ray.rllib.algorithms.callbacks::DefaultCallbacks class instead?
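For example (a sketch; the per-agent metric body is just the idea from above, and registering the class via config["callbacks"] assumes the old-style dict config):

from ray.rllib.algorithms.callbacks import DefaultCallbacks

class MyCallback(DefaultCallbacks):  # instead of (Tune's) Callback
    def on_episode_end(self, *, worker, base_env, policies, episode, **kwargs):
        # One scalar per agent instead of a non-scalar list.
        for (agent_id, _), ret in episode.agent_rewards.items():
            episode.custom_metrics[f"mean_return_{agent_id}"] = ret

# config = {..., "callbacks": MyCallback}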