Mean reward per agent in MARL

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hi, as said in this post: "in a multi-agent RL configuration, the reported episode_reward_mean in json_object is the sum of the episode_reward_mean obtained by each RL agent."

  1. How can I report the reward per agent?
  2. How can I see it on Tensorboard as well?

Thanks!

Hi @Username1, custom metrics have been discussed here a couple of times. We usually refer to the very good example here.

In there you find a callback named on_episode_end(). This is the callback you want, because at that point the episode has ended and mean rewards can be properly computed. The episode object in the arguments contains all the data you are looking for. You have to loop over the agents in the rewards therein and then add your mean reward for each agent to the episode's custom_metrics attribute, as shown in this line. The hist_data attribute will create histograms and distributions in TensorBoard for you, whereas custom_metrics creates scalars.
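For concreteness, a minimal sketch of such a callback could look like the following (assuming the layout of that example; the metric names such as "reward_<agent_id>" are just illustrative, not a fixed RLlib convention):

from ray.rllib.algorithms.callbacks import DefaultCallbacks


class PerAgentRewardCallback(DefaultCallbacks):
    def on_episode_end(self, *, worker, base_env, policies, episode, **kwargs):
        # episode.agent_rewards maps (agent_id, policy_id) -> accumulated episode reward.
        for (agent_id, policy_id), reward in episode.agent_rewards.items():
            # Scalars show up in TensorBoard under custom_metrics/...
            episode.custom_metrics[f"reward_{agent_id}"] = reward
            # hist_data entries produce histograms/distributions in TensorBoard.
            episode.hist_data.setdefault(f"reward_{agent_id}", []).append(reward)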

Thank you very much @Lars_Simon_Zehnder for your reply.

Where should I see the custom metrics? Are they supposed to be printed to the console after training? I can’t see them printed or displayed on TB. Should I set ‘verbose = 3’?

This is my code:

class MyCallback(Callback):
    def on_episode_end(self, worker: RolloutWorker, base_env: BaseEnv,
                       policies: Dict[str, Policy], episode: MultiAgentEpisode,
                       **kwargs):
        episode.custom_metrics['agents_lst'] = episode.agent_rewards.keys()
        episode.custom_metrics['mean_return_per_agent'] = list(episode.agent_rewards.keys())

        # Graphs of hist over time.
        episode.custom_metrics["return_hist"] = episode.hist_data["mean_return_per_agent"]

@Username1, this won’t work: custom_metrics["my_metric"] needs a scalar value, so you need to create a separate custom metric for each of your agents. You then want to define a summarization (e.g. the mean) of the rewards a single agent collected.

Take a look at the example; it shows you, with a simple setup, how to do it.
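For instance, something along these lines (a sketch only; using episode.length as the episode step count and the metric names here are assumptions, and dividing by the length is just one possible summarization):

for (agent_id, policy_id), total_reward in episode.agent_rewards.items():
    # One scalar per agent: that agent's mean per-step reward in this episode.
    episode.custom_metrics[f"mean_reward_{agent_id}"] = total_reward / episode.length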


Thank you very much @Lars_Simon_Zehnder

First, I am not sure, but I guess I should change the example from episode: Episode to episode: MultiAgentEpisode since my env is multi-agent.

Then, I’ve tried something very simple, but I can’t find the metrics printed out in the console:


class MyCallback(Callback):
    def on_episode_end(self, worker: RolloutWorker, base_env: BaseEnv,
                       policies: Dict[str, Policy], episode: Episode,
                       **kwargs):
        episode.custom_metrics['agents_lst'] = 1
        episode.custom_metrics['mean_return_per_agent'] = 2

Do you have any multi-agent example? or what I am missing to make it work? Thanks!

@Username1, you do not need to use MultiAgentEpisode specifically: the callback here is simply inherited, and the base signature uses Episode, which is also the base class of MultiAgentEpisode. Don’t worry, the episode that is passed in at runtime is a MultiAgentEpisode.

To your problem, have you followed the workflow in the example? Did you add the callback to your config’s callbacks? Do you see it in TensorBoard?

Hello @Lars_Simon_Zehnder this is my training module. I can’t see the custom metrics on the console nor on Tensorboard.

I also don’t know how to get the rewards per agent. I guess episode.agent_rewards returns a dictionary with the rewards per agent, but I’m not sure how to extract the info I want. Thanks

class MyCallback(Callback):
    def on_episode_end(self, worker: RolloutWorker, base_env: BaseEnv,
                       policies: Dict[str, Policy], episode: Episode,
                       **kwargs):
        episode.custom_metrics['agents_lst'] = 1
        episode.custom_metrics['mean_return_per_agent'] = 2


def setup_and_train():
    # config dict... etc.
    train_steps = 1
    experiment_name = 'my_env'

    tuner = tune.Tuner(
        "PPO",
        param_space=config,
        run_config=air.RunConfig(
            name=experiment_name,
            stop={"timesteps_total": train_steps},
            checkpoint_config=air.CheckpointConfig(checkpoint_frequency=50, checkpoint_at_end=True),
            callbacks=[MyCallback()]  # here
        )
    )
    results = tuner.fit()

@Username1, you are almost there. The callbacks in this case are RLlib callbacks, not Tune callbacks, so you have to add them to your config:

config.callbacks(MyCallback)

tuner = tune.Tuner("PPO", ....)
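
Putting both pieces together, the flow looks roughly like this (a sketch based on your snippet; config, experiment_name and train_steps are the placeholders from above):

# Register the RLlib callback on the algorithm config, not on Tune.
config.callbacks(MyCallback)

tuner = tune.Tuner(
    "PPO",
    param_space=config,
    run_config=air.RunConfig(
        name=experiment_name,
        stop={"timesteps_total": train_steps},
        # No callbacks=[...] here; that argument is for Tune callbacks only.
    ),
)
results = tuner.fit()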

Thank you very much @Lars_Simon_Zehnder for your time. So the callbacks=[MyCallback()] entry in the Tune RunConfig has to be removed, right?

Now, when I add config.callbacks(MyCallback) to the PPO config like this:

# RLlib configs
N_CPUS = 4
learning_rate = 1e-3
config = PPOConfig()\
    .training(lr=learning_rate, num_sgd_iter=10, train_batch_size=4000)\
    .framework("torch")\
    .rollouts(num_rollout_workers=1, observation_filter="MeanStdFilter")\
    .resources(num_gpus=0, num_cpus_per_worker=1)\
    .evaluation(evaluation_interval=100, evaluation_duration=5, evaluation_duration_unit='episodes',
                evaluation_config={"explore": False})\
    .environment(env=env_name, env_config={
        "num_workers": N_CPUS - 1,
        "disable_env_checking": True})

# RLlib callbacks
config.callbacks(MyCallback)

I get the following error:

(PPO pid=4329)   File "/opt/anaconda3/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 650, in __init__
(PPO pid=4329)     self.callbacks.on_sub_environment_created(
(PPO pid=4329) AttributeError: 'MyCallback' object has no attribute 'on_sub_environment_created'

Hey @Username1, thanks for raising this issue. Not sure what the issue is exactly, but it seems like you are subclassing your custom MyCallback from an older DefaultCallbacks class? The current master one has this method here. As a hack, you might just want to add it as-is to your MyCallback class:

    @OverrideToImplementCustomLogic
    def on_sub_environment_created(
        self,
        *,
        worker: "RolloutWorker",
        sub_environment: EnvType,
        env_context: EnvContext,
        env_index: Optional[int] = None,
        **kwargs,
    ) -> None:
        pass

Oh, I see, you are subclassing from a Callback class (maybe Tune callbacks?). Could you subclass from the ray.rllib.algorithms.callbacks::DefaultCallbacks class?
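
In other words, something like this should get rid of the error (a sketch only, reusing the illustrative metric names from above):

from ray.rllib.algorithms.callbacks import DefaultCallbacks


class MyCallback(DefaultCallbacks):  # RLlib's DefaultCallbacks, not a Tune Callback
    def on_episode_end(self, *, worker, base_env, policies, episode, **kwargs):
        for (agent_id, policy_id), reward in episode.agent_rewards.items():
            episode.custom_metrics[f"reward_{agent_id}"] = reward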


Thank you very much @sven1977 for your time and your response.

Thanks!
