Mean reward per agent in MARL

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hi, as said in this post: "in a multi-agent RL configuration, the reported episode_reward_mean in json_object is the sum of the episode_reward_mean obtained by each RL agent."

  1. How can I report the reward per agent?
  2. How can I see it on Tensorboard as well?

Thanks!

Hi @Username1, custom metrics have been discussed here a couple of times. We usually refer to the very good example here.

In there you find a callback named on_episode_end(). This is the callback you want, because at that point the episode has ended and mean rewards can be properly computed. The episode object in the arguments contains all the data you are looking for. You have to loop over the agents in the rewards therein and then add your mean reward for each agent to the episode's custom_metrics attribute, as shown in this line. The hist_data attribute will create histograms and distributions in TensorBoard for you, whereas custom_metrics creates scalars.
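For concreteness, a minimal sketch of such a callback could look like the following (assuming the layout of that example; the metric names such as "reward_<agent_id>" are just illustrative, not a fixed RLlib convention):

from ray.rllib.algorithms.callbacks import DefaultCallbacks


class PerAgentRewardCallback(DefaultCallbacks):
    def on_episode_end(self, *, worker, base_env, policies, episode, **kwargs):
        # episode.agent_rewards maps (agent_id, policy_id) -> accumulated episode reward.
        for (agent_id, policy_id), reward in episode.agent_rewards.items():
            # Scalars show up in TensorBoard under custom_metrics/...
            episode.custom_metrics[f"reward_{agent_id}"] = reward
            # hist_data entries produce histograms/distributions in TensorBoard.
            episode.hist_data.setdefault(f"reward_{agent_id}", []).append(reward)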

Thank you very much @Lars_Simon_Zehnder for your reply.

Where should I see the custom metrics? Are they supposed to be printed to the console after training? I can’t see them printed or displayed on TB. Should I set ‘verbose = 3’?

This is my code:

class MyCallback(Callback):
    def on_episode_end(self, worker: RolloutWorker, base_env: BaseEnv,
                       policies: Dict[str, Policy], episode: MultiAgentEpisode,
                       **kwargs):
        episode.custom_metrics['agents_lst'] = episode.agent_rewards.keys()
        episode.custom_metrics['mean_return_per_agent'] = list(episode.agent_rewards.keys())

        # Graphs of hist over time.
        episode.custom_metrics["return_hist"] = episode.hist_data["mean_return_per_agent"]

@Username1, this won’t work: custom_metrics["my_metric"] needs a scalar value, so you need to create a separate custom metric for each of your agents. You then want to define a summarization (e.g. the mean) of the rewards a single agent collected.

Take a look at the example; it shows you, with a simple setup, how to do it.
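For instance, something along these lines (a sketch only; using episode.length as the episode step count and the metric names here are assumptions, and dividing by the length is just one possible summarization):

for (agent_id, policy_id), total_reward in episode.agent_rewards.items():
    # One scalar per agent: that agent's mean per-step reward in this episode.
    episode.custom_metrics[f"mean_reward_{agent_id}"] = total_reward / episode.length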


Thank you very much @Lars_Simon_Zehnder

First, I am not sure, but I guess I should change the example from episode: Episode to episode: MultiAgentEpisode since my env is multi-agent.

Then, I’ve tried something very simple, but I can’t find the metrics printed out in the console:


class MyCallback(Callback):
    def on_episode_end(self, worker: RolloutWorker, base_env: BaseEnv,
                       policies: Dict[str, Policy], episode: Episode,
                       **kwargs):
        episode.custom_metrics['agents_lst'] = 1
        episode.custom_metrics['mean_return_per_agent'] = 2

Do you have any multi-agent example? or what I am missing to make it work? Thanks!

@Username1, you do not need to use MultiAgentEpisode specifically: the callback here is simply inherited, and the base signature uses Episode, which is also the base class of MultiAgentEpisode. Don’t worry, the episode that is passed in at runtime is a MultiAgentEpisode.

To your problem, have you followed the workflow in the example? Did you add the callback to your config’s callbacks? Do you see it in TensorBoard?

Hello @Lars_Simon_Zehnder this is my training module. I can’t see the custom metrics on the console nor on Tensorboard.

I also don’t know how to get the rewards per agent. I guess episode.agent_rewards returns a dictionary with the rewards per agent, but I’m not sure how to extract the info I want. Thanks

class MyCallback(Callback):
    def on_episode_end(self, worker: RolloutWorker, base_env: BaseEnv,
                       policies: Dict[str, Policy], episode: Episode,
                       **kwargs):
        episode.custom_metrics['agents_lst'] = 1
        episode.custom_metrics['mean_return_per_agent'] = 2


def setup_and_train():
    # config dict... etc.
    train_steps = 1
    experiment_name = 'my_env'

    tuner = tune.Tuner(
        "PPO",
        param_space=config,
        run_config=air.RunConfig(
            name=experiment_name,
            stop={"timesteps_total": train_steps},
            checkpoint_config=air.CheckpointConfig(checkpoint_frequency=50, checkpoint_at_end=True),
            callbacks=[MyCallback()]  # here
        )
    )
    results = tuner.fit()

@Username1, you are almost there. The callbacks in this case are RLlib callbacks, not Tune callbacks, so you have to add them to your config:

config.callbacks(MyCallback)

tuner = tune.Tuner("PPO", ....)
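
Putting both pieces together, the flow looks roughly like this (a sketch based on your snippet; config, experiment_name and train_steps are the placeholders from above):

# Register the RLlib callback on the algorithm config, not on Tune.
config.callbacks(MyCallback)

tuner = tune.Tuner(
    "PPO",
    param_space=config,
    run_config=air.RunConfig(
        name=experiment_name,
        stop={"timesteps_total": train_steps},
        # No callbacks=[...] here; that argument is for Tune callbacks only.
    ),
)
results = tuner.fit()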

Thank you very much @Lars_Simon_Zehnder for your time. So the callbacks=[MyCallback()] entry in the Tune RunConfig has to be removed, right?

Now, when I add config.callbacks(MyCallback) to the PPO config like this:

# RLlib configs
N_CPUS = 4
learning_rate = 1e-3
config = PPOConfig()\
    .training(lr=learning_rate, num_sgd_iter=10, train_batch_size=4000)\
    .framework("torch")\
    .rollouts(num_rollout_workers=1, observation_filter="MeanStdFilter")\
    .resources(num_gpus=0, num_cpus_per_worker=1)\
    .evaluation(evaluation_interval=100, evaluation_duration=5, evaluation_duration_unit='episodes',
                evaluation_config={"explore": False})\
    .environment(env=env_name, env_config={
        "num_workers": N_CPUS - 1,
        "disable_env_checking": True})

# RLlib callbacks
config.callbacks(MyCallback)

I get the following error:

(PPO pid=4329)   File "/opt/anaconda3/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 650, in __init__
(PPO pid=4329)     self.callbacks.on_sub_environment_created(
(PPO pid=4329) AttributeError: 'MyCallback' object has no attribute 'on_sub_environment_created'

Hey @Username1, thanks for raising this issue. Not sure what the issue is exactly, but it seems like you are subclassing your custom MyCallback from an older DefaultCallbacks class? The current master one has this method here. As a hack, you might just want to add it as-is to your MyCallback class:

    @OverrideToImplementCustomLogic
    def on_sub_environment_created(
        self,
        *,
        worker: "RolloutWorker",
        sub_environment: EnvType,
        env_context: EnvContext,
        env_index: Optional[int] = None,
        **kwargs,
    ) -> None:
        pass

Oh, I see, you are subclassing from a Callback class (maybe Tune callbacks?). Could you subclass from the ray.rllib.algorithms.callbacks::DefaultCallbacks class?
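
In other words, something like this should get rid of the error (a sketch only, reusing the illustrative metric names from above):

from ray.rllib.algorithms.callbacks import DefaultCallbacks


class MyCallback(DefaultCallbacks):  # RLlib's DefaultCallbacks, not a Tune Callback
    def on_episode_end(self, *, worker, base_env, policies, episode, **kwargs):
        for (agent_id, policy_id), reward in episode.agent_rewards.items():
            episode.custom_metrics[f"reward_{agent_id}"] = reward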


Thank you very much @sven1977 for your time and your response.

Thanks!
