# How to record the average action of each episode

I want to monitor the average action of each episode, but my attempts so far have failed. I would be grateful for any advice. I also have a second question: the reward range of the environment I wrote is -20 to 200, but the reward reported during training with RLlib reaches 5000.

Hi @yz_x

Welcome to the forum. I am not sure what you are asking exactly in your first question. I think you are saying you have a continuous action space and you want to take the average of the actions produced across all steps of an episode. Is that correct?

Is your environment single agent or multi-agent? The mean_episode_reward is the average cumulative reward across all timesteps and agents in a sample batch of possibly multiple episodes.

Thank you very much for your help, @mannyv! I am trying to solve the first problem with a callback function. To clarify the first question: I want to find the average action of an episode. For example, an episode has 100 steps, and each step has one action (between -1 and +1). I want the average value of these 100 actions: $\text{average } a = \frac{1}{100}\sum\limits_{i = 1}^{100} a_i, \quad a_i \in [-1, 1]$
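The averaging itself can be sketched independently of RLlib. The snippet below is a minimal sketch, assuming scalar actions; the class `EpisodeActionAverager` and its methods `on_step` and `on_episode_end` are hypothetical names for illustration, not RLlib API. The idea is that a callback would call one method on every step and the other when the episode finishes.

```python
class EpisodeActionAverager:
    """Hypothetical helper: accumulates per-step actions and
    reports their mean when the episode ends."""

    def __init__(self):
        self._total = 0.0
        self._count = 0

    def on_step(self, action: float) -> None:
        # Call once per environment step with the action taken.
        self._total += float(action)
        self._count += 1

    def on_episode_end(self) -> float:
        # Return the mean action and reset for the next episode.
        if self._count == 0:
            raise ValueError("no actions recorded for this episode")
        mean = self._total / self._count
        self._total, self._count = 0.0, 0
        return mean


# Usage: a toy 4-step episode with actions in [-1, 1].
averager = EpisodeActionAverager()
for a in [1.0, -1.0, 0.5, 0.5]:
    averager.on_step(a)
mean_action = averager.on_episode_end()
print(mean_action)
```

Keeping a running sum and a count avoids storing every action, which matters little for 100 steps but keeps the helper cheap for long episodes.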

On the second question, do you mean the following? For example, I have 10 agents. Suppose each agent has 5 episodes, and the average reward of each episode is between 1 and 100. Then the formula for mean_episode_reward is
$${\text{mean_episode_reward}} = \frac{\sum\limits_{i = 1}^{10}\sum\limits_{j = 1}^{5} \text{reward}_{ij}}{10 \times 5}$$
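To make the "cumulative reward" reading concrete, here is a minimal sketch, assuming the reported metric is the mean over episodes of each episode's summed rewards, as described in the reply above. The functions `episode_return` and `mean_episode_return` are hypothetical helpers written for illustration, not RLlib's internal code. Under that assumption, a per-step reward bounded by 200 can still give an episode total of 5000.

```python
def episode_return(step_rewards):
    """Cumulative reward of one episode: the sum of its per-step rewards."""
    return sum(step_rewards)


def mean_episode_return(episodes):
    """Average of the cumulative rewards over a list of episodes."""
    returns = [episode_return(ep) for ep in episodes]
    return sum(returns) / len(returns)


# Toy data: every per-step reward lies in [-20, 200].
episodes = [
    [200.0] * 25,   # 25 steps at the maximum reward
    [-20.0] * 25,   # 25 steps at the minimum reward
    [100.0] * 30,   # 30 steps at a mid-range reward
]

print(episode_return(episodes[0]))
print(mean_episode_return(episodes))
```

The first episode alone sums to 5000 even though no single step exceeds 200, so a large reported value does not by itself indicate a bug in the environment.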
