How to write a function that records the average action of each episode

I want to monitor the average action of each episode. I tried my best but could not get it working, so any advice would be greatly appreciated! I also have another question: the reward range of the environment I wrote is -20 to 200, but the reward reported by RLlib during training reaches 5000.

Hi @yz_x

Welcome to the forum. I am not sure what you are asking exactly in your first question. I think you are saying you have a continuous action space and you want to take the average of the actions produced across all steps of an episode. Is that correct?

Is your environment single agent or multi-agent? The mean_episode_reward is the average cumulative reward across all timesteps and agents in a sample batch of possibly multiple episodes.
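To illustrate why the reported value can exceed the per-step range, here is a small sketch in plain Python (no RLlib involved). The episode lengths, reward values, and the `agent_0` id are made up for the example; the only point is that per-step rewards are summed over an episode (and over agents) before the mean across episodes is taken.

```python
from statistics import mean


def episode_return(step_rewards):
    """Sum per-step rewards; each step is a dict mapping agent id -> reward."""
    return sum(sum(agent_rewards.values()) for agent_rewards in step_rewards)


# Hypothetical episode: 100 steps, every step reward lies within [-20, 200].
episode_a = [{"agent_0": 50.0} for _ in range(100)]
# A second, shorter hypothetical episode at the top of the reward range.
episode_b = [{"agent_0": 200.0} for _ in range(10)]

returns = [episode_return(episode_a), episode_return(episode_b)]
mean_return = mean(returns)  # mean of the cumulative rewards, not of step rewards
print(returns, mean_return)
```

So a step reward capped at 200 is still compatible with an episode-level value of 5000 if the episode is long enough.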

Thank you very much for your help, @mannyv! I'm trying to solve the first problem with a callback function. To clarify the first question: I want to find the average action of an episode. For example, an episode has 100 steps, and each step has one action (the action is between -1 and +1). I want to get the average value of these 100 actions: $\bar{a} = \frac{1}{100}\sum\limits_{i = 1}^{100} a_i, \quad a_i \in [-1, 1]$
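A minimal sketch of the bookkeeping such a callback would need, written in plain Python so it runs without RLlib. The class `AverageActionTracker` and its method names are hypothetical; they only mirror the usual start/step/end episode hooks. In RLlib you would likely keep the running sum in the episode's user data inside a callbacks subclass and read the action at each step, but the exact hook signatures depend on your Ray version, so treat that wiring as an assumption to check against the docs.

```python
class AverageActionTracker:
    """Keeps a running sum and count of scalar actions per episode (hypothetical helper)."""

    def __init__(self):
        self._sums = {}
        self._counts = {}

    def on_episode_start(self, episode_id):
        # Reset the accumulators for a new episode.
        self._sums[episode_id] = 0.0
        self._counts[episode_id] = 0

    def on_episode_step(self, episode_id, action):
        # Accumulate the action taken at this step.
        self._sums[episode_id] += float(action)
        self._counts[episode_id] += 1

    def on_episode_end(self, episode_id):
        # Return the mean action and drop the per-episode state.
        total = self._sums.pop(episode_id)
        count = self._counts.pop(episode_id)
        return total / count if count else 0.0


tracker = AverageActionTracker()
tracker.on_episode_start("ep0")
for a in [1.0, -1.0, 0.5, 0.5]:  # made-up actions in [-1, 1]
    tracker.on_episode_step("ep0", a)
avg = tracker.on_episode_end("ep0")
print(avg)
```

The value returned at the end of the episode is what you would store as a custom metric so it shows up next to the reward statistics.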

As for the second question, do you mean the following? For example, I have 10 agents. Suppose each agent has 5 episodes, and the average reward of each episode is between 1 and 100. Then the formula for mean_episode_reward is
$$
\text{mean\_episode\_reward} = \frac{\sum\limits_{i = 1}^{10}\sum\limits_{j = 1}^{5} \text{reward}_{ij}}{10 \times 5}
$$
