Understanding agent_timesteps_total

Archana_R · February 3, 2023, 11:52am

Hi,

Below is a snapshot of my output

agent_timesteps_total: 4000
counters:
  num_agent_steps_sampled: 4000
  num_agent_steps_trained: 4000
  num_env_steps_sampled: 4000
  num_env_steps_trained: 4000
custom_metrics: {}
date: 2023-02-03_12-46-22
done: false
episode_len_mean: .nan
episode_media: {}
episode_reward_max: .nan
episode_reward_mean: .nan
episode_reward_min: .nan
episodes_this_iter: 0
episodes_total: 0

My Code:

from ray.rllib.agents.ppo import PPOTrainer, DEFAULT_CONFIG
from ray.tune.logger import pretty_print

config = DEFAULT_CONFIG.copy()

# "fss-v1" is my custom environment
agent = PPOTrainer(config, env="fss-v1")

for i in range(1):
    print("Entered iteration:", i)
    result = agent.train()
    print(pretty_print(result))

My question:

Why does it show episodes_total = 0?
Why is the episode reward NaN?
What does agent_timesteps_total = 4000 mean?

I checked the horizon config. It is None. I do not understand this setting either; should I change its value?
I urgently need your input, please.

Thank you!

mannyv · February 3, 2023, 2:24pm

This means that in one call to train(), which samples 4000 steps from your environment(s), your environment did not terminate (it never returned done=True). The episode count and the mean reward are not updated until episodes terminate. Training, by which I mean updating the policy, occurs every time 4000 new environment steps have been collected.
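To make this concrete, here is a minimal sketch in plain Python (no Ray involved) of why the counters reach 4000 while the episode metrics stay empty. The names NeverEndingEnv, sample and mean_or_nan are hypothetical and only imitate the bookkeeping described above; this is not RLlib's actual implementation.

```python
import math


class NeverEndingEnv:
    """Hypothetical toy environment whose step() never returns done=True."""

    def reset(self):
        return 0

    def step(self, action):
        # Old Gym-style return value: (observation, reward, done, info)
        return 0, 1.0, False, {}


def sample(env, num_steps):
    """Collect num_steps steps and record the return of every finished episode."""
    completed_returns = []
    episode_return = 0.0
    env.reset()
    for _ in range(num_steps):
        _, reward, done, _ = env.step(0)
        episode_return += reward
        if done:
            # An episode only counts once the environment signals done
            completed_returns.append(episode_return)
            episode_return = 0.0
            env.reset()
    return num_steps, completed_returns


def mean_or_nan(values):
    """Mean of a list, or NaN when there is nothing to average."""
    return sum(values) / len(values) if values else float("nan")


steps, returns = sample(NeverEndingEnv(), 4000)
print("timesteps_total:", steps)
print("episodes_total:", len(returns))
print("episode_reward_mean:", mean_or_nan(returns))
```

All 4000 steps are sampled, but because no episode ever finishes there is nothing to count or average, which matches episodes_total: 0 and the .nan reward fields in the output above.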

Archana_R · February 3, 2023, 2:56pm

So it looks like my episode is not terminating. How do I make it terminate? If my actions are always illegal actions, the game does not end.
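One common way to guarantee termination is to cap the episode length, so that done=True is returned after a fixed number of steps even if every action is illegal. Below is a rough sketch of such a cap, assuming the old Gym-style step() that returns (obs, reward, done, info). StepLimit, StuckEnv and the truncated_by_step_limit info key are hypothetical names made up for this illustration, not part of fss-v1 or RLlib.

```python
class StuckEnv:
    """Hypothetical stand-in for a game that never ends on illegal actions."""

    def reset(self):
        return 0

    def step(self, action):
        # Old Gym-style return value: (observation, reward, done, info)
        return 0, 0.0, False, {}


class StepLimit:
    """Hypothetical wrapper that forces done=True after max_steps steps."""

    def __init__(self, env, max_steps):
        self.env = env
        self.max_steps = max_steps
        self._elapsed = 0

    def reset(self):
        self._elapsed = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._elapsed += 1
        if self._elapsed >= self.max_steps and not done:
            # End the episode ourselves and note why in the info dict
            done = True
            info = dict(info, truncated_by_step_limit=True)
        return obs, reward, done, info


env = StepLimit(StuckEnv(), max_steps=200)
env.reset()
episodes = 0
for _ in range(4000):
    _, _, done, _ = env.step(0)
    if done:
        episodes += 1
        env.reset()
print("episodes completed in 4000 steps:", episodes)
```

Gym ships a ready-made wrapper for this, gym.wrappers.TimeLimit(env, max_episode_steps=...), and if I remember correctly the horizon setting in RLlib of that era served the same purpose. Alternatively, the environment itself could end the episode (or give a penalty) when an illegal action is chosen, depending on what makes sense for the game.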
