Understanding agent_timesteps_total

Hi,

Below is a snapshot of my output:

agent_timesteps_total: 4000
counters:
  num_agent_steps_sampled: 4000
  num_agent_steps_trained: 4000
  num_env_steps_sampled: 4000
  num_env_steps_trained: 4000
custom_metrics: {}
date: 2023-02-03_12-46-22
done: false
episode_len_mean: .nan
episode_media: {}
episode_reward_max: .nan
episode_reward_mean: .nan
episode_reward_min: .nan
episodes_this_iter: 0
episodes_total: 0

My code:

from ray.rllib.agents.ppo import PPOTrainer, DEFAULT_CONFIG
from ray.tune.logger import pretty_print
config = DEFAULT_CONFIG.copy()

agent = PPOTrainer(config, env="fss-v1")  # custom environment

for _ in range(1):
    print("Entered _ :", _)
    result = agent.train()

My questions:

  1. Why does it show episodes_total = 0?
  2. Why is the episode reward NaN?
  3. What does agent_timesteps_total = 4000 mean?

I checked the horizon config (it is None; I do not understand this either, and should I change its value?).
I urgently need your input, please.

Thank you!

This means that in one call to train(), which samples 4000 steps from your environment(s), your environment never terminated, i.e., never returned done=True. The episode count and the mean reward do not update until episodes terminate. Training, by which I mean updating the policy, occurs every time 4000 new environment steps have been collected.
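To make the above concrete, here is a minimal sketch (hypothetical class and parameter names, not code from this thread) of an environment whose step() is guaranteed to eventually return done=True via an internal step limit:

```python
class CountingEnv:
    """Toy environment that always terminates after max_steps,
    so done=True is guaranteed and RLlib can count episodes."""

    def __init__(self, max_steps=200):
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        self.steps = 0
        return 0  # dummy observation

    def step(self, action):
        self.steps += 1
        obs, reward = 0, 0.0
        # Terminate once the step budget is exhausted, even if the
        # agent never reaches a "natural" end state.
        done = self.steps >= self.max_steps
        return obs, reward, done, {}
```

With an environment like this, a 4000-step sample would contain complete episodes, so episodes_total and episode_reward_mean would be populated instead of 0 and NaN.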

So it looks like my episode is not terminating. How do I make it terminate? If my actions are always illegal, the game never ends.
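One common pattern for that situation (a sketch under assumed names, not something prescribed in this thread) is to end the episode explicitly when an illegal action is chosen, typically with a penalty reward, and to keep a step cap as a safety net so rollouts can never run forever:

```python
class GameEnv:
    """Toy sketch: the episode ends on an illegal action
    or when the step budget max_steps is exhausted."""

    def __init__(self, max_steps=100):
        self.max_steps = max_steps
        self.steps = 0

    def legal_actions(self):
        return [0, 1]  # placeholder legality rule

    def reset(self):
        self.steps = 0
        return 0  # dummy observation

    def step(self, action):
        self.steps += 1
        if action not in self.legal_actions():
            # Illegal move: penalize and terminate immediately,
            # so the rollout cannot loop forever on invalid actions.
            return 0, -1.0, True, {"illegal": True}
        # Safety net: also terminate once the step budget is hit.
        done = self.steps >= self.max_steps
        return 0, 0.0, done, {}
```

The penalty teaches the agent to avoid illegal actions, while the step cap guarantees termination regardless of what the policy does.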