Hi,
Below is a snapshot of my output:
agent_timesteps_total: 4000
counters:
  num_agent_steps_sampled: 4000
  num_agent_steps_trained: 4000
  num_env_steps_sampled: 4000
  num_env_steps_trained: 4000
custom_metrics: {}
date: 2023-02-03_12-46-22
done: false
episode_len_mean: .nan
episode_media: {}
episode_reward_max: .nan
episode_reward_mean: .nan
episode_reward_min: .nan
episodes_this_iter: 0
episodes_total: 0
My Code:
from ray.rllib.agents.ppo import PPOTrainer, DEFAULT_CONFIG
from ray.tune.logger import pretty_print

config = DEFAULT_CONFIG.copy()
agent = PPOTrainer(config, env="fss-v1")  # custom environment

for _ in range(1):
    print("Entered _ :", _)
    result = agent.train()
    print(pretty_print(result))  # produces the output snapshot above
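For reference, here is a minimal sketch of how I could probe whether the environment ever terminates an episode on its own; it assumes fss-v1 is registered with Gym and follows the classic gym API where step() returns a done flag (the 10000-step cap is an arbitrary safety limit I picked, not something from my real setup):

import gym

# Probe whether the custom env ever signals episode termination.
env = gym.make("fss-v1")
obs = env.reset()
done = False
steps = 0
while not done and steps < 10000:  # arbitrary safety cap
    action = env.action_space.sample()  # random action, only to drive the env
    obs, reward, done, info = env.step(action)
    steps += 1
print("done:", done, "after", steps, "steps")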
My questions:
- Why does it show episodes_total = 0?
- Why is the episode reward NaN?
- What does agent_timesteps_total = 4000 mean?
I also checked the horizon config: it is None. I do not understand this setting either; should I change its value?
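In case it matters, this is what I would try if horizon is supposed to cap episode length so that RLlib force-terminates episodes after that many steps; the value 200 is an arbitrary guess on my part, not something I have verified:

config = DEFAULT_CONFIG.copy()
config["horizon"] = 200  # hypothetical cap; 200 is an arbitrary guess
agent = PPOTrainer(config, env="fss-v1")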
I urgently need your input, please.
Thank you!