Hi all,
I am new to Ray/RLlib and need help with a custom environment I have created. Here is a snapshot of the log from training PPO on this environment.
The counters num_agent_steps_trained and num_env_steps_trained are both 0. Also, done = False even though I set done = True once 500 steps have been sampled.
What should I check, or what possible errors should I look for, to correct this?
I also notice that no learning is happening: the mean reward per episode does not seem to improve. In addition, there are two similarly named counters, num_agent_steps_trained and num_env_steps_trained. What is the difference between the two?
I have set the episode length to 500 by setting done=True once 500 steps have been sampled.
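For context, here is a minimal sketch of what I mean, assuming a gymnasium-style env. MyCustomEnv, its spaces, observations, and rewards are placeholders rather than my actual code:

```python
import gymnasium as gym
import numpy as np
from ray.rllib.algorithms.ppo import PPOConfig


class MyCustomEnv(gym.Env):
    """Placeholder for my actual custom environment."""

    def __init__(self, config=None):
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)
        self.step_count = 0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.step_count = 0
        return self.observation_space.sample(), {}

    def step(self, action):
        self.step_count += 1
        obs = self.observation_space.sample()  # placeholder observation
        reward = 0.0                           # placeholder reward
        # End the episode once 500 steps have been sampled. (With gymnasium's
        # 5-tuple step API, the old single done flag is split into
        # terminated/truncated.)
        truncated = self.step_count >= 500
        return obs, reward, False, truncated, {}


# Roughly how I launch PPO training on it:
config = PPOConfig().environment(env=MyCustomEnv)
algo = config.build()
for _ in range(10):
    print(algo.train())
```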
Below is part of the log. Any help will be greatly appreciated.
Thanks,
Tarka
```
counters:
  num_agent_steps_sampled: 43000
  num_agent_steps_trained: 0
  num_env_steps_sampled: 43000
  num_env_steps_trained: 0
custom_metrics: {}
date: 2024-04-04_08-20-36
done: false
episode_len_mean: 500.0
episode_media: {}
episode_reward_max: -625.7630598837428
episode_reward_mean: -1059.0733917539524
episode_reward_min: -1620.7717931357777
episodes_this_iter: 2
episodes_total: 86
hostname: E-CND2273HMV
info:
  learner:
    all:
      num_agent_steps_trained: 128.0
      num_env_steps_trained: 1000.0
      total_loss: 2488.746065732266
```