Given the code below:

from ray.rllib.agents.ppo import PPOTrainer  # legacy (pre-2.0) import path

config = {}  # assuming an otherwise-default PPO config
agent = PPOTrainer(config, env="CartPole-v1")  # (1)
for _ in range(1):
    result = agent.train()  # (2)
Does (1) mean that it has collected the training data of the default batch size of 4000 (i.e., called the reset and step functions of the environment)?
Does (2) mean that, with the data already collected from (1), the policy is being changed (trained) based on the calculated loss?
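To make the question concrete, here is how I tried to check this. I'm assuming the result-dict keys timesteps_total and episode_reward_mean that I've seen in RLlib examples, so treat this as a sketch rather than something confirmed by the docs:

result = agent.train()
# If each train() call collects a fresh sample batch, this should grow by
# roughly train_batch_size (default 4000) per call:
print(result.get("timesteps_total"))
# If the policy is actually being updated from the loss, this should
# improve over repeated calls:
print(result.get("episode_reward_mean"))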
I have seen so many versions of implementations of this and am getting confused. For example:
from ray.rllib.algorithms.ppo import PPOConfig

algo = (
    PPOConfig()
    .rollouts(num_rollout_workers=1)
    .resources(num_gpus=0)
    .environment(env="CartPole-v1")
    .build()
)
for i in range(10):
    result = algo.train()
Isn't this the same thing (apart from calling train() ten times instead of once)?
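From what I can tell, the PPOConfig builder is just the newer way of constructing the same algorithm. Here is what I believe is the equivalent legacy-style construction; the num_workers and num_gpus dict keys as counterparts of the builder calls are my assumption:

from ray.rllib.agents.ppo import PPOTrainer  # legacy import path

config = {
    "num_workers": 1,  # assumed counterpart of .rollouts(num_rollout_workers=1)
    "num_gpus": 0,     # assumed counterpart of .resources(num_gpus=0)
}
agent = PPOTrainer(config=config, env="CartPole-v1")
for i in range(10):
    result = agent.train()

If both versions run a collect-then-update cycle inside each train() call, I would expect them to behave the same.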