Hi there, I’m wondering whether there is a parameter to control when the RL agent starts training, i.e. only after a certain number of steps, in trainer.train() and tune.run(), like nb_steps_warmup in the keras-rl package (see a use case and the internal details).

Thanks in advance.
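For reference, this is the kind of warmup I mean in keras-rl (a minimal sketch based on the standard DQNAgent CartPole example; the model, policy, and values here are just placeholders):

```python
import gym
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.memory import SequentialMemory
from rl.policy import BoltzmannQPolicy

env = gym.make("CartPole-v0")
nb_actions = env.action_space.n

# Tiny Q-network, just enough to make the example runnable.
model = Sequential()
model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
model.add(Dense(16, activation="relu"))
model.add(Dense(nb_actions, activation="linear"))

dqn = DQNAgent(
    model=model,
    nb_actions=nb_actions,
    memory=SequentialMemory(limit=50000, window_length=1),
    policy=BoltzmannQPolicy(),
    nb_steps_warmup=1000,  # no training updates during the first 1000 env steps
    target_model_update=1e-2,
)
dqn.compile(Adam(lr=1e-3), metrics=["mae"])
dqn.fit(env, nb_steps=50000, verbose=1)
```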
Some algos have a learning_starts parameter, namely those that use a replay buffer. For on-policy algos, such a setting wouldn’t really make sense, since samples from timesteps earlier than learning_starts would simply be discarded without any effect on training.
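For example, something like this (a sketch assuming an older RLlib config-dict API where learning_starts is a top-level config key for DQN; in newer versions it has moved into the replay buffer config):

```python
import ray
from ray import tune

ray.init()

tune.run(
    "DQN",
    stop={"timesteps_total": 100000},
    config={
        "env": "CartPole-v0",
        # Fill the replay buffer with this many env steps before the first
        # gradient update, analogous to nb_steps_warmup in keras-rl.
        "learning_starts": 1000,
    },
)
```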
Ohh… Thanks! That makes sense. Originally I intended to let the agent explore more and see more of the environment, because recently I found that PPO on my custom env (which has a large action and state space) gets stuck in a local optimum and only rarely escapes it to reach a better result. Maybe I should first try increasing the train_batch_size from the default configuration (I’ve just started learning RL and am quite confused, lol)?
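Something like this is what I had in mind (just an illustrative sketch, not a tuned config; train_batch_size, sgd_minibatch_size, and entropy_coeff are the standard PPO config keys, and the values are guesses on my part):

```python
import ray
from ray import tune

ray.init()

tune.run(
    "PPO",
    stop={"timesteps_total": 1000000},
    config={
        "env": "CartPole-v0",        # would be my custom env here
        "train_batch_size": 16000,   # default is 4000; more samples per update
        "sgd_minibatch_size": 1024,  # default is 128
        "entropy_coeff": 0.01,       # default is 0.0; bonus that encourages exploration
    },
)
```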