Even though I read the Tune FAQ article "Why are all my trials returning “1” iteration?", I am concerned about whether my implementation is working. I still face the issue of all 5 trials returning 1 iteration.
How can I verify that algorithm training has been properly executed? Should I look at num_sgd_iter?
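As far as I understand, num_sgd_iter is a PPO hyperparameter (the number of SGD passes over each train batch) rather than a reported progress metric, so a check would more likely look at result keys such as training_iteration, timesteps_total and episode_reward_mean. Below is a minimal sketch of the kind of check I have in mind. It assumes each trial's final metrics are available as a plain dict; `summarize_trials` is a hypothetical helper, not part of Ray, and the sample dicts are made-up illustrative values, not output from my run.

```python
from typing import Any, Dict, List


def summarize_trials(metrics_per_trial: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: condense per-trial metrics into a progress summary.

    Each input dict is assumed to hold the last reported result of one trial.
    """
    summary = []
    for index, metrics in enumerate(metrics_per_trial):
        # A missing key is treated as "no iteration reported".
        iterations = metrics.get("training_iteration", 0)
        summary.append(
            {
                "trial": index,
                "training_iteration": iterations,
                "timesteps_total": metrics.get("timesteps_total"),
                "episode_reward_mean": metrics.get("episode_reward_mean"),
                # Flag trials that ended after at most one reported iteration.
                "stopped_after_one_iteration": iterations <= 1,
            }
        )
    return summary


# Made-up example data for illustration only.
example = [
    {"training_iteration": 1, "timesteps_total": 4000, "episode_reward_mean": 21.5},
    {"training_iteration": 10, "timesteps_total": 40000, "episode_reward_mean": 150.0},
]

for row in summarize_trials(example):
    print(row)
```

With a real run, the input could presumably be built as `[results[i].metrics for i in range(len(results))]`, where `results` is what `tuner.fit()` returns; I have not verified this against my Ray version.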
tuner = tune.Tuner(
    "PPO",
    param_space=config,
    run_config=RunConfig(
        stop=stopping_criteria,
        checkpoint_config=CheckpointConfig(
            checkpoint_score_attribute="episode_reward_mean",
            checkpoint_score_order="max",
            checkpoint_frequency=2,
        ),
    ),
    tune_config=tune.TuneConfig(
        metric="episode_reward_mean",
        mode="max",
        num_samples=5,
        reuse_actors=False,
        max_concurrent_trials=3,
    ),