I’m training some agents (DQN, PPO) with tune.run, and I haven’t been able to find documentation on how to define the gamma (discount factor), learning rate, and critic network of an agent when training with tune.run.
Thanks in advance,
Amit & Guy.
Common params
DQN
PPO
The learning rate can be tweaked under the optimiser parameters or directly with the "lr" key.
Gamma is set with the "gamma" key.
Networks are configured under the "model" key.
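To make the three settings above concrete, here is a minimal sketch of a config dict. The environment name, layer sizes, and numeric values are placeholders I picked for illustration, not recommendations, and the train helper is a hypothetical wrapper rather than part of RLlib.

```python
# Sketch of an RLlib config dict; all values are illustrative placeholders.
config = {
    "env": "CartPole-v1",  # assumption: any registered Gym environment
    "framework": "torch",
    "lr": 1e-4,            # learning rate, set directly at the top level
    "gamma": 0.95,         # discount factor
    "model": {
        # Hidden layer sizes and activation of the default fully connected net.
        "fcnet_hiddens": [64, 64],
        "fcnet_activation": "tanh",
    },
}


def train(algo="PPO", iterations=10):
    """Hypothetical helper that launches training; needs ray[rllib] installed."""
    # Imported lazily so the dict above can be inspected without Ray.
    from ray import tune

    return tune.run(algo, config=config, stop={"training_iteration": iterations})
```

For PPO, the critic (value function) is built from the same "model" section; depending on your Ray version, the "vf_share_layers" model key should control whether it shares layers with the policy, so check the model defaults for the release you are running.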
Hey,
I want to make sure I’m writing it the right way:
config = {
    "env_config": {},
    "lr": 0.00001,
    "gamma": 0.95,
    "stop": {"training_iteration": 100},
    "res_path": f"res/res_{agent_name}/",
    "framework": "torch",
    "seed": 123,
    "evaluation_interval": 2,
    "evaluation_num_episodes": 10,
    "exploration_config": {
        "type": "EpsilonGreedy",
        "epsilon_schedule": {
            "type": "ExponentialSchedule",
            "initial_p": 1,
            "schedule_timesteps": 500,
            "decay_rate": 0.95,
        },
    },
    "model": {
        "custom_model": "new_models",
        "custom_model_config": {
            "hidden_size": 10,
            "Networks": 0.000005,
        },
    },
}
Thanks,
Amit.
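One note on the config above, offered as a sketch rather than a definitive answer: "stop" is a keyword argument of tune.run itself, not a trainer config key, so it likely needs to be pulled out of the dict before the dict is passed as config. The helper below shows one way to do that split; split_config and TUNE_RUN_KEYS are hypothetical names, and the "DQN" algorithm string in the comment is an assumption. Keys such as "res_path" do not look like standard RLlib settings either, and anything placed in "custom_model_config" is handed to the custom model, so "Networks" will only have an effect if the model reads it.

```python
# Keys that tune.run takes as its own keyword arguments rather than in `config`.
TUNE_RUN_KEYS = {"stop"}


def split_config(flat):
    """Separate tune.run keyword arguments from the trainer config."""
    run_kwargs = {k: v for k, v in flat.items() if k in TUNE_RUN_KEYS}
    trainer_config = {k: v for k, v in flat.items() if k not in TUNE_RUN_KEYS}
    return run_kwargs, trainer_config


flat = {
    "lr": 0.00001,
    "gamma": 0.95,
    "framework": "torch",
    "stop": {"training_iteration": 100},
}
run_kwargs, trainer_config = split_config(flat)

# Then, assuming Ray is installed and DQN is the algorithm being trained:
#   from ray import tune
#   tune.run("DQN", config=trainer_config, **run_kwargs)
```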