How do I set up the training configuration of a DQN algorithm so that I can control the number of training steps performed in one iteration of algo.train()?
As in, the number of steps per episode before your environment gets reset and another episode begins?
No, as in the number of steps corresponding to one training iteration, i.e. to one call to algo.train().
You could try something like this:
custom_checkpoint_dir = "./your_directory"
# Result keys to print after each training iteration
print_guide = [
    "episodes_this_iter",
    "episode_reward_mean",
    "episode_reward_max",
    "episode_reward_min",
    "policy_reward_mean",
    "done",
    "num_env_steps_sampled_this_iter",
]

algo = config.build()
total_env_steps = 0
print("BEGINNING TRAINING")
for i in range(1, 1000):
    result = algo.train()
    print(f"Iteration: {i}")
    for key in print_guide:
        print(f"{key}: {result[key]}")
    # Save a checkpoint every 20 iterations
    if i % 20 == 0:
        checkpoint_dir = algo.save(custom_checkpoint_dir).checkpoint.path
        print(f"Checkpoint saved in directory {checkpoint_dir}")
    total_env_steps += result["num_env_steps_sampled_this_iter"]
    print("Total Training Steps:", total_env_steps)
    print("\n")
Or, in your case, stop once a total env-step budget is reached:

total_env_steps = 0
print("BEGINNING TRAINING")
while True:
    result = algo.train()
    for key in print_guide:
        print(f"{key}: {result[key]}")
    print(result)  # full result dict, useful for seeing all available keys
    total_env_steps += result["num_env_steps_sampled_this_iter"]
    print("Total Training Steps:", total_env_steps)
    # Stop once at least 4000 env steps have been sampled in total
    if total_env_steps >= 4000:
        break
because I don't believe you can specify the exact number through the config if your env varies in the number of timesteps per episode. This link here is pretty much the question you are asking.
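For what it's worth, newer RLlib versions do expose a knob for this, but it only sets a lower bound on env steps per train() call, not an exact count, for the reason above. A minimal sketch, assuming Ray 2.x's DQNConfig and its reporting() settings:

from ray.rllib.algorithms.dqn import DQNConfig

# Assumption: Ray 2.x API. min_sample_timesteps_per_iteration sets a
# *minimum* number of env steps sampled per algo.train() call; the actual
# count can overshoot when episodes don't end exactly on the boundary.
config = (
    DQNConfig()
    .environment("CartPole-v1")
    .reporting(min_sample_timesteps_per_iteration=1000)
)
algo = config.build()
result = algo.train()
print(result["num_env_steps_sampled_this_iter"])  # >= 1000, not exactly 1000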
Now, if you use Ray Tune, you can specify how many timesteps the training runs with something like this:
import os
from datetime import date
import ray
from ray import air, tune
from ray.rllib.algorithms.ppo import PPO
from ray.rllib.utils.test_utils import check_learning_achieved

# Stop as soon as either criterion is met
stop = {
    "episode_reward_mean": args.stop_reward,
    "timesteps_total": args.stop_timesteps,
}
# Create a path to a RayResults folder on the desktop (Windows-only)
desktop_path = os.path.join(os.environ["USERPROFILE"], "Desktop")
ray_results_path = os.path.join(desktop_path, "RayResults")

results = tune.Tuner(
    PPO,
    run_config=air.RunConfig(storage_path=ray_results_path,
                             name=f"Ray_{date.today()}", stop=stop, verbose=1),
    param_space=config,
).fit()
if args.run_as_test:
    check_learning_achieved(results, args.stop_reward)
ray.shutdown()
where you can define the total timesteps in the args.
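The snippet assumes an argparse-style args object; a minimal sketch of what that might look like (the flag names mirror the attributes used above, and the default values are just placeholders):

import argparse

# Hypothetical argparse setup matching the args.* attributes used above;
# the defaults here are placeholders, not recommendations.
parser = argparse.ArgumentParser()
parser.add_argument("--stop-reward", type=float, default=150.0)
parser.add_argument("--stop-timesteps", type=int, default=100000)
parser.add_argument("--run-as-test", action="store_true")
args = parser.parse_args()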