Training steps for DQN

How do I set up the training configuration of a DQN algorithm so that I can control the number of training steps performed in one iteration of algo.train()?

As in, the number of steps per episode before your environment gets reset and another episode is performed?

No, as in the number of steps corresponding to one training iteration, i.e. to one call to algo.train().

You could try something like this

custom_checkpoint_dir = "./your_directory"

# Result keys to print after every training iteration.
print_guide = ["episodes_this_iter", "episode_reward_mean", "episode_reward_max",
               "episode_reward_min", "policy_reward_mean", "done",
               "num_env_steps_sampled_this_iter"]

algo = config.build()  # config is your DQNConfig, set up beforehand
totalenvsteps = 0

print("BEGINNING TRAINING")
for i in range(1, 1000):
    result = algo.train()
    print(f"Iteration: {i}")
    for key in print_guide:
        print(f'{key}: {result[key]}')
    if i % 20 == 0:
        checkpoint_dir = algo.save(custom_checkpoint_dir).checkpoint.path
        print(f"Checkpoint saved in directory {checkpoint_dir}")
    totalenvsteps += result["num_env_steps_sampled_this_iter"]
    print("Total Training Steps:", totalenvsteps)
    print("\n")

or in your case

totalenvsteps = 0

print("BEGINNING TRAINING")
while True:
    result = algo.train()
    for key in print_guide:
        print(f'{key}: {result[key]}')
    # print(result)  # uncomment to dump the full result dict
    totalenvsteps += result["num_env_steps_sampled_this_iter"]
    print("Total Training Steps:", totalenvsteps)
    # Stop once the desired number of sampled env steps is reached.
    if totalenvsteps >= 4000:
        break

because I don't believe you can specify the exact number through the config if your env varies in the number of timesteps per episode. (This link here is pretty much the question you are asking.)
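
That said, if an approximate control is enough, I believe AlgorithmConfig.reporting() has a min_sample_timesteps_per_iteration setting (worth double-checking against your Ray version), which acts as a lower bound on the env steps sampled per algo.train() call rather than an exact count:

from ray.rllib.algorithms.dqn import DQNConfig

config = (
    DQNConfig()
    .environment("CartPole-v1")  # example env
    # Lower bound (not an exact count) on env steps sampled per algo.train()
    .reporting(min_sample_timesteps_per_iteration=1000)
)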

Now, if you use Ray Tune, you can specify how many total timesteps the training runs for with something like this:

import os
from datetime import date
import ray
from ray import air, tune
from ray.rllib.algorithms.ppo import PPO
from ray.rllib.utils.test_utils import check_learning_achieved

# Stop criteria for the Tune run (args and config come from your own script).
stop = {
    "episode_reward_mean": args.stop_reward,
    "timesteps_total": args.stop_timesteps,
}

# Build a path to a RayResults folder on the desktop.
desktop_path = os.path.join(os.environ['USERPROFILE'], 'Desktop')
ray_results_path = os.path.join(desktop_path, 'RayResults')

results = tune.Tuner(
    PPO,
    run_config=air.RunConfig(storage_path=ray_results_path,
                             name=f"Ray_{date.today()}", stop=stop, verbose=1),
    param_space=config,
).fit()

if args.run_as_test:
    check_learning_achieved(results, args.stop_reward)

ray.shutdown()

where you can define the total timesteps in the args.
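
For example, args in that snippet could come from a small argparse setup along these lines (the argument names here are just an illustration to match the code above):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--stop-reward", type=float, default=150.0,
                    help="Mean episode reward at which to stop training.")
parser.add_argument("--stop-timesteps", type=int, default=100000,
                    help="Total env timesteps at which to stop training.")
parser.add_argument("--run-as-test", action="store_true",
                    help="Check that the stop reward was actually reached.")
args = parser.parse_args()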
