Trouble reproducing results with DQN

How severely does this issue affect your experience of using Ray?

  • Low: It annoys or frustrates me for a moment.

Hello there,
I am having trouble reproducing the DQN results for Breakout. I used essentially the same parameters as specified in the tuned YAML file.

The only changes are that I set the replay buffer capacity to 180,000 and enabled observation compression.

It peaks at a score of around 11, and the average is around 2.
The following is the config I used:

# (Imports and config construction were not shown in the original post;
# assuming a standard DQNConfig here.)
from ray import air, tune
from ray.rllib.algorithms.dqn import DQNConfig

config = DQNConfig()

# Update the default replay buffer config in place. Note that dict.update()
# returns None, so keep a reference to the dict itself instead of assigning
# the return value.
replay_config = config.replay_buffer_config
replay_config.update({
        "capacity": 180000,
        "type": "MultiAgentReplayBuffer",
        # "prioritized_replay_alpha": 0.6,
        # "prioritized_replay_beta": 0.4,
        # "prioritized_replay_eps": 1e-6,
    })
print(replay_config)
config = config.training(
                         # gamma=0.99,
                         lr=0.0000625,
                         train_batch_size=32,
                         # model=model_config,
                         dueling=False,
                         double_q=True,
                         target_network_update_freq=8000,
                         hiddens=[512],
                         n_step=1,
                         replay_buffer_config=replay_config,
                         # td_error_loss_fn="huber",
                         num_steps_sampled_before_learning_starts=20000,
                         adam_epsilon=0.00015,
                         # Disable Rainbow components
                         noisy=False,
                         num_atoms=1,
                        )

config = config.environment(env="ALE/Breakout-v5",
                            env_config={"frameskip": 1, # Disabled
                                       })
config = config.framework(framework="tf")
config = config.rollouts(
                        # num_rollout_workers=2,
                        # create_env_on_local_worker=True,
                        # num_envs_per_worker=2,
                        rollout_fragment_length=4,
                        # preprocessor_pref="deepmind",
                        compress_observations=True)

explore_config = config.exploration_config
explore_config["final_epsilon"] = 0.01
explore_config["epsilon_timesteps"] = 200000
print(explore_config)
config = config.exploration(
                           # explore=True,
                           exploration_config=explore_config
                           )
# config = config.checkpointing(export_native_model_files=True)
config = config.resources(num_gpus=1)

tuner = tune.Tuner("DQN",
                   run_config=air.RunConfig(stop={"agent_timesteps_total": 1e7},
                                           name="Ablation-Breakout-Removed-test",
                                            # API REFERENCE
                                            # https://docs.ray.io/en/master/ray-air/package-ref.html#ray.air.config.CheckpointConfig
                                           # checkpoint_config=air.CheckpointConfig(num_to_keep=5,
                                           #                                        checkpoint_score_attribute="episode_reward_max",
                                           #                                        checkpoint_score_order="max",
                                           #                                        # checkpoint_frequency=10,
                                           #                                        checkpoint_at_end=True,
                                           #                                       ),
                                          ),
                   param_space=config.to_dict())

results = tuner.fit()

Hi!
I don’t think the configs are the same.
I noted: double_q=True (False in tuned yaml).
There might be more.
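
A minimal sketch of aligning that flag, assuming the same DQNConfig builder from the post above:

# Flip double_q to match the tuned YAML; this is the one difference noted so
# far, and there may be others worth checking against the YAML.
config = config.training(double_q=False)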
There is a test script that lets you execute the YAML file.
You can also execute YAML files with the correct syntax via
rllib train file <path_to_file>
Are you using a g3.4xl node, or are you running this locally?

Hi Arturn,

Thanks for the response and for pointing out the test script. I will try to execute the YAML file with it.
I also tried the rllib train file example.yaml command, which also did not show a higher score.
Disabling Double DQN still resulted in a max of 11 in Breakout.
I was running the experiments locally. This is the configuration:

  • Ubuntu 22.04.2
  • Python 3.8
  • ray 2.3.0

I hope this helps.

[EDIT]
I updated to Python 3.10 and Ray 2.3.1 and ran ray/atari-duel-ddqn.yaml at master · ray-project/ray · GitHub, changing only the capacity to 130k and the framework to 'tf'. It would not run due to an RLock threading error.

@Dime ,
Can you please post a reproduction script?
We run this test script in our CI on every pull request to check if the rewards are reached.
DQN is sensitive to the replay buffer size. So if that is the only difference, that might simply be the issue right there. Can you try to tune it a little?
Maybe do a grid search over capacities of, say, 50, 150, and 450 and see how it compares?
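
If it helps, here is a minimal sketch of such a sweep, assuming the same DQNConfig object as in the original post (the capacities below interpret the suggested values as thousands and are only placeholders):

from ray import air, tune

# Hypothetical sketch: sweep the replay buffer capacity with Tune's grid_search,
# keeping everything else as in the config above.
param_space = config.to_dict()
param_space["replay_buffer_config"]["capacity"] = tune.grid_search(
    [50_000, 150_000, 450_000]  # placeholder capacities
)

tuner = tune.Tuner(
    "DQN",
    run_config=air.RunConfig(
        stop={"agent_timesteps_total": 1e7},
        name="Breakout-replay-capacity-sweep",  # hypothetical run name
    ),
    param_space=param_space,
)
results = tuner.fit()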
Can you also post a repro script for the RLock thread error?