How severely does this issue affect your experience of using Ray?
- Low: It annoys or frustrates me for a moment.
Hello there,
I am having trouble reproducing the DQN result for Breakout. I used essentially the same parameters as those specified in the YAML file, except that I changed the replay buffer capacity to 180,000 and enabled observation compression.
The episode score peaks at around 11, while the average stays at around 2.
The following code is the config I used:
from ray import air, tune
from ray.rllib.algorithms.dqn import DQNConfig

# Build the base DQN config.
config = DQNConfig()

# Update the replay buffer settings in place. Note: dict.update() returns
# None, so its return value must not be assigned.
replay_config = config.replay_buffer_config
replay_config.update({
    "capacity": 180000,
    "type": "MultiAgentReplayBuffer",
    # "prioritized_replay_alpha": 0.6,
    # "prioritized_replay_beta": 0.4,
    # "prioritized_replay_eps": 1e-6,
})
print(replay_config)
config = config.training(
    # gamma=0.99,
    lr=0.0000625,
    train_batch_size=32,
    # model=model_config,
    dueling=False,
    double_q=True,
    target_network_update_freq=8000,
    hiddens=[512],
    n_step=1,
    replay_buffer_config=replay_config,
    # td_error_loss_fn="huber",
    num_steps_sampled_before_learning_starts=20000,
    adam_epsilon=0.00015,
    # Disable Rainbow components.
    noisy=False,
    num_atoms=1,
)
config = config.environment(
    env="ALE/Breakout-v5",
    env_config={"frameskip": 1},  # Env-level frame-skipping disabled.
)
config = config.framework(framework="tf")
config = config.rollouts(
    # num_rollout_workers=2,
    # create_env_on_local_worker=True,
    # num_envs_per_worker=2,
    rollout_fragment_length=4,
    # preprocessor_pref="deepmind",
    compress_observations=True,
)
explore_config = config.exploration_config
explore_config["final_epsilon"] = 0.01
explore_config["epsilon_timesteps"] = 200000
print(explore_config)
config = config.exploration(
    # explore=True,
    exploration_config=explore_config,
)
# config = config.checkpointing(export_native_model_files=True)
config = config.resources(num_gpus=1)
tuner = tune.Tuner(
    "DQN",
    run_config=air.RunConfig(
        stop={"agent_timesteps_total": 1e7},
        name="Ablation-Breakout-Removed-test",
        # API reference:
        # https://docs.ray.io/en/master/ray-air/package-ref.html#ray.air.config.CheckpointConfig
        # checkpoint_config=air.CheckpointConfig(
        #     num_to_keep=5,
        #     checkpoint_score_attribute="episode_reward_max",
        #     checkpoint_score_order="max",
        #     # checkpoint_frequency=10,
        #     checkpoint_at_end=True,
        # ),
    ),
    param_space=config.to_dict(),
)
results = tuner.fit()
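
For reference, the scores I quoted above are read back from the returned ResultGrid roughly like this (a minimal sketch, assuming RLlib's standard episode_reward_mean / episode_reward_max result keys):

# Sketch: pull the best trial's metrics out of the ResultGrid from tuner.fit().
best_result = results.get_best_result(metric="episode_reward_mean", mode="max")
print("mean episode reward:", best_result.metrics["episode_reward_mean"])
print("max episode reward:", best_result.metrics["episode_reward_max"])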