DQN algorithm possible bug

Hello! I’m trying to implement a DQN algorithm, but after configuring it and running the program an error appears:

PS C:\Users\grhen\Documents\GitHub\EP_RLlib>  c:; cd 'c:\Users\grhen\Documents\GitHub\EP_RLlib'; & 'C:\Users\grhen\anaconda3\envs\rllib290\python.exe' 'c:\Users\grhen\.vscode\extensions\ms-python.python-2023.22.1\pythonFiles\lib\python\debugpy\adapter/../..\debugpy\launcher' '62617' '--' 'c:\Users\grhen\Documents\GitHub\EP_RLlib\VENT_init_training.py'
2024-01-09 11:15:49,983 INFO worker.py:1715 -- Started a local Ray instance. View the dashboard at 
Traceback (most recent call last):
  File "c:\Users\grhen\Documents\GitHub\EP_RLlib\VENT_init_training.py", line 136, in <module>
  File "C:\Users\grhen\anaconda3\envs\rllib290\lib\site-packages\ray\rllib\algorithms\dqn\dqn.py", line 244, in training
  File "C:\Users\grhen\anaconda3\envs\rllib290\lib\site-packages\ray\rllib\algorithms\simple_q\simple_q.py", line 243, in training
  File "C:\Users\grhen\anaconda3\envs\rllib290\lib\site-packages\ray\rllib\algorithms\algorithm_config.py", line 1816, in training
    self.optimizer = merge_dicts(self.optimizer, optimizer)
  File "C:\Users\grhen\anaconda3\envs\rllib290\lib\site-packages\ray\_private\dict.py", line 22, in merge_dicts
    deep_update(merged, d2, True, [])
  File "C:\Users\grhen\anaconda3\envs\rllib290\lib\site-packages\ray\_private\dict.py", line 58, in deep_update
    for k, value in new_dict.items():
AttributeError: 'NoneType' object has no attribute 'items'

My configuration (part of it) is the following:

from ray.rllib.algorithms.dqn.dqn import DQNConfig

algo = DQNConfig().training(
            gamma = 0.99 if not tune_runner else tune.uniform(0.7, 0.99),
            lr = 0.1 if not tune_runner else tune.uniform(0.001, 0.1),
            grad_clip = None,
            grad_clip_by = 'global_norm',
            train_batch_size = 128 if not tune_runner else tune.randint(128, 257),
            optimizer = None,
            max_requests_in_flight_per_sampler_worker = None,
            learner_class = None,
            _enable_learner_api = None,
            num_atoms = 1 if not tune_runner else tune.randint(1, 11),
            v_min = -1 if not tune_runner else tune.randint(-10, 0),
            v_max = 1 if not tune_runner else tune.randint(1, 11),
            noisy = True,
            sigma0 = 1 if not tune_runner else tune.uniform(0.01, 0.99),
            dueling = True,
            hiddens = [256],
            double_q = True,
            n_step = 1 if not tune_runner else tune.randint(1, 11),
            training_intensity = None,
            replay_buffer_config = {
                '_enable_replay_buffer_api': True,
                'type': 'MultiAgentPrioritizedReplayBuffer',
                'capacity': 50000,
                'prioritized_replay_alpha': 0.6,
                'prioritized_replay_beta': 0.4,
                'prioritized_replay_eps': 1e-6,
                'replay_sequence_length': 1,
            },
            td_error_loss_fn = None,
            categorical_distribution_temperature = 1.0,
            observation_space=gym.spaces.Box(float("-inf"), float("inf"), (49,)),
                'sys_path': path,
                'ep_terminal_output': ep_terminal_output,
                'csv': False,
                'output': TemporaryDirectory("output","DQN_",path+'/Resultados_RLforEP').name,
                'epw': path+'/GitHub/EP_RLlib/EP_Wheater_Configuration/Mendoza_Obs_-hour-historico1.epw',
                'idf': path+'/GitHub/EP_RLlib/EP_IDF_Configuration/model_1.epJSON',
                'idf_folderpath': path+"/GitHub/EP_RLlib/EP_IDF_Configuration",
                'idf_output_folder': path+"/models",
                'climatic_stads': path+'/GitHub/EP_RLlib/EP_Wheater_Configuration',
                'beta': 0,
                'E_max': 2.5/6,
                'separate_state_space': True,
                'one_hot_state_encoding': True,
                'episode': -1,
                'is_test': False,
            framework = 'torch',
            recreate_failed_workers = True,
            num_rollout_workers = 1,
            rollout_fragment_length = 'auto',
            enable_connectors = True,
            _enable_new_api_stack = True,
        ).reporting( # multi_agent config goes here
            min_sample_timesteps_per_iteration = 2000,
            export_native_model_files = True,
            log_level = "ERROR",
            num_gpus = 0,

I can’t tell whether this is a mistake on my side. Following the error message, I found that line 1816 of ray/rllib/algorithms/algorithm_config.py tries to merge the optimizer settings. However, since I have configured _enable_learner_api=True, this should be ignored, right?

Can anyone help me solve it? Thank you!

I commented out the optimizer option and the problem disappeared (I don’t understand why). Now another error appears, not in the configuration but in the execution of the training:

Traceback (most recent call last):
  File "c:\Users\grhen\Documents\GitHub\EP_RLlib\VENT_init_training.py", line 311, in <module>
  File "C:\Users\grhen\anaconda3\envs\rllib290\lib\site-packages\ray\tune\tuner.py", line 381, in fit      
    return self._local_tuner.fit()
  File "C:\Users\grhen\anaconda3\envs\rllib290\lib\site-packages\ray\tune\impl\tuner_internal.py", line 509, in fit
    analysis = self._fit_internal(trainable, param_space)
  File "C:\Users\grhen\anaconda3\envs\rllib290\lib\site-packages\ray\tune\impl\tuner_internal.py", line 628, in _fit_internal
    analysis = run(
  File "C:\Users\grhen\anaconda3\envs\rllib290\lib\site-packages\ray\tune\tune.py", line 1002, in run      
  File "C:\Users\grhen\anaconda3\envs\rllib290\lib\site-packages\ray\tune\execution\tune_controller.py", line 722, in step
  File "C:\Users\grhen\anaconda3\envs\rllib290\lib\site-packages\ray\tune\execution\tune_controller.py", line 822, in _maybe_update_trial_queue
    if not self._update_trial_queue(blocking=not dont_wait_for_trial):
  File "C:\Users\grhen\anaconda3\envs\rllib290\lib\site-packages\ray\tune\execution\tune_controller.py", line 625, in _update_trial_queue
  File "C:\Users\grhen\anaconda3\envs\rllib290\lib\site-packages\ray\tune\execution\tune_controller.py", line 573, in add_trial
  File "C:\Users\grhen\anaconda3\envs\rllib290\lib\site-packages\ray\tune\experiment\trial.py", line 429, in create_placement_group_factory
    default_resources = trainable_cls.default_resource_request(self.config)
  File "C:\Users\grhen\anaconda3\envs\rllib290\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 2383, in default_resource_request
  File "C:\Users\grhen\anaconda3\envs\rllib290\lib\site-packages\ray\rllib\algorithms\dqn\dqn.py", line 280, in validate
  File "C:\Users\grhen\anaconda3\envs\rllib290\lib\site-packages\ray\rllib\algorithms\simple_q\simple_q.py", line 279, in validate
    if self.exploration_config["type"] == "ParameterNoise":
KeyError: 'type'

@hermmanhender, thanks for posting this. At first glance it looks as if the code tries to merge two dicts, one of which is None.

Can you try either setting optimizer in training() to {} or not setting it at all? I think this should avoid the error.
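
To illustrate why None breaks the merge while {} does not, here is a minimal sketch. It is a simplified stand-in for a recursive dict merge, not Ray's actual merge_dicts/deep_update code; the name merge_dicts_sketch and the default_optimizer value are made up for this example.

```python
import copy


def merge_dicts_sketch(d1, d2):
    """Return a deep copy of d1 with the entries of d2 merged in recursively.

    Simplified illustration only; NOT Ray's actual implementation.
    """
    merged = copy.deepcopy(d1)
    # Iterating over d2.items() is the step that fails when d2 is None.
    for key, value in d2.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_dicts_sketch(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical default held by the config object.
default_optimizer = {}

# An empty dict merges cleanly and leaves the default unchanged.
print(merge_dicts_sketch(default_optimizer, {}))

# None has no .items(), so the merge raises the same kind of error
# as in the traceback above.
try:
    merge_dicts_sketch(default_optimizer, None)
except AttributeError as err:
    print(err)
```

So passing optimizer = None is not the same as leaving the argument out: the value is handed to the merge as-is, whereas an empty dict (or an omitted argument) gives the merge nothing to trip over.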

Hi @Lars_Simon_Zehnder, thanks for your suggestion! Sorry for the late response.

I changed the optimizer value to an empty dict and the first error disappeared, just as it did when I commented the option out.

But the problem with the exploration validation still appears.

Hi again. I set:

DQNConfig().experimental(_enable_new_api_stack = False)

and now it works.
Sorry! I think the error occurred because the DQN algorithm is not implemented in the new RLModule stack.
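
For anyone hitting the same KeyError: 'type', here is a small sketch of what the failing check in the traceback amounts to. It assumes that exploration_config ends up as a dict without a 'type' key when the new API stack is enabled. That is my reading of the traceback, not something I have confirmed in the Ray source. The helper uses_parameter_noise and both example dicts are made up for illustration.

```python
def uses_parameter_noise(exploration_config):
    """Check the exploration type without assuming the 'type' key exists."""
    return exploration_config.get("type") == "ParameterNoise"


# Assumption: roughly what the config holds with the new API stack enabled.
empty_config = {}
# Illustrative old-stack style config that does carry a 'type' entry.
legacy_config = {"type": "EpsilonGreedy"}

# Direct indexing, as in the validate() line from the traceback,
# fails on a dict that has no 'type' key.
try:
    empty_config["type"] == "ParameterNoise"
except KeyError as err:
    print("KeyError:", err)

# dict.get() returns None for a missing key, so the check simply evaluates to False.
print(uses_parameter_noise(empty_config))
print(uses_parameter_noise(legacy_config))
```

This only shows the mechanics of the error. With _enable_new_api_stack = False the validation goes through for me, as described above.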

Great that you found out what the error was. Yes, DQN has not been transferred to the new stack yet. I say yet because we are implementing it at the moment. It should come soon.

