RLlib DQN Trainer Evaluate Function Help

Hi everyone! I would like some help with the DQN Trainer.

When I use the RLlib DQN Trainer to train on my Eclipse SUMO environment, training works fine: the agent explores and then, I assume, exploits once enough timesteps have elapsed for the epsilon-greedy value to decay to its minimum of 0.2.
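
For reference, the kind of epsilon-greedy schedule described above can be sketched in plain Python. This is only an illustration, not RLlib's actual implementation: the function names, the linear decay, and the 10000-step horizon are assumptions. It shows that at a final epsilon of 0.2 roughly one action in five is still random, while an epsilon of 0 always picks the highest-valued action.

```python
import random

def epsilon_at(step, initial=1.0, final=0.2, decay_steps=10000):
    """Linearly anneal epsilon from `initial` to `final` over `decay_steps` (assumed values)."""
    fraction = min(step / decay_steps, 1.0)
    return initial + fraction * (final - initial)

def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, otherwise the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# With epsilon = 0 the choice is purely greedy and deterministic.
greedy_action = select_action([0.1, 0.9, 0.3], epsilon=0.0)
```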

My issue arises when I try to evaluate the model to see what the data would look like without the exploration at the start of the training simulation.
Evaluation produces very bad results, and the agent gets stuck repeating one action throughout the simulation.

My best guess is that the policy doesn’t properly contain the observations and actions recorded during the training simulation.

Here are the files I was using.
The main packages I used were RLlib for the DQN Trainer and Sumo-RL as the environment.

Any help would be appreciated.

@Sitting-Down, I haven't looked into your code, since no specific line was mentioned, but I guess you also need to set the evaluation_config parameter in your main config:

"evaluation_config": {
        "explore": False,  # act greedily during evaluation
        "exploration_config": {
                .... # set this if you want to have exploration during evaluation, but with different settings
        },
},
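
To show where the evaluation_config suggestion above would sit, here is a minimal sketch of a full config dict. It assumes the older dict-based RLlib config API; the numeric values and the environment name "my_sumo_env" are placeholders that do not come from the original post, and the key names should be checked against the installed Ray version.

```python
# Plain dict in the style of the older RLlib config API (key names are an
# assumption about that API; verify them for your Ray version).
config = {
    # Training-time exploration: epsilon decays towards the 0.2 minimum.
    "exploration_config": {
        "type": "EpsilonGreedy",
        "initial_epsilon": 1.0,
        "final_epsilon": 0.2,
        "epsilon_timesteps": 10000,  # placeholder value
    },
    # Run an evaluation pass every 10 training iterations (placeholder value).
    "evaluation_interval": 10,
    # Overrides applied only while evaluating.
    "evaluation_config": {
        "explore": False,  # always take the greedy action during evaluation
    },
}

# Assumed usage with the older Trainer API (not executed here):
# from ray.rllib.agents.dqn import DQNTrainer
# trainer = DQNTrainer(env="my_sumo_env", config=config)  # placeholder env name
# trainer.train()
```

Putting the override inside evaluation_config leaves the training-time exploration settings untouched, so the same config can serve both training and evaluation.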

Hope this helps