Hi Everyone! I would like some help with the DQN Trainer
When I use the RLlib DQN Trainer to train my Eclipse SUMO environment, training works fine: the agent explores and then, I assume, exploits once enough timesteps have elapsed to anneal the epsilon-greedy value down to its minimum of 0.2.
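For reference, the exploration settings I mean are roughly RLlib's built-in EpsilonGreedy schedule (the numbers here are illustrative rather than my exact values, apart from the 0.2 minimum):

```python
# Illustrative DQN exploration config using RLlib's built-in
# EpsilonGreedy schedule; only final_epsilon = 0.2 is from my setup.
config = {
    "exploration_config": {
        "type": "EpsilonGreedy",
        "initial_epsilon": 1.0,       # start fully exploring
        "final_epsilon": 0.2,         # epsilon stops decaying at this value
        "epsilon_timesteps": 10000,   # timesteps to anneal over (placeholder)
    },
}
```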
My issue comes when I try to evaluate the trained model, hoping to see what the data would look like without the exploration that happens at the start of training.
It produces very bad results and gets stuck repeating a single action throughout the simulation.
My best guess is that the restored policy doesn't properly reflect the observations and actions recorded during the training simulation.
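For context, here's a minimal sketch of the kind of restore-and-evaluate loop I mean (not my exact code, which is in the repo below; `make_sumo_env`, the `"sumo_env"` name, and the checkpoint path are all placeholders, and I'm assuming the pre-2.0 `DQNTrainer` API):

```python
import ray
from ray import tune
from ray.rllib.agents.dqn import DQNTrainer

from my_project.envs import make_sumo_env  # hypothetical helper that builds the sumo-rl env

ray.init()
tune.register_env("sumo_env", lambda cfg: make_sumo_env())  # placeholder registration

# This should match the training config, otherwise the restored
# weights won't line up with the model being rebuilt here.
config = {
    "env": "sumo_env",
    "explore": False,  # act greedily; no epsilon-greedy sampling at eval time
}

trainer = DQNTrainer(config=config)
trainer.restore("path/to/checkpoint")  # placeholder checkpoint path

env = make_sumo_env()
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    # explore=False makes the policy pick the greedy action
    action = trainer.compute_single_action(obs, explore=False)
    obs, reward, done, info = env.step(action)
    total_reward += reward
print("episode reward:", total_reward)
```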
Here are the files I was using:
https://github.com/Sitting-Down/RLlib-DQN-Experiments
The main packages I used were RLlib for the DQN Trainer and sumo-rl as the environment.
Any help would be appreciated