Help on Experiment with Training on Multiple Similar Environments

  • Severity: None — just asking a question out of curiosity

So I’ve recently been trying my hand at some Reinforcement Learning, and decided to apply RL to traffic signal control to see how it does.

My goal is to make an RL model outperform the baseline traffic signal configuration provided by Eclipse SUMO.

I gathered 14 days of AM and PM traffic data for a net file consisting of 3 intersections. The idea is that the net file stays the same throughout the entire experiment, while the route file changes every 60 simulations or so.
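For context, the rotation I have in mind looks roughly like this (the file-naming convention and the 60-iteration block size are just my own setup, not anything from SUMO or sumo-rl):

```python
# Sketch of the route-file rotation: 14 days x {AM, PM}, one block of
# 60 training iterations per route file. Day{N}{AM|PM}.rou.xml follows
# my naming convention; nothing here is sumo-rl API.

def route_schedule(days=14, iters_per_file=60):
    """Return [(route_file, start_iter, end_iter), ...] in training order."""
    schedule = []
    start = 0
    for day in range(1, days + 1):
        for period in ("AM", "PM"):
            route = f"Day{day}{period}.rou.xml"
            schedule.append((route, start, start + iters_per_file))
            start += iters_per_file
    return schedule

schedule = route_schedule()
print(schedule[0])    # ('Day1AM.rou.xml', 0, 60)
print(schedule[1])    # ('Day1PM.rou.xml', 60, 120)
print(len(schedule))  # 28 route files, 1680 iterations total
```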

I set up a multi-agent environment where each intersection tries to drive the waiting time of the cars at that intersection to 0.
I used sumo-rl's default reward, the difference in waiting time, as it gave better output than my initial choice of using the negative waiting time of cars as the reward.
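As I understand sumo-rl's default (`diff-waiting-time`), the per-step reward is the *decrease* in the accumulated waiting time at the intersection since the last step. A minimal stand-in sketch (the class and its inputs are my own illustration, not sumo-rl code):

```python
# Hedged sketch of a "difference in waiting time" reward: positive when
# total accumulated waiting time at the intersection goes down, negative
# when it goes up. wait_now stands in for a query to the simulator.

class DiffWaitingTimeReward:
    def __init__(self):
        self.last_wait = 0.0

    def __call__(self, wait_now):
        reward = self.last_wait - wait_now  # > 0 if waiting decreased
        self.last_wait = wait_now
        return reward

r = DiffWaitingTimeReward()
print(r(10.0))  # -10.0: waiting grew from 0 to 10
print(r(4.0))   # 6.0: waiting dropped from 10 to 4
```

One consequence of this shaping is that the agent is rewarded for improvement rather than for the absolute queue size, which may explain why it trained better than the plain negative-waiting-time reward.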

The plan is to train the model for 60 iterations, stop it, point the trainer at the latest checkpoint, and continue on the next route file.
So the first 60 iterations would run in an environment using Day1AM.rou.xml; once I have gathered 60 CSV output files, I would stop training, change the route file to Day1PM.rou.xml, and restore the trainer from the latest checkpoint generated.
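The stop/restore loop I'm describing can be sketched like this, with the trainer hidden behind a callback so the loop itself is framework-agnostic. `train_block` is a hypothetical function: in my actual setup it would build the sumo-rl environment on the given route file, restore the checkpoint if one is passed, run 60 iterations, and return the new checkpoint path.

```python
# Sketch of the curriculum loop: each route-file block resumes training
# from the checkpoint produced by the previous block.

def run_curriculum(route_files, train_block):
    checkpoint = None  # first block starts from scratch
    for route in route_files:
        checkpoint = train_block(route, restore_from=checkpoint)
    return checkpoint

# Toy stand-in trainer, just to show how the checkpoint is threaded
# through the loop (no SUMO or RL framework involved here).
def fake_train_block(route, restore_from):
    return f"ckpt-after-{route}"

final = run_curriculum(["Day1AM.rou.xml", "Day1PM.rou.xml"], fake_train_block)
print(final)  # ckpt-after-Day1PM.rou.xml
```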

Here is the code I’m currently working on. Would training the model this way eventually be able to reach my goal?

If not, what aspects should I take note of to improve it?
Thank you