I am currently trying to use Reinforcement Learning to optimize some parameters in a simulation software.
To do this, I built a custom environment that runs these simulations. Currently I am having a lot of trouble getting decent results.
For explanation, what I want to do:
Optimize a couple of values (one or two) for different boundary conditions. The boundary conditions are completely independent of each other and there are no real timesteps.
What I am currently doing:
I define the limits of my boundary conditions (=BC) in the environment, where the BC are initialized randomly. The agent then predicts an action based on these BC and I do a step in my environment. Based on the action, the simulation calculates the output, and a reward is derived from that (just a linear function: the absolute distance between the output and the desired output). After that I set done to true, and in the reset function the BC are again initialized randomly.
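To illustrate the setup described above, here is a minimal sketch of such a single-step ("contextual bandit" style) environment using the gymnasium API. The names `run_simulation` and `target_output` are hypothetical placeholders for your simulation call and the desired output, and the BC limits are just taken from the example further down:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

# BC limits, taken from the example ranges below (assumption).
BC_LOW = np.array([0.0, 1000.0], dtype=np.float32)
BC_HIGH = np.array([50.0, 5000.0], dtype=np.float32)

def run_simulation(bc, action):
    # Placeholder: replace with the call into your simulation software.
    return float(action[0]) * float(bc[0])

def target_output(bc):
    # Placeholder: replace with the desired output for these boundary conditions.
    return 0.5 * float(bc[0])

class SimEnv(gym.Env):
    """One decision per episode: observe BC, pick parameters, get reward, done."""

    def __init__(self):
        # Observation: the boundary conditions, normalized to [0, 1].
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(2,), dtype=np.float32)
        # Action: the parameter(s) to optimize, kept in a normalized range.
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
        self.bc = None

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        # New random boundary conditions for each episode.
        self.bc = self.np_random.uniform(BC_LOW, BC_HIGH).astype(np.float32)
        obs = (self.bc - BC_LOW) / (BC_HIGH - BC_LOW)   # normalize to [0, 1]
        return obs, {}

    def step(self, action):
        output = run_simulation(self.bc, action)
        # Negative absolute distance, so a smaller error means a higher reward.
        reward = -abs(output - target_output(self.bc))
        obs = (self.bc - BC_LOW) / (BC_HIGH - BC_LOW)
        # Single-step episode: terminate immediately.
        return obs, float(reward), True, False, {}
```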
However, I noticed the performance is very poor.
As a demonstration:
I want to optimize the simulation for the values [range(0, 50), range(1000, 5000)]
BC: [11.2392, 3092.9291]
Reward : 0.1
new BC: [2.1242, 1234.0212]
I am currently implementing a version where the BC are initialized equidistantly (see the sketch after this example), e.g.:
BC: [0, 1000]
newBC: [10, 2000]
newBC: [20, 3000]
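If you go that route, one option is to precompute an equidistant grid of BC and cycle through it in reset instead of drawing them randomly. A sketch, assuming the same step sizes as in the example above and the `SimEnv` class from the earlier snippet:

```python
import itertools
import numpy as np

# Equidistant grid of boundary conditions (step sizes are just the ones from the example).
bc_grid = itertools.cycle(
    np.array([a, b], dtype=np.float32)
    for a in np.arange(0, 51, 10)          # 0, 10, 20, ..., 50
    for b in np.arange(1000, 5001, 1000)   # 1000, 2000, ..., 5000
)

# In SimEnv.reset(), replace the random draw with:
#     self.bc = next(bc_grid)
```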
Are there any recommendations on how to set up such an environment?
Should I set done=True after each BC?
Thanks in advance