Hi,
I am working on a problem where I optimize both the design of a battery and its dispatch to the power market. The design consists of two static variables, while the dispatch is a sequence of hourly charge and discharge actions. I can already fix the static parameters and optimize the sequential dispatch actions using APPO with about 10 actors in parallel. Now I am trying to embed a genetic algorithm optimizer that simultaneously optimizes the static battery design variables. I want it to work as follows: every time the env is reset, the static battery design is changed to whatever the genetic algorithm suggests. One actor then plays a full episode with this design and returns the cumulative net revenue to the genetic algorithm, which updates itself. How can I achieve that?
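
To make the intended loop concrete, here is a minimal sketch in plain Python with no RLlib involved. Everything in it is a placeholder I made up for illustration: the `SimpleGA` ask/tell class, the `BatteryEnv` toy environment, the price series, the cost constants, and the random policy standing in for APPO. It only shows the control flow I am after: `reset()` asks the GA for a design, and the end of the episode reports the cumulative net revenue back to it.

```python
import random

PRICES = [20.0] * 8 + [35.0] * 8 + [60.0] * 8   # hypothetical hourly prices
BOUNDS = [(1.0, 10.0), (0.5, 5.0)]              # assumed (capacity MWh, power MW) ranges
COST_PER_MWH, COST_PER_MW = 5.0, 3.0            # made-up design costs


class SimpleGA:
    """Toy ask/tell genetic algorithm over the two static design variables."""

    def __init__(self, bounds, pop_size=8, seed=0):
        self.rng = random.Random(seed)
        self.bounds = bounds
        self.pop_size = pop_size
        self.pending = [self._random_design() for _ in range(pop_size)]
        self.evaluated = []   # (fitness, design) pairs of the current generation
        self.best = None      # best (fitness, design) seen so far

    def _random_design(self):
        return tuple(self.rng.uniform(lo, hi) for lo, hi in self.bounds)

    def ask(self):
        """Return the next design to evaluate, breeding a new generation if needed."""
        if not self.pending:
            if len(self.evaluated) < 2:   # nothing to breed from yet
                return self._random_design()
            self._breed()
        return self.pending.pop()

    def tell(self, design, fitness):
        """Report the cumulative net revenue obtained with a design."""
        self.evaluated.append((fitness, design))
        if self.best is None or fitness > self.best[0]:
            self.best = (fitness, design)

    def _breed(self):
        # Keep the top half as parents, then do uniform crossover plus Gaussian mutation.
        self.evaluated.sort(key=lambda fd: fd[0], reverse=True)
        parents = [d for _, d in self.evaluated[: max(2, self.pop_size // 2)]]
        self.evaluated = []
        for _ in range(self.pop_size):
            a, b = self.rng.sample(parents, 2)
            child = []
            for (lo, hi), xa, xb in zip(self.bounds, a, b):
                x = self.rng.choice((xa, xb)) + self.rng.gauss(0.0, 0.05 * (hi - lo))
                child.append(min(hi, max(lo, x)))
            self.pending.append(tuple(child))


class BatteryEnv:
    """Toy env: the design changes on every reset, dispatch happens in step()."""

    def __init__(self, ga, prices):
        self.ga = ga
        self.prices = prices

    def reset(self):
        self.design = self.ga.ask()               # new static design per episode
        self.capacity, self.power = self.design
        self.soc, self.t = 0.0, 0
        self.revenue = -(COST_PER_MWH * self.capacity + COST_PER_MW * self.power)
        return (self.soc, self.prices[0])

    def step(self, action):
        # action in [-1, 1]: negative charges (buys), positive discharges (sells).
        energy = action * self.power
        energy = max(-(self.capacity - self.soc), min(self.soc, energy))
        self.soc -= energy
        reward = self.prices[self.t] * energy
        self.revenue += reward
        self.t += 1
        done = self.t >= len(self.prices)
        if done:
            self.ga.tell(self.design, self.revenue)   # feed the result back to the GA
        price = 0.0 if done else self.prices[self.t]
        return (self.soc, price), reward, done


def run(n_episodes, seed=0):
    """Play full episodes with a random policy standing in for the APPO policy."""
    ga = SimpleGA(BOUNDS, seed=seed)
    env = BatteryEnv(ga, PRICES)
    policy_rng = random.Random(seed + 1)
    for _ in range(n_episodes):
        env.reset()
        done = False
        while not done:
            _, _, done = env.step(policy_rng.uniform(-1.0, 1.0))
    return ga


ga = run(40)
print("best (net revenue, design):", ga.best)
```

With about 10 actors in parallel, each env copy would need to talk to one shared GA instead of holding its own, for example through some central service or a Ray actor; that part is an assumption on my side and is not shown here. The fitness is also noisy because the policy keeps changing while the GA is running.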