Thanks for all the help so far. Hopefully it’s okay if I ask a few more questions to make sure my setup is correct.
The most immediate one is the way I have structured my run method:
    def run(self):  # if I can't get this to work, try not overriding it in the first place?
        """Override this to implement the run loop.

        Your loop should continuously:
            1. Call self.start_episode(episode_id)
            2. Call self.get_action(episode_id, obs)
                    -or-
                    self.log_action(episode_id, obs, action)
            3. Call self.log_returns(episode_id, reward)
            4. Call self.end_episode(episode_id, obs)
            5. Wait if nothing to do.

        Multiple episodes may be started at the same time.
        """
        episode_id = None
        episode_id = self.start_episode(episode_id=episode_id)
        while True:  # not sure if it should be a literal loop buuuuuut?
            gameObservation = self.underlord.getObservation()  # needs to be implemented
            gymObservation = self.transformObservation(gameObservation)  # needs to be implemented
            action = self.get_action(episode_id=episode_id, observation=gymObservation)
            # also needs to be implemented
            # gameObservation, reward = self.underlord.act(action=action, x=action, y=action,
            # gymObservation = self.transformObservation(gameObservation) # needs to be implemented
            # don't think I should redo the observation following an action; that will be done on the next pass through the loop
            # instead this shows: got observation y, got action x, reward following action x under observation y = reward z
            reward = self.underlord.act(action=action, x=action, y=action, selection=action)
            if self.underlord.finished != -1:
                episode_id = self.start_episode(episode_id=None)
Don’t worry about the "needs to be implemented" comments. The game interaction itself is all done; those were just notes to myself that I still need to write a function that returns the game state nicely and then transforms those values into an OpenAI Gym space.
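To make the question more concrete, here is a minimal sketch of how I currently read the four steps in the docstring, including the `log_returns` and `end_episode` calls that my loop above never makes. Everything except the four method names is an assumption for illustration: `StubExternalEnv` is a hypothetical stand-in for `ray.rllib.env.ExternalEnv` so the snippet runs without Ray, `FakeGame` with its `reset()` is a made-up game, and `max_episodes` exists only so the demo terminates (the real loop would run forever).

```python
import uuid


class StubExternalEnv:
    """Hypothetical stand-in for ray.rllib.env.ExternalEnv; it only records calls."""

    def __init__(self):
        self.calls = []

    def start_episode(self, episode_id=None):
        episode_id = episode_id or uuid.uuid4().hex
        self.calls.append(("start_episode", episode_id))
        return episode_id

    def get_action(self, episode_id, observation):
        self.calls.append(("get_action", episode_id))
        return 0  # the real policy would compute this from the observation

    def log_returns(self, episode_id, reward):
        self.calls.append(("log_returns", episode_id))

    def end_episode(self, episode_id, observation):
        self.calls.append(("end_episode", episode_id))


class FakeGame:
    """Hypothetical game: an episode ends after `steps_per_episode` actions."""

    def __init__(self, steps_per_episode=3):
        self.steps_per_episode = steps_per_episode
        self.reset()

    def reset(self):
        self.steps = 0
        self.finished = -1  # -1 means "still running", as in my code

    def getObservation(self):
        return [self.steps]

    def act(self, action):
        self.steps += 1
        if self.steps >= self.steps_per_episode:
            self.finished = 1
        return 1.0  # reward for this step


class UnderlordEnv(StubExternalEnv):
    def __init__(self, game, max_episodes=None):
        super().__init__()
        self.underlord = game
        self.max_episodes = max_episodes  # None would mean "loop forever"

    def run(self):
        episodes_done = 0
        while self.max_episodes is None or episodes_done < self.max_episodes:
            episode_id = self.start_episode()                      # step 1
            while self.underlord.finished == -1:
                obs = self.underlord.getObservation()
                action = self.get_action(episode_id, obs)          # step 2
                reward = self.underlord.act(action)
                self.log_returns(episode_id, reward)               # step 3
            final_obs = self.underlord.getObservation()
            self.end_episode(episode_id, final_obs)                # step 4
            self.underlord.reset()
            episodes_done += 1


if __name__ == "__main__":
    env = UnderlordEnv(FakeGame(steps_per_episode=3), max_episodes=2)
    env.run()
    print([name for name, _ in env.calls])
```

The main differences from my loop above are that every reward is passed to `log_returns`, and a finished game triggers `end_episode` with the final observation before the next `start_episode`.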
If you could let me know whether I have understood the right way to implement this (for an external environment), I would greatly appreciate it! @sven1977