Hi All,
I am trying to train an AI to play a game that cannot be simulated. The agent will interact with the game by simulating mouse clicks/drags (already implemented). There is also no realistic way of speeding the game up, so it must play in real time.
A few things to note:
- All observation and action spaces will be discrete variables (I am planning on using OpenAI Gym discrete tuples; see the rough sketch after this list).
- In order to significantly increase the rate of training/data acquisition, my friend and I are planning on running this on a couple of PCs at once, which leads me to believe I would be looking at one of the distributed algorithms.
- The game does not have a set length, meaning one agent can finish 3 games in the time a second one finishes a single game.
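For the discrete tuple spaces, I am picturing something roughly like this (the sizes are just placeholders for now, not the real game state):

from gym.spaces import Discrete, Tuple

# Placeholder sizes; the real spaces depend on what game state I end up exposing.
observation_space = Tuple([Discrete(8), Discrete(8), Discrete(100)])
# One sub-action each for: action type, x, y, selection.
action_space = Tuple([Discrete(5), Discrete(8), Discrete(8), Discrete(10)])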
If anyone could point me towards specific algorithms, as well as a few key points as to their logic I would greatly appreciate it!
Interesting setup! Thanks for sharing @Denys_Ashikhin ! It sounds like you need a sample-efficient algo so that you do not have to interact with your environment very much. Also, because you have a discrete action space, you should try our DQN (with a large replay buffer and a high training intensity, i.e. sample a little from the environment + update/learn often) or PPO (whose sample efficiency you can improve by increasing the num_sgd_iter param).
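Roughly what that could look like in the Trainer config (a minimal sketch, assuming the RLlib config keys around Ray 1.2; the actual values are just placeholders to tune):

from ray.rllib.agents.dqn import DQNTrainer
from ray.rllib.agents.ppo import PPOTrainer

# DQN: large replay buffer + high training intensity
# (many learning updates per environment sample).
dqn_config = {
    "buffer_size": 1_000_000,   # large replay buffer
    "training_intensity": 64,   # learn often relative to sampling
}

# PPO: reuse each collected train batch for more SGD passes.
ppo_config = {
    "num_sgd_iter": 30,         # number of SGD passes per train batch; raise for more reuse
}

# trainer = DQNTrainer(config=dqn_config, env="your_external_env")
# trainer = PPOTrainer(config=ppo_config, env="your_external_env")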
Thanks, out of the two I am definitely more interested in PPO since it has LSTM support built in. However, the very top of the page (RLlib Algorithms — Ray v1.2.0) lists PPO and APPO, but not DD-PPO. Am I correct in assuming that DD-PPO also has LSTM support?
Yes, DD-PPO also supports LSTMs and attention nets.
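A minimal sketch of what turning on the LSTM with DD-PPO could look like (assuming the Ray 1.2 DDPPOTrainer and the standard model config keys; the values are only illustrative):

from ray.rllib.agents.ppo.ddppo import DDPPOTrainer

config = {
    "model": {
        "use_lstm": True,       # wrap the default model in an LSTM
        "lstm_cell_size": 256,  # illustrative value
        "max_seq_len": 20,      # RNN truncation length
    },
    "num_workers": 2,           # DD-PPO learns on the rollout workers themselves
}

# trainer = DDPPOTrainer(config=config, env="your_external_env")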
Thanks for all the help so far. Hopefully it's okay if I ask a few more questions to make sure my setup is correct.
The most immediate one is the way I have structured my run loop:
def run(self):  # if I can't get this to work, try not overriding it in the first place?
    """Override this to implement the run loop.

    Your loop should continuously:
        1. Call self.start_episode(episode_id)
        2. Call self.get_action(episode_id, obs)
               -or-
           self.log_action(episode_id, obs, action)
        3. Call self.log_returns(episode_id, reward)
        4. Call self.end_episode(episode_id, obs)
        5. Wait if nothing to do.

    Multiple episodes may be started at the same time.
    """
    episode_id = self.start_episode(episode_id=None)
    while True:  # not sure if it should be a literal loop buuuuuut?
        # Query the game state and convert it into the Gym observation
        # space (both helpers still need to be implemented).
        game_observation = self.underlord.getObservation()
        gym_observation = self.transformObservation(game_observation)

        # Ask RLlib for an action given the current observation.
        action = self.get_action(episode_id=episode_id, observation=gym_observation)

        # Apply the action to the game and log the resulting reward.
        # I don't think I should re-query the observation right after the action;
        # that happens at the top of the next loop iteration. Effectively this
        # records: got observation y, took action x under y, received reward z.
        reward = self.underlord.act(
            action=action[0], x=action[1], y=action[2], selection=action[3])
        self.log_returns(episode_id=episode_id, reward=reward)

        # Once the game reports it is over, close this episode and start a new one.
        if self.underlord.finished != -1:
            self.end_episode(episode_id=episode_id, observation=gym_observation)
            episode_id = self.start_episode(episode_id=None)
Don’t worry about the "needs to be implemented" parts (the game interaction is all done; those were just notes to myself that I still need to write a function to return the game state nicely and then transform those values into the OpenAI Gym space).
If you could let me know if I understood the right way to implement it (for an external environment), I would greatly appreciate it! @sven1977
Hey @Denys_Ashikhin. This all looks correct! For external envs, it helps to think of your env as the thing you want to loop through (in some sort of game loop); you then a) query RLlib’s policy server for actions and b) report rewards and episode boundaries back to it so RLlib can learn.
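To make the wiring concrete, here is a rough sketch of how such an ExternalEnv subclass could be registered and trained on the RLlib side (UnderlordEnv and the space sizes are placeholders; this assumes the Ray 1.2 ExternalEnv / register_env / PPOTrainer APIs):

import ray
from gym.spaces import Discrete, Tuple
from ray.rllib.env.external_env import ExternalEnv
from ray.rllib.agents.ppo import PPOTrainer
from ray.tune.registry import register_env


class UnderlordEnv(ExternalEnv):  # placeholder name for the env from the post above
    def __init__(self, env_config):
        # ExternalEnv's constructor needs the action/observation spaces;
        # these Tuple-of-Discrete spaces are only illustrative.
        action_space = Tuple([Discrete(5), Discrete(8), Discrete(8), Discrete(10)])
        observation_space = Tuple([Discrete(8), Discrete(8), Discrete(100)])
        super().__init__(action_space, observation_space)

    def run(self):
        ...  # the run loop from the post above


ray.init()
register_env("underlord", lambda env_config: UnderlordEnv(env_config))

# num_workers=0 keeps the env on the local worker for this sketch.
trainer = PPOTrainer(env="underlord", config={"num_workers": 0})
while True:
    print(trainer.train())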