Not Sure Which RLlib Algorithm To Use

Hi All,

I am trying to train an AI to play a game that cannot be simulated. The agent interacts with the game by simulating mouse clicks/drags (already implemented). There is also no realistic way of speeding the game up, so it must be played in real time.

A few things to note:

  1. All observation and action spaces will be discrete (I am planning on using OpenAI Gym discrete tuples).
  2. To significantly increase the rate of training/data acquisition, my friend and I are planning on running this on a couple of PCs at once - which leads me to believe I should be looking at one of the distributed algorithms.
  3. The game does not have a set length, meaning one agent can finish 3 games in the time another finishes a single one.

If anyone could point me towards specific algorithms, as well as a few key points as to their logic I would greatly appreciate it!

Interesting setup! Thanks for sharing @Denys_Ashikhin! It sounds like you need a sample-efficient algorithm, so that you do not have to interact with your environment too much. Also, because you have a discrete action space, you could try our DQN (with a large replay buffer and a high training intensity: sample a little from the environment, then update/learn often) or PPO (whose sample complexity you can improve by increasing the num_sgd_iter param).
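To make that advice concrete, here is a sketch of what those two suggestions could look like as config fragments. The specific numbers below are illustrative assumptions for a slow real-time game, not tuned recommendations:

```python
# Hypothetical DQN config fragment: keep lots of (scarce) real-time samples
# around, and learn many times per environment step collected.
dqn_config = {
    "buffer_size": 500_000,     # large replay buffer
    "train_batch_size": 32,
    "training_intensity": 16,   # ratio of learning steps to sampled env steps
    "learning_starts": 1_000,   # warm up the buffer before learning
}

# Hypothetical PPO config fragment: reuse each collected batch for more
# SGD passes to squeeze more learning out of every real sample.
ppo_config = {
    "num_sgd_iter": 30,         # more epochs over each train batch
    "train_batch_size": 4_000,
}
```

Since the environment runs in real time, sample collection is the bottleneck, which is why both fragments push toward reusing data rather than collecting more of it.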

Thanks, of the two I am definitely more interested in PPO since it has LSTM support built in. However, the very top of that page (RLlib Algorithms — Ray v1.2.0) lists PPO and APPO, but not DD-PPO. Am I correct in assuming that DD-PPO also has LSTM support?

Yes, DD-PPO also supports LSTMs and attention nets.
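For reference, the built-in LSTM wrapper is enabled through the model section of the algorithm config, which is shared by PPO, APPO, and DD-PPO. A minimal sketch (the cell size and sequence length are just example values):

```python
# Model config fragment enabling RLlib's LSTM auto-wrapper.
model_config = {
    "model": {
        "use_lstm": True,       # wrap the default model with an LSTM
        "max_seq_len": 20,      # truncation length for backprop through time
        "lstm_cell_size": 256,  # hidden state size
    }
}
```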

Thanks for all the help so far. Hopefully it's okay if I ask a few more questions to make sure my setup is correct.
The most immediate one is the way I have structured my run loop.

    def run(self):  # if I can't get this to work, try not overriding it in the first place?
        """Override this to implement the run loop.

        Your loop should continuously:
            1. Call self.start_episode(episode_id)
            2. Call self.get_action(episode_id, obs)
               and self.log_action(episode_id, obs, action)
            3. Call self.log_returns(episode_id, reward)
            4. Call self.end_episode(episode_id, obs)
            5. Wait if nothing to do.

        Multiple episodes may be started at the same time.
        """
        episode_id = self.start_episode(episode_id=None)
        while True:  # not sure if it should be a literal loop, but...?
            gameObservation = self.underlord.getObservation()  # needs to be implemented
            gymObservation = self.transformObservation(gameObservation)  # needs to be implemented

            action = self.get_action(episode_id=episode_id, observation=gymObservation)

            # I don't think I should redo the observation right after the action; that
            # happens on the next pass through the loop. Instead this step reads as:
            # got observation y, took action x, received reward z for x under y.
            reward = self.underlord.act(action=action[0], x=action[1], y=action[2],
                                        selection=action[3])
            self.log_returns(episode_id=episode_id, reward=reward)

            if self.underlord.finished != -1:
                self.end_episode(episode_id=episode_id, observation=gymObservation)
                episode_id = self.start_episode(episode_id=None)

Don’t worry about the "needs to be implemented" parts; the game interaction is all done. Those are just notes to myself that I still need to write a function that returns the values nicely and then transforms them into an OpenAI Gym space.

If you could let me know if I understood the right way to implement it (for an external environment), I would greatly appreciate it! @sven1977

Hey @Denys_Ashikhin. This all looks correct! :slight_smile: For external envs, it helps to think of your env as the thing you loop through (in some sort of game loop), while you query RLlib’s policy server for a) actions, and b) report rewards and episode boundaries back to it so RLlib can learn.
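To make the game-loop shape above concrete, here is a self-contained sketch with a stubbed game and a stubbed policy client. FakeGame and FakeClient are illustrative stand-ins, not RLlib API; the point is the episode-boundary handling when episodes have no set length:

```python
import random

class FakeGame:
    """Stand-in for the real game; episodes end after a random number of steps."""
    def __init__(self):
        self.steps_left = random.randint(1, 5)
        self.finished = -1  # mirrors underlord.finished: -1 means still running

    def get_observation(self):
        return self.steps_left

    def act(self, action):
        self.steps_left -= 1
        if self.steps_left == 0:
            self.finished = 1  # any value != -1 means the game is over
        return 1.0  # reward for this step

class FakeClient:
    """Stand-in for RLlib's policy server; just counts episode starts."""
    def __init__(self):
        self.episodes = 0

    def start_episode(self):
        self.episodes += 1
        return self.episodes  # use the count as an episode_id

    def get_action(self, episode_id, obs):
        return 0  # a fixed dummy action

    def log_returns(self, episode_id, reward):
        pass

    def end_episode(self, episode_id, obs):
        pass

def run(game_factory, client, num_episodes):
    """Loop shape from the thread: act, log the reward, and when the game
    reports it is finished, end the episode and roll straight into a new one."""
    completed = 0
    game = game_factory()
    episode_id = client.start_episode()
    while completed < num_episodes:
        obs = game.get_observation()
        action = client.get_action(episode_id, obs)
        reward = game.act(action)
        client.log_returns(episode_id, reward)
        if game.finished != -1:
            client.end_episode(episode_id, obs)
            completed += 1
            if completed < num_episodes:
                game = game_factory()
                episode_id = client.start_episode()
    return completed
```

Because episodes vary in length, the loop never assumes a fixed step count; it simply watches the game's finished flag and starts a fresh episode id each time one ends, which is exactly what lets several machines run games at their own pace.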