Not Sure Which RLlib Algorithm To Use

Denys_Ashikhin · April 15, 2021, 1:07am

Hi All,

I am trying to train an AI to play a game that cannot be simulated. The agent will interact with the game by simulating mouse clicks/drags (already implemented). There is also no realistic way of speeding the game up, so it must play in real time.

A few things to note:

- All observation and action spaces will be discrete variables (I am planning on using OpenAI Gym tuples of discrete spaces).
- To significantly increase the rate of training/data acquisition, my friend and I are planning on running this on a couple of PCs at once, which leads me to believe I should be looking at one of the distributed algorithms.
- The game does not have a set length, meaning one agent can finish 3 games in the time it takes a second one to finish a single game.

If anyone could point me towards specific algorithms, as well as a few key points as to their logic I would greatly appreciate it!

sven1977 · April 16, 2021, 4:55pm

Interesting setup! Thanks for sharing @Denys_Ashikhin ! It sounds like you need a sample-efficient algo so that you do not have to interact that much with your environment. Also, because you have a discrete action space, you should try our DQN with a large buffer and a high training intensity (sample a little from the environment, update/learn often), or PPO, whose sample efficiency you can improve by increasing the num_sgd_iter param.
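To make the two suggestions above concrete, here is a minimal sketch of what such config dicts might look like. The key names are the ones I believe RLlib used around Ray 1.2 (`buffer_size`, `training_intensity`, `num_sgd_iter`, and so on); the values are illustrative assumptions, not tuned recommendations, and should be checked against the docs for your Ray version.

```python
# Illustrative RLlib-style config dicts (values are assumptions, not tuned).

# DQN: keep lots of past experience and replay it often.
dqn_config = {
    "buffer_size": 500_000,        # large replay buffer
    "learning_starts": 1_000,      # env steps collected before the first update
    "rollout_fragment_length": 4,  # env steps sampled per rollout
    "train_batch_size": 32,        # timesteps replayed per update
    # Ratio of timesteps trained on to timesteps sampled. Higher means
    # more learning per (slow, real-time) environment step.
    "training_intensity": 64,
}

# As I understand it, leaving training_intensity unset gives roughly
# train_batch_size / rollout_fragment_length for a single worker and env.
natural_intensity = (
    dqn_config["train_batch_size"] / dqn_config["rollout_fragment_length"]
)

# PPO: reuse each collected batch for more SGD passes.
ppo_config = {
    "train_batch_size": 4_000,
    "sgd_minibatch_size": 128,
    "num_sgd_iter": 30,  # passes over each train batch
}

print(f"natural intensity: {natural_intensity}, "
      f"requested: {dqn_config['training_intensity']}")
```

Either dict would be passed as the `config` when building the trainer. With a game that only runs in real time, the replay-to-sample ratio is probably the first knob worth experimenting with.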

Denys_Ashikhin · April 16, 2021, 5:51pm

Thanks, out of the two I am definitely more interested in PPO since it has LSTM support built in. However, the table at the very top of the page (RLlib Algorithms — Ray v1.2.0) lists PPO and APPO but not DD-PPO. Am I correct in assuming that DD-PPO also has LSTM support?

sven1977 · April 21, 2021, 7:10am

Yes, DD-PPO also supports LSTMs and attention nets.
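For reference, a minimal sketch of how the LSTM wrapper is typically switched on through the `model` sub-config. The key names (`use_lstm`, `lstm_cell_size`, `max_seq_len`) are the model options as I recall them from Ray 1.x, and the worker count and sizes are assumptions chosen only for illustration.

```python
# Illustrative DD-PPO-style config with the built-in LSTM wrapper enabled.
# Key names follow RLlib's model options as of Ray 1.x (to the best of my
# knowledge); verify them against the docs for the version you install.
ddppo_config = {
    "framework": "torch",  # DD-PPO is a PyTorch-only algorithm
    "num_workers": 2,      # assumption: one rollout worker per PC
    "model": {
        "use_lstm": True,       # wrap the default network in an LSTM
        "lstm_cell_size": 256,  # size of the LSTM hidden state
        "max_seq_len": 20,      # max sequence length used for training
    },
}

print(ddppo_config["model"])
```

The same `model` block should work unchanged with plain PPO, so switching between the two mostly comes down to the trainer name and the distributed settings.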

Denys_Ashikhin · April 23, 2021, 11:42pm

Thanks for all the help so far. Hopefully it’s okay if I ask a few more questions to make sure my setup is correct.
The most immediate one is the way I have structured my run loop.

    def run(self):  # if I can't get this to work, try not overriding it in the first place?
        """Override this to implement the run loop.
        Your loop should continuously:
            1. Call self.start_episode(episode_id)
            2. Call self.get_action(episode_id, obs)
                    -or-
                    self.log_action(episode_id, obs, action)
            3. Call self.log_returns(episode_id, reward)
            4. Call self.end_episode(episode_id, obs)
            5. Wait if nothing to do.
        Multiple episodes may be started at the same time.
        """
        episode_id = None
        episode_id = self.start_episode(episode_id=episode_id)
        while True:  # not sure if it should be a literal loop buuuuuut?
            gameObservation = self.underlord.getObservation()  # needs to be implemented
            gymObservation = self.transformObservation(gameObservation)  # needs to be implemented

            action = self.get_action(episode_id=episode_id, observation=gymObservation)

            # also needs to be implemented
            # gameObservation, reward = self.underlord.act(action=action[0], x=action[1], y=action[2],
            #                                              selection=action[3])
            # gymObservation = self.transformObservation(gameObservation)  # needs to be implemented
            #     don't think I should redo observation following an action. That will be done next loop run through
            # instead this shows: Got y observation. Got x action. Reward following X-action under y-obs = z reward
            reward = self.underlord.act(action=action[0], x=action[1], y=action[2], selection=action[3])
            self.log_returns(episode_id=episode_id, reward=reward)

            if self.underlord.finished != -1:
                self.end_episode(episode_id=episode_id, observation=gymObservation)
                episode_id = self.start_episode(episode_id=None)

Don’t worry about the “needs to be implemented” comments. The game interaction is all done; those were just notes to myself that I still need to write a function that returns the game state nicely and then transform those values into an OpenAI Gym space.

If you could let me know if I understood the right way to implement it (for an external environment), I would greatly appreciate it! @sven1977

sven1977 · April 27, 2021, 8:07am

Hey @Denys_Ashikhin . This all looks correct! For external envs, it helps to think of your env as the thing you loop through (in some sort of game loop). From inside that loop, you a) query RLlib’s policy server for actions and b) report rewards and episode boundaries back to it so RLlib can learn.
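The call order described above can be checked without a real game or a running RLlib instance. In the sketch below, `RecordingPolicyClient` and `CountdownGame` are hypothetical stand-ins invented for illustration; only the four method names (`start_episode`, `get_action`, `log_returns`, `end_episode`) mirror the ExternalEnv docstring quoted earlier in the thread. One assumption to note: this sketch passes the observation taken after the final action to `end_episode`.

```python
import uuid
from typing import Any, List, Optional, Tuple


class RecordingPolicyClient:
    """Hypothetical stand-in for the RLlib side; it only records calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Any]] = []

    def start_episode(self, episode_id: Optional[str] = None) -> str:
        episode_id = episode_id or uuid.uuid4().hex
        self.calls.append(("start_episode", episode_id))
        return episode_id

    def get_action(self, episode_id: str, observation: Any) -> int:
        self.calls.append(("get_action", observation))
        return 0  # fixed placeholder action; a real policy would decide

    def log_returns(self, episode_id: str, reward: float) -> None:
        self.calls.append(("log_returns", reward))

    def end_episode(self, episode_id: str, observation: Any) -> None:
        self.calls.append(("end_episode", observation))


class CountdownGame:
    """Toy game: ends after `length` actions; observation = steps left."""

    def __init__(self, length: int) -> None:
        self.length = length
        self.remaining = length

    def reset(self) -> None:
        self.remaining = self.length

    def observe(self) -> int:
        return self.remaining

    def act(self, action: int) -> float:
        self.remaining -= 1
        return 1.0  # constant reward per step

    @property
    def finished(self) -> bool:
        return self.remaining == 0


def run_episodes(client: RecordingPolicyClient, game: CountdownGame,
                 num_episodes: int) -> None:
    """Game loop: the env drives, the policy side is queried and informed."""
    for _ in range(num_episodes):
        game.reset()
        episode_id = client.start_episode()
        while not game.finished:
            obs = game.observe()
            action = client.get_action(episode_id, obs)
            reward = game.act(action)
            client.log_returns(episode_id, reward)
        # Report the episode boundary together with the final observation.
        client.end_episode(episode_id, game.observe())


client = RecordingPolicyClient()
run_episodes(client, CountdownGame(length=2), num_episodes=1)
call_names = [name for name, _ in client.calls]
print(call_names)
```

Unlike the `while True` loop in the question, this version is bounded by `num_episodes` so it terminates; a real external env would keep looping for as long as the game is running.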
