Best training algo for a turn-based board game?

I’ve implemented the board game Dominion as a gym environment and I’m wondering which training algorithm would be the most appropriate. I’m new to RLlib, so I’ve mostly left things on the default options while figuring things out, but now I actually have to decide between DQN vs A3C vs DDPG vs… there are a lot of them! Originally I wanted to go with the AlphaZero algorithm, but it seems the currently implemented version is meant for single-player games, and (my implementation of) Dominion is a two-player game.

A few notes about my Dominion gym implementation: although there are two players, I use the same policy for both. Does this mean I should have made my environment a MultiAgentEnv instead? That API seems geared towards cases where every agent takes an action at every time step, whereas Dominion is turn-based, so it didn’t seem like the right fit.

Any feedback or help/tips would be greatly appreciated!

Hey @aronstar, yes, you should try the multi-agent API and make your env a MultiAgentEnv. You just have to return one agent’s obs in your obs dict at a time (alternating between player 1’s obs and player 2’s obs each turn). RLlib will only compute actions for the agents whose obs are present in the dict.
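
Here’s a minimal sketch of that pattern, assuming a roughly RLlib 1.x-era MultiAgentEnv API (old gym-style 4-tuple `step` return). `TurnBasedEnv`, `_obs`, the dummy spaces, and the placeholder end condition are all illustrative stand-ins, not your Dominion logic:

```python
import numpy as np
from gym.spaces import Box, Discrete
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class TurnBasedEnv(MultiAgentEnv):
    """Two players share one game; only the acting player appears in each dict."""

    def __init__(self, config=None):
        super().__init__()
        # Dummy spaces; replace with your Dominion state/action encoding.
        self.observation_space = Box(0.0, 1.0, shape=(10,), dtype=np.float32)
        self.action_space = Discrete(5)
        self.players = ["player_0", "player_1"]
        self.turn = 0
        self.num_moves = 0

    def reset(self):
        self.turn = 0
        self.num_moves = 0
        # Only the player who moves first is in the obs dict, so RLlib
        # queries the policy for that player alone.
        return {self.players[self.turn]: self._obs()}

    def step(self, action_dict):
        acting = self.players[self.turn]
        # action_dict holds exactly one entry: the acting player's move.
        action = action_dict[acting]
        self.num_moves += 1
        done = self.num_moves >= 100  # placeholder end condition
        # Hand the turn to the other player and emit only their obs.
        self.turn = 1 - self.turn
        obs = {self.players[self.turn]: self._obs()}
        rewards = {acting: 0.0}  # plug in the real game reward here
        return obs, rewards, {"__all__": done}, {}

    def _obs(self):
        # Placeholder observation; encode the real game state here.
        return np.zeros(10, dtype=np.float32)
```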
As for the algorithm: it depends on your action space. If it’s discrete, I would suggest DQN, PPO, or IMPALA. For continuous action spaces, try SAC. For mixed (e.g. Dict/Tuple) action spaces, you can try PPO or A3C.
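
And since you want one policy for both players, you can map both agent IDs to a single shared policy in the multiagent config. A rough sketch, again assuming an RLlib 1.x-era config API; the policy ID `"shared_policy"` and the stopping criterion are just examples:

```python
from ray import tune

config = {
    "env": TurnBasedEnv,  # the env sketch above
    "framework": "torch",
    "multiagent": {
        # One shared policy; RLlib infers its spaces from the env.
        "policies": {"shared_policy"},
        # Map every agent ID ("player_0"/"player_1") to that one policy.
        "policy_mapping_fn": lambda agent_id, *args, **kwargs: "shared_policy",
    },
}

tune.run("PPO", config=config, stop={"timesteps_total": 100_000})
```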
