I’ve implemented the board game Dominion as a gym environment, and I’m wondering which training algorithm would be most appropriate. I’m new to RLlib, so I’ve mostly left things on the default options while figuring things out, but now I actually have to decide between DQN, A3C, DDPG, and so on. There are a lot of them! Originally, I wanted to go with the AlphaZero algorithm, but it seems like the currently implemented version is meant for single-player games, and (my implementation of) Dominion is a two-player game.
A few notes about my Dominion gym implementation: although there are two players, I use the same policy for both. Does this mean I should have made my environment a MultiAgentEnv instead? That seems more geared towards cases where every agent needs to take an action at every time step, whereas Dominion is turn-based, so it didn’t seem like the right fit.
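To make the turn-based question concrete, here is a minimal sketch in plain Python (no RLlib imports) of how a dict-keyed multi-agent interface could handle alternating turns: each step returns an observation only for the player who acts next, and both player ids map to one shared policy. Everything in it is an assumption for illustration: the class `TurnBasedTwoPlayerEnv`, the ids `player_0`/`player_1`, the `"shared_policy"` name, and the toy "race to a target score" rules, which are not Dominion. The `"__all__"` done key follows the convention as I understand it from RLlib's multi-agent examples and has not been checked against a specific version.

```python
class TurnBasedTwoPlayerEnv:
    """Toy two-player race game: first player to reach `target` points wins.

    Hypothetical stand-in for a turn-based game; these are NOT Dominion's rules.
    """

    AGENTS = ("player_0", "player_1")  # hypothetical agent ids

    def __init__(self, target=10):
        self.target = target
        self.reset()

    def _obs(self):
        # Observation from the point of view of the player about to act:
        # (own score, opponent score).
        me = self.AGENTS[self.current]
        other = self.AGENTS[1 - self.current]
        return (self.scores[me], self.scores[other])

    def reset(self):
        self.scores = {agent: 0 for agent in self.AGENTS}
        self.current = 0
        # Only the player whose turn it is appears in the observation dict.
        return {self.AGENTS[self.current]: self._obs()}

    def step(self, action_dict):
        actor = self.AGENTS[self.current]
        # Exactly one agent acts per step: the one that was just observed.
        assert set(action_dict) == {actor}
        self.scores[actor] += action_dict[actor]

        done = self.scores[actor] >= self.target
        if done:
            loser = self.AGENTS[1 - self.current]
            obs, rewards = {}, {actor: 1.0, loser: -1.0}
        else:
            self.current = 1 - self.current  # pass the turn
            obs, rewards = {self.AGENTS[self.current]: self._obs()}, {}
        return obs, rewards, {"__all__": done}, {}


def policy_mapping_fn(agent_id):
    # Self-play: both players are driven by the same (hypothetical) policy.
    return "shared_policy"


def play_one_game(env, policy):
    """Run one episode; `policy` maps an observation to an action."""
    obs = env.reset()
    turns = 0
    while True:
        actions = {agent: policy(o) for agent, o in obs.items()}
        obs, rewards, dones, _ = env.step(actions)
        turns += 1
        if dones["__all__"]:
            return turns, rewards
```

For example, `play_one_game(TurnBasedTwoPlayerEnv(), lambda obs: 2)` alternates between the two players until one reaches the target. The design point is that nothing in this interface forces both agents to act on every step, so a turn-based game can still fit a multi-agent layout.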
Any feedback or help/tips would be greatly appreciated!