I’m trying to design a game in which each agent’s action is an ordered sequence of 5 distinct numbers from 0 to 19 — the same number may not be chosen more than once.

Example actions from this space (written in list notation rather than as numpy arrays):

[4, 15, 3, 2, 11]
[0, 5, 2, 8, 15]

How would you use the spaces provided by RLlib/gym to best express this action space, so that an algorithm like PPO or IMPALA can learn effective behaviors as easily as possible?
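For reference, the closest built-in space seems to be `MultiDiscrete`, which does not by itself encode the no-repeat constraint — a minimal sketch of what I mean:

```python
from gym.spaces import MultiDiscrete

# 5 slots, each taking a value in 0..19. Note: this declaration
# does NOT enforce distinctness; a plain sample() may repeat values,
# so the constraint has to live in the action distribution.
space = MultiDiscrete([20] * 5)
sample = space.sample()  # shape (5,), values in [0, 20), repeats possible
```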

I think I would start by creating a new multi-discrete action distribution that captures the ideas from this paper:

Ancestral Gumbel-Top-k Sampling for Sampling Without Replacement

You can try other approaches too, but more generally I would try an approach that samples without replacement from a multi-discrete categorical distribution and adjusts the entropy, KL, and log_prob computations accordingly. If you implement a custom action distribution like that, you should be able to use A2C/PPO in RLlib without further modifications.
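A minimal numpy sketch of the two pieces such a distribution needs — Gumbel-top-k sampling (draw k distinct indices in one shot) and the matching ancestral log-probability (used for PPO's ratio and entropy terms). The function names are my own; a real RLlib `ActionDistribution` subclass would wrap these in torch/tf ops:

```python
import numpy as np

def gumbel_top_k(logits, k, rng=None):
    """Sample k DISTINCT indices from the categorical defined by `logits`
    via the Gumbel-top-k trick: perturb each logit with i.i.d. Gumbel(0, 1)
    noise and take the k largest. Equivalent to ordered sampling without
    replacement from softmax(logits)."""
    rng = np.random.default_rng() if rng is None else rng
    perturbed = logits + rng.gumbel(size=len(logits))
    return np.argsort(perturbed)[::-1][:k]  # indices, best first

def seq_log_prob(logits, action):
    """Log-probability of an ordered sample without replacement, factored
    ancestrally: p(a) = prod_i softmax(logits masked to unchosen)[a_i]."""
    logits = logits.astype(float).copy()
    lp = 0.0
    for a in action:
        z = logits - logits.max()          # stabilized log-softmax
        lp += z[a] - np.log(np.exp(z).sum())
        logits[a] = -np.inf                # remove a from later steps
    return lp

logits = np.zeros(20)                      # uniform over 0..19
action = gumbel_top_k(logits, 5)           # 5 distinct values in [0, 20)
lp = seq_log_prob(logits, action)
```

For the uniform case above, every ordered 5-sequence has probability 1/(20·19·18·17·16), which is a handy sanity check for the log_prob implementation.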