Hello,
My goal is to train a neural network for a card game via self-play. This is my first experience in this field (I am using ray-2.0.0.dev0). In my PettingZoo environment I have:
self.action_spaces = {agent: Dict({
    GameState.BIDDING: Discrete(BID_ACTIONS),
    GameState.CHOOSE_TRUMP: Discrete(CHOOSE_TRUMP_ACTIONS),
    GameState.TRICK: Discrete(TRICK_ACTIONS)
}) for agent in self.agents}
self.observation_spaces = {agent: Dict({
    'observations': Dict({
        'gamestate': Discrete(len(GameState)),
        'player_hand': Box(low=0, high=1, shape=(TRICK_ACTIONS,), dtype=bool),
        # [...] other state I keep temporarily commented
    }),
    'action_mask': Dict({
        GameState.BIDDING: Box(low=0, high=1, shape=(BID_ACTIONS,), dtype=bool),
        GameState.CHOOSE_TRUMP: Box(low=0, high=1, shape=(CHOOSE_TRUMP_ACTIONS,), dtype=bool),
        GameState.TRICK: Box(low=0, high=1, shape=(TRICK_ACTIONS,), dtype=bool)
    })
}) for agent in self.agents}
To be able to apply action masks, as suggested in the documentation, I defined a custom model:
class ParametricActionsModel(TFModelV2):
    def __init__(self,
                 obs_space,
                 action_space,
                 num_outputs,
                 model_config,
                 name,
                 **kw):
        super().__init__(
            obs_space, action_space, num_outputs, model_config, name, **kw)
        self.prep = get_preprocessor(obs_space.original_space.spaces['observations'])
        orig_obs_space = self.prep(obs_space.original_space.spaces['observations'])
        self.action_embed_model = FullyConnectedNetwork(
            orig_obs_space,
            action_space,
            num_outputs,
            model_config,
            name + "_action_embedding"
        )

    def forward(self, input_dict, state, seq_lens):
        # Extract the available actions tensor from the observation.
        action_mask = input_dict["obs"]["action_mask"]
        # Compute the predicted action embedding
        orig_obs = self.prep.transform(input_dict["obs"]["observations"])
        action_logits, _ = self.action_embed_model({
            "obs": orig_obs
        })
        # Mask out invalid actions (use tf.float32.min for stability)
        inf_mask = tf.maximum(tf.math.log(action_mask), tf.float32.min)
        return action_logits + inf_mask, state

    def value_function(self):
        return self.action_embed_model.value_function()
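For reference, this is how I understand the masking step in forward() to work. It is only a sketch in NumPy rather than TensorFlow, and the logits and mask values are made up for illustration:

```python
import numpy as np

# Made-up logits for 4 actions; only actions 0 and 2 are legal in this example.
action_logits = np.array([1.0, 2.0, 0.5, -1.0], dtype=np.float32)
action_mask = np.array([1.0, 0.0, 1.0, 0.0], dtype=np.float32)

# log(1) = 0 leaves legal logits unchanged; log(0) = -inf is clipped to the
# most negative finite float32 so that the sum below stays finite.
with np.errstate(divide="ignore"):
    inf_mask = np.maximum(np.log(action_mask), np.finfo(np.float32).min)

masked_logits = action_logits + inf_mask

# Softmax over the masked logits: illegal actions end up with probability 0.
shifted = masked_logits - masked_logits.max()
probs = np.exp(shifted) / np.exp(shifted).sum()
print(probs)
```

If this understanding is right, the mask has to be a single tensor with the same shape as the logits for the addition to work.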
However, I get the error below:
File "E:/Vmware shared folder/python/BriscolaChiamata/train.py", line 44, in forward
orig_obs = self.prep.transform(input_dict["obs"]["observations"])
TypeError: transform() missing 1 required positional argument: 'observation' at time: 1.64718e+09
The point is: the actual observation I want to feed the NN with is input_dict["obs"]["observations"], but I don't know how to actually pass it to self.action_embed_model(). I am not even sure that ParametricActionsModel.__init__() does the right thing.
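In case it helps to narrow things down: I can reproduce the same kind of TypeError in plain Python by calling an instance method on a class instead of on an instance. This makes me suspect that get_preprocessor() may give me back a class rather than a ready-to-use object, but that is only my guess. FakePreprocessor and get_fake_preprocessor below are hypothetical stand-ins I wrote for this sketch, not RLlib code:

```python
class FakePreprocessor:
    """Hypothetical stand-in; not RLlib's real preprocessor."""

    def __init__(self, obs_space):
        self.obs_space = obs_space

    def transform(self, observation):
        # A real preprocessor would flatten the observation; this one just echoes it.
        return observation


def get_fake_preprocessor(obs_space):
    # Assumption for this sketch: the helper returns the class, not an instance.
    return FakePreprocessor


prep = get_fake_preprocessor("some_space")

try:
    # Calling transform() on the class: the observation is bound to `self`,
    # so Python complains that `observation` is missing.
    prep.transform([1, 0, 1])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Instantiating first makes the same call work.
prep_instance = prep("some_space")
result = prep_instance.transform([1, 0, 1])
```

If that guess is correct, then in my __init__() the name orig_obs_space would actually hold a preprocessor instance and not a space, which may be a second problem.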
By the way, a previous temporary version of this code, where I had:
self.observation_spaces = {agent: Dict({
    'observations': Box(  # ...
    'action_mask':  # as above
and the custom ParametricActionsModel (without the preprocessor and the transform()), did not cause errors. However, using a Dict space helps me to keep the code clearer.
Can someone please help me? I did not find the existing examples helpful enough for my (very low) level of expertise.
Thanks in advance
How severe does this issue affect your experience of using Ray?
- High: It blocks me to complete my task.