**1. Severity of the issue:**
High: Completely blocks me.
**3. What happened:**
In the Action Mask example, the environment's observation space must be a `Dict` containing both `observations` and `action_mask`, but when I add an LSTM, the model insists on a flattened observation vector. How can I integrate a simple LSTM layer into the Action Mask example without breaking its required `Dict` structure?
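For context, a minimal sketch of the kind of `Dict` observation space the action-masking setup expects (key names follow the RLlib action-masking example; the action count and observation shape are placeholders):

```python
import gymnasium as gym
import numpy as np

# Sketch of the Dict observation space used for action masking:
# "action_mask" flags which of the discrete actions are currently valid,
# "observations" holds the actual environment observation.
N_ACTIONS = 5  # placeholder action count

observation_space = gym.spaces.Dict({
    "action_mask": gym.spaces.Box(0.0, 1.0, shape=(N_ACTIONS,), dtype=np.float32),
    "observations": gym.spaces.Box(-1.0, 1.0, shape=(10,), dtype=np.float32),
})
```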
I use this connector to flatten the observation space. Without it, I can't use the simple LSTM that can be enabled in the model config:
```python
from ray.rllib.connectors.env_to_module import (
    FlattenObservations,
    PrevActionsPrevRewards,
)


def _env_to_module(env, spaces=None, device=None):
    return [
        # Inject the previous timesteps' actions and rewards into the input.
        PrevActionsPrevRewards(
            multi_agent=False,
            n_prev_rewards=4,
            n_prev_actions=4,
        ),
        # Flatten the Dict observation space; the LSTM requires a flat vector input.
        FlattenObservations(multi_agent=False),
    ]
```
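For completeness, this is roughly how the connector pipeline gets wired into the config (a sketch assuming the new API stack; `"my_action_mask_env"` stands in for my registered environment):

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Sketch: pass the connector factory to the EnvRunners, which call it
# to build the env-to-module pipeline. The env name is a placeholder.
config = (
    PPOConfig()
    .environment("my_action_mask_env")
    .env_runners(env_to_module_connector=_env_to_module)
)
```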
And I set my `config.rl_module` like this:
```python
.rl_module(
    rl_module_spec=RLModuleSpec(
        module_class=DefaultPPOTorchRLModule,
        model_config={
            "fcnet_hiddens": TRAIN_CONFIG["fcnet_hiddens"],
            "fcnet_activation": TRAIN_CONFIG["fcnet_activation"],
            # Simple LSTM setting (one layer, no attention).
            "use_lstm": True,
            "lstm_cell_size": 256,
            "max_seq_len": 100,
            "lstm_use_prev_action": True,
            "lstm_use_prev_reward": False,
            "vf_share_layers": True,
        },
    ),
)
```
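For reference, the same model settings can also be expressed with the `DefaultModelConfig` dataclass instead of a raw dict (a sketch assuming the new API stack; `TRAIN_CONFIG` is my own dict from above):

```python
from ray.rllib.core.rl_module.default_model_config import DefaultModelConfig

# Sketch: same LSTM settings via the DefaultModelConfig dataclass
# (field names mirror the model_config dict keys above).
model_config = DefaultModelConfig(
    fcnet_hiddens=TRAIN_CONFIG["fcnet_hiddens"],
    fcnet_activation=TRAIN_CONFIG["fcnet_activation"],
    use_lstm=True,
    lstm_cell_size=256,
    max_seq_len=100,
    lstm_use_prev_action=True,
    lstm_use_prev_reward=False,
    vf_share_layers=True,
)
```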