Hi @dylan906,
It is hard for us to help without seeing the majority of the logic. If you can share it, we would have a better idea of what the issue might be.
Is there a reason you are not just using the FCNet as is rather than only using the _hidden_layers and then adding your own logits layer on top?
The way I would approach this is in two parts.
- Verify the environment is meeting RLlib's expectations.
  For this I would use one of the built-in models with the custom environment. This will obviously not have the action masking, but if it runs cleanly you will know your environment is working as expected.
- I would use the custom model with RLlib's RandomEnv. This environment lets you specify an observation space and an action space, and then generates random values that match the spaces you provided. Using it would let you determine whether there are issues with the custom model.
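
To make the second step more concrete, here is a rough, standard-library-only sketch of what a random environment does. To be clear, this is not RLlib's RandomEnv: the class `RandomMaskedEnv`, the helper `run_episode`, and the dict keys `observations` and `action_mask` are all hypothetical names I picked for illustration, and I am assuming a Gymnasium-style `reset`/`step` signature. Your own spaces and keys will likely differ.

```python
import random


class RandomMaskedEnv:
    """Hypothetical stand-in for a random environment (NOT RLlib's RandomEnv)."""

    def __init__(self, obs_dim=4, n_actions=3, episode_len=5, seed=0):
        self.obs_dim = obs_dim
        self.n_actions = n_actions
        self.episode_len = episode_len
        self.rng = random.Random(seed)
        self.t = 0

    def _sample_obs(self):
        # Random 0/1 mask; force one entry to 1 so a valid action always exists.
        mask = [self.rng.randint(0, 1) for _ in range(self.n_actions)]
        mask[self.rng.randrange(self.n_actions)] = 1
        return {
            # Key names are assumptions; match them to what your model expects.
            "observations": [self.rng.uniform(-1.0, 1.0) for _ in range(self.obs_dim)],
            "action_mask": mask,
        }

    def reset(self):
        self.t = 0
        return self._sample_obs(), {}

    def step(self, action):
        if not 0 <= action < self.n_actions:
            raise ValueError(f"action {action} is outside the action space")
        self.t += 1
        reward = self.rng.random()  # random reward, no real dynamics
        terminated = self.t >= self.episode_len
        return self._sample_obs(), reward, terminated, False, {}


def run_episode(env):
    """Roll out one episode, always picking the first unmasked action."""
    obs, _ = env.reset()
    steps = 0
    done = False
    while not done:
        action = obs["action_mask"].index(1)
        obs, _, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        steps += 1
    return steps
```

Because the observations carry no real dynamics, any shape or dtype error that shows up when your custom model consumes them points at the model rather than at your environment.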