Controlling sampling in an action masking environment

I am now using Ray 2.10 and torch 2.21, and I followed the guide to implement an action masking model as outlined in the action_masking_example.

In my first experiments, I get an unexpectedly high number of failed trials / dead workers. This is caused by exceptions raised by logic I encoded in the step() function. However, when I test the environment separately (environment verification), everything seems OK.
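One possible reason the separate test passes is that it never visits the trajectories RLlib's sampling reaches. A minimal fuzzing sketch: run many seeded random episodes that only pick actions allowed by the mask, and return the seed and action list of the first episode whose step() raises. It assumes a gymnasium-style API (reset returns (obs, info), step returns a 5-tuple), a Discrete action space, and a dict observation with an "action_mask" key as in the RLlib action masking example; `fuzz_env` and `make_env` are hypothetical names.

```python
import random


def fuzz_env(make_env, n_episodes=1000, max_steps=500, seed=0):
    """Run random episodes that only pick actions allowed by obs["action_mask"].

    Returns (episode_seed, actions) for the first episode whose step() raises,
    else None. Assumes a gymnasium-style API and a Discrete action space with a 0/1 mask.
    """
    rng = random.Random(seed)  # one master RNG so the whole fuzz run is reproducible
    for _ in range(n_episodes):
        env = make_env()
        ep_seed = rng.randrange(2**31)
        obs, info = env.reset(seed=ep_seed)
        actions = []
        for _ in range(max_steps):
            valid = [a for a, ok in enumerate(obs["action_mask"]) if ok]
            if not valid:
                break  # all actions masked out: stop this episode (such a state may itself be a bug)
            action = rng.choice(valid)
            actions.append(action)
            try:
                obs, reward, terminated, truncated, info = env.step(action)
            except Exception:
                return ep_seed, actions  # enough to replay the failing episode
            if terminated or truncated:
                break
    return None
```

If this returns a result, resetting a fresh env with `ep_seed` and stepping through `actions` should hit the same exception, provided the env's randomness is driven only by the reset seed.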

→ Hence, my question: How can I retrieve the env state, or the sequence of actions taken, up to the point where the env crashed?
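One generic approach, not an RLlib feature, is to wrap the env so it records the reset seed/options and every action, and writes them plus the traceback to a JSON file when step() raises. This is a minimal duck-typed sketch assuming a gymnasium-style API; `RecordingEnvWrapper` and its file layout are hypothetical. If RLlib insists on a real `gymnasium.Env`, the same logic can live in a `gymnasium.Wrapper` subclass.

```python
import json
import os
import tempfile
import traceback


def _to_jsonable(x):
    # numpy arrays/scalars expose tolist(); plain Python values pass through.
    return x.tolist() if hasattr(x, "tolist") else x


class RecordingEnvWrapper:
    """Logs reset kwargs and all actions; dumps them to JSON if step() raises."""

    def __init__(self, env, dump_dir=None):
        self.env = env
        self.dump_dir = dump_dir or tempfile.gettempdir()
        self.reset_kwargs = {}
        self.actions = []
        self.last_dump_path = None

    def __getattr__(self, name):
        # Forward e.g. observation_space / action_space to the wrapped env.
        if name == "env":
            raise AttributeError(name)  # avoid recursion before __init__ ran
        return getattr(self.env, name)

    def reset(self, *, seed=None, options=None):
        # Start a fresh episode log; the seed is needed to replay deterministically.
        self.reset_kwargs = {"seed": seed, "options": options}
        self.actions = []
        return self.env.reset(seed=seed, options=options)

    def step(self, action):
        self.actions.append(_to_jsonable(action))
        try:
            return self.env.step(action)
        except Exception:
            self._dump(traceback.format_exc())
            raise  # keep the original failure visible to Ray

    def _dump(self, tb):
        record = {"reset_kwargs": self.reset_kwargs, "actions": self.actions, "traceback": tb}
        fd, path = tempfile.mkstemp(prefix="env_crash_", suffix=".json", dir=self.dump_dir)
        with os.fdopen(fd, "w") as f:
            json.dump(record, f, indent=2, default=str)  # default=str for odd option values
        self.last_dump_path = path
```

With RLlib this could be registered as, e.g., `tune.register_env("masked_env", lambda cfg: RecordingEnvWrapper(MyMaskedEnv(cfg)))`, where `MyMaskedEnv` stands for your own env class. Note that the file is written on the node where the crashing worker ran.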

This would help me reproduce the error and find the loose end in the env logic.
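Once a seed and action list have been logged, reproduction can happen outside Ray by replaying them against a fresh env and stopping at the first exception. A minimal sketch, assuming a gymnasium-style API, that the env's randomness depends only on the reset seed, and a hypothetical JSON file with "reset_kwargs" and "actions" keys; `replay` and `replay_from_file` are illustrative names. Actions read back from JSON are plain ints/lists, so convert them if your step() expects numpy arrays.

```python
import json


def replay(env, reset_kwargs, actions):
    """Re-run a logged episode; return (step_index, exception) of the first failure, or None."""
    env.reset(**reset_kwargs)
    for i, action in enumerate(actions):
        try:
            _, _, terminated, truncated, _ = env.step(action)
        except Exception as exc:  # the crash we want to reproduce
            return i, exc
        if terminated or truncated:
            break
    return None


def replay_from_file(env, path):
    # Assumed file layout: {"reset_kwargs": {...}, "actions": [...]}
    with open(path) as f:
        record = json.load(f)
    return replay(env, record["reset_kwargs"], record["actions"])
```

Putting a breakpoint just before the returned step index then lets you inspect the env state right before the failing step() call.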