Initial action for Dict action space

Hi @Lars_Simon_Zehnder,

rollout.py is used to generate rollouts from a policy; it does not do any training, so I don't think this is where your problem lies, unless I misunderstand your question.

The first place I would look (and maybe you already have) is the reset function of your environment. This is where the first observation comes from. Is it somehow returning a different observation than step does?
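A quick way to check this outside of RLlib is to call reset() and step() on the environment directly and compare the structure of what they return. Below is a minimal sketch, assuming the older Gym API where reset() returns just the observation and step() returns (obs, reward, done, info); with the newer Gymnasium API you would unpack `obs, info = env.reset()` instead. The helper names and the two toy environments are hypothetical, made up for illustration.

```python
def obs_signature(obs):
    """Describe an observation's structure: dict keys, sequence lengths, leaf types."""
    if isinstance(obs, dict):
        return ("dict", tuple((k, obs_signature(obs[k])) for k in sorted(obs)))
    if isinstance(obs, (list, tuple)):
        return ("seq", len(obs), tuple(obs_signature(x) for x in obs))
    return ("leaf", type(obs).__name__)


def reset_matches_step(env, action):
    """True if reset() and the first step() return structurally identical observations.

    Assumes the older Gym API: reset() -> obs, step() -> (obs, reward, done, info).
    """
    first_obs = env.reset()
    next_obs = env.step(action)[0]
    return obs_signature(first_obs) == obs_signature(next_obs)


class ConsistentEnv:
    """Hypothetical toy environment whose reset() and step() agree."""

    def reset(self):
        return {"position": [0.0, 0.0], "speed": 0.0}

    def step(self, action):
        return {"position": [1.0, 0.5], "speed": 0.2}, 0.0, False, {}


class MismatchedEnv(ConsistentEnv):
    """Hypothetical toy environment whose reset() drops a key and uses ints."""

    def reset(self):
        return {"position": [0, 0]}


print(reset_matches_step(ConsistentEnv(), action=None))  # True
print(reset_matches_step(MismatchedEnv(), action=None))  # False
```

Comparing leaf types as well as keys also catches the case where reset() returns ints while step() returns floats, which can be easy to miss.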

If I were you, I would also be concerned about those NaNs.
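To find out where the NaNs first appear, you could walk each observation your environment returns and report any NaN entries. This is a minimal sketch assuming the observation is built from nested dicts, lists/tuples and plain Python floats; NumPy arrays are not walked here and would need `np.isnan` instead. The function name `find_nans` is hypothetical.

```python
import math


def find_nans(obs, path="obs"):
    """Return the path of every float leaf in a nested observation that is NaN."""
    found = []
    if isinstance(obs, dict):
        for key, value in obs.items():
            found.extend(find_nans(value, f"{path}[{key!r}]"))
    elif isinstance(obs, (list, tuple)):
        for i, value in enumerate(obs):
            found.extend(find_nans(value, f"{path}[{i}]"))
    elif isinstance(obs, float) and math.isnan(obs):
        found.append(path)
    return found


print(find_nans({"position": [1.0, float("nan")], "speed": 0.2}))
# ["obs['position'][1]"]
```

Running this on the output of both reset() and step() should tell you whether the NaNs come from the environment itself or only show up later in training.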

Are you handling the combination of Discrete and continuous actions in a special way? I do not remember seeing RLlib handle mixed action spaces, but in all honesty it could be supported and I just have not encountered it.

Manny
