Initial action for Dict action space

Hi folks,

I have the following action space in my gym environment:

import numpy as np
from gym.spaces import Box, Dict, Discrete

action_space = {
    "trade": Discrete(3),
    "stop": Box(low=0.0, high=np.inf, shape=(1,), dtype=np.float32),
}
self.action_space = Dict(action_space)

Executing my code gives me an error of the following form:

...
  File "/home/simon/git-projects/learning/.venv/lib/python3.9/site-packages/ray/rllib/utils/debug.py", line 38, in _summarize
    obj.shape, obj.dtype, _summarize(obj[0])))
IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed

I debugged my code and found that my episode data (more precisely: the actions therein) contains, at the first position obj[0], an array of a different size, namely array(0, dtype=object). Here is an example:

array([array(0, dtype=object), array([nan,  0.]), array([nan,  0.]),
       array([nan,  0.]), array([ 1.08566999, -1.        ]),
       array([nan,  0.]), array([nan,  0.]), array([nan,  0.]),
       array([nan,  0.]), array([nan,  0.]), array([nan,  0.]),
       array([nan,  0.]), array([nan,  0.]), array([nan,  0.]),
       array([nan,  0.]), array([nan,  0.]), array([nan,  0.]),
       array([nan,  0.]), array([nan,  0.]), array([nan,  0.]), ...

It appears that only the first action looks different and causes problems. I guess this is the initial action and that it is somehow defined in the RLlib source code (I think it is here). However, when I run the same code as linked in the previous sentence, I get:

[0.05556222 1.        ]

which appears fine to me.

So I wonder: where does this first action come from, and what do I have to change to make my code run again? Maybe @sven1977 or @mannyv know more :slight_smile:

Any help welcome and thanks for your time (and the fish)

Hi @Lars_Simon_Zehnder,

rollout.py is used to generate rollouts from a policy; it does not do any training, so I don’t think this is where your problem lies, unless I misunderstand your question.

The first place I would look, and maybe you already have, is the reset function of your environment. That is where the first observation comes from. Is it somehow returning a different observation than step does?

If I were you, I would also be concerned about those nan values.

Are you handling the combination of Discrete and continuous actions in a special way? I do not remember seeing RLlib handle mixed action spaces, but in all honesty the support could be there and I just have not encountered it.
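
For what it is worth, one thing you could check quickly is what a sample from your Dict space actually looks like: a single action is a dict of differently typed entries rather than a flat array, which is exactly the kind of structure the episode postprocessing would have to flatten. A minimal check, assuming the space from your first post:

import numpy as np
from gym.spaces import Box, Dict, Discrete

space = Dict({
    "trade": Discrete(3),
    "stop": Box(low=0.0, high=np.inf, shape=(1,), dtype=np.float32),
})

# A single sampled action is an OrderedDict, not a flat numpy array, e.g.
# OrderedDict([('stop', array([0.42], dtype=float32)), ('trade', 1)])
print(space.sample())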

Manny


Hi @mannyv ,

thanks for your help again. I also made my question above a little more precise. The problem in my episode data is the actions array, which holds the actions of a single episode. The first of these is different from the others, and exactly this first one causes problems in the postprocessing of an episode. My question now is: what produces this first action, and what do I have to change so that the first action is an array like the others?

The first place I would look, and maybe you already have, is the reset function of your environment. That is where the first observation comes from. Is it somehow returning a different observation than step does?

In the reset() method of my environment I do not generate any actions - is that something one should do? The reset() function of my environment simply returns the observation, whereas my step() function additionally returns reward, done, and info. I think that should work.
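
To make that explicit, my environment roughly follows this shape (a minimal sketch with a placeholder class name, observation space, and method bodies, not my actual code):

import numpy as np
import gym
from gym.spaces import Box, Dict, Discrete

class TradingEnv(gym.Env):
    def __init__(self):
        self.action_space = Dict({
            "trade": Discrete(3),
            "stop": Box(low=0.0, high=np.inf, shape=(1,), dtype=np.float32),
        })
        self.observation_space = Box(low=-np.inf, high=np.inf, shape=(4,), dtype=np.float32)

    def reset(self):
        # reset() only returns the first observation; no action is produced here
        return self.observation_space.sample()

    def step(self, action):
        # step() returns observation, reward, done, info
        obs = self.observation_space.sample()
        return obs, 0.0, False, {}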

If I were you, I would also be concerned about those nan values.

The nans are intentional. Earlier I used None values as an indicator that the agent does nothing (only for the stop variable of the action, since it is a float; I could also use 0.0), but this caused errors. Would you rather suggest using 0.0?
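
If 0.0 turns out to be the safer sentinel, the no-op case could be handled when decoding the action, roughly like this (a sketch with my own naming; HOLD is just an assumed value of trade that means "do nothing"):

HOLD = 0  # assumed "do nothing" value of the trade entry

def decode_action(action):
    # action is a sample from the Dict space: {"trade": int, "stop": array of shape (1,)}
    trade = int(action["trade"])
    stop = float(action["stop"][0])
    if trade == HOLD:
        # ignore stop for a no-op instead of carrying nan around
        stop = 0.0
    return trade, stop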

Are you handling the combination of Discrete and continuous actions in a special way? I do not remember seeing RLlib handle mixed action spaces, but in all honesty the support could be there and I just have not encountered it.

Good question, @mannyv! I actually came up with this because I learned to keep types explicit, and because it makes the code more readable to have names and specific types. trade is an indicator, therefore I used discrete values. Of course, I could also represent trade as a float and simply use a Box space with shape=(2,). Maybe this mixing of types is the reason for the phenomenon.

Simon

Hi @mannyv,

I have now tested two further action space versions - and your intuition was pretty much right :slight_smile: The problem is rooted in the action space type.

I first used a Dict action space:

action_space = {
    'trade': Box(low=-1, high=1, shape=(1,), dtype=np.int8),
    'stop': Box(low=-np.inf, high=np.inf, shape=(1,), dtype=np.float64)
}

action_space = Dict(action_space)

This gave the same error as in my initial question. Then I tried a simple Box action space:

action_space = Box(low=-np.inf, high=np.inf, shape=(2,), dtype=np.float64)

and this worked. Training now runs through. Are Dict action spaces perhaps not yet fully supported? Support for them would be a nice feature, as it allows referring to specific action elements by name and makes the code more readable.
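
As a workaround for the readability I lose with the flat Box, I can map the action back to named fields inside step() (a sketch; the index order is my own convention, nothing RLlib prescribes):

import numpy as np

TRADE, STOP = 0, 1  # assumed ordering of the flattened action vector

def decode_flat_action(action):
    # round/clip the continuous trade value back to an indicator in {-1, 0, 1}
    trade = int(np.clip(round(float(action[TRADE])), -1, 1))
    stop = float(action[STOP])
    return trade, stop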

I will now try whether a Tuple action space works.

@Lars_Simon_Zehnder,

This issue might have a similar root cause. You might want to track it and see if it fixes your issue when it is resolved…

Hi @mannyv ,

indeed interesting and up-to-date. Thank you!

I tested a Tuple action space and it also works fine with my setup. So it is Dict action spaces that run into problems during postprocessing.
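
For completeness, the Tuple variant I tested looks roughly like this (a sketch of my setup; index 0 is the trade indicator, index 1 the stop value):

import numpy as np
from gym.spaces import Box, Discrete, Tuple

action_space = Tuple((
    Discrete(3),
    Box(low=0.0, high=np.inf, shape=(1,), dtype=np.float32),
))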

Simon