ARS produces actions outside of `action_space` bounds

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I am working with FinRL-Meta (link in reproduction project). I wanted to try using RLlib’s ARS implementation with the same codebase, but the ARS model (using compute_single_action()) is producing actions outside my environment’s defined action_space.

I have a reproduction project here, in a notebook. It’s based on my latest pull of FinRL-Meta, without any future changes.

In this case, the environment defines action_space:

self.action_space = spaces.Box(low=0,
                               high=3,
                               shape=(len(self.assets),)) # len(self.assets) always equals 1 currently

Yet this code usually produces actions below 0, in the 0 to -1 range.

Other code built on this (modified environment, etc.), where I have a -3 to 3 Box action space, sometimes sees actions as far out of bounds as ±60.

Am I misunderstanding how action_space works? Shouldn’t the model be normalizing/squashing/clipping its actions into that space before returning them to me? I would like to use the action as a confidence indicator, but when most of the results are entirely outside the space, I’m not sure how. It doesn’t matter whether an action is between 0-1, 1-2, or 2-3 when many of them are 10 or 20 or 30.

What I have tried

Changing space type

I have tried switching to a Discrete action space, and switching the Box from float32 to int32. In both of those cases, the OOB issue goes away (computed actions are integers within range, as expected), but I lose all reproducibility in my tests. The tests no longer produce the same results on the same input data, despite explore=False, which makes it difficult to compare trained models.

Upgrading Ray/gym

I tried upgrading Ray to 2.0.0 (and gym to 0.24) but this had no effect on the OOB issue.

unsquash_actions=True

I found this post and tried it on my compute_single_action() call, with no success.

Edited to add
Versions:
ray = 1.12.0
gym = 0.21.0
python = 3.7

I also tried with PPO, which did respect the action_space high/low and produced repeatable results, but didn’t perform as well as ARS.

Is there a setting I haven’t found, which would force ARS to respect the action space? Or should I be updating rewards to punish the model for violating it, or something like that?

I can’t just reject the actions - when it produces them, they’re a high proportion of the actions in total. And clipping manually removes my chance to make use of confidence bands, since the majority of actions will just be pegged to high/low.
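
To make the trade-off above concrete, here is a minimal, dependency-free sketch comparing hard clipping with a smooth squash on a handful of made-up raw outputs. The tanh squash, the `scale` parameter, and the function names are assumptions for illustration only; nothing in this thread says ARS or RLlib applies them.

```python
import math


def clip(raw, low, high):
    """Hard-clip each raw action into [low, high]."""
    return [min(max(a, low), high) for a in raw]


def tanh_squash(raw, low, high, scale=1.0):
    """Smoothly map any real-valued raw action into (low, high).

    `scale` is a hypothetical knob: larger values spread out big raw
    actions instead of saturating them near the bounds.
    """
    mid = (low + high) / 2.0
    half = (high - low) / 2.0
    return [mid + half * math.tanh(a / scale) for a in raw]


# Made-up raw outputs, similar in spirit to the out-of-bounds values above.
raw_actions = [-0.5, 1.2, 10.0, 20.0, 30.0]

clipped = clip(raw_actions, 0.0, 3.0)
squashed = tanh_squash(raw_actions, 0.0, 3.0, scale=20.0)

# Count how many actions clipping pegs exactly to a bound.
pegged = sum(1 for a in clipped if a in (0.0, 3.0))

print(clipped)
print(squashed)
print(pegged)
```

With clipping, every large raw value collapses onto the same bound, so their relative size is lost. A monotonic squash keeps the ordering, but the environment then receives different values from the ones the policy was trained against, so results may change unless the same transform is also used during training.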

Hey @imnotpete ,

For ARS, the default config value for clip_actions is False.
For PPO, it is True.

You can pass clip_actions to compute_actions (have a look at the docstring!) to override this behaviour if you want it to be True.

Cheers

@arturn

Ahhh I see that now, that you can pass those to Trainer.compute_actions(). I missed that as I’ve been using ARSTrainer.compute_single_action(), which itself calls the policy’s compute_actions(), not the parent Trainer.

The state format and so on appears to be different between the two calls, but manually applying space_utils.unsquash_action() and space_utils.clip_action() to the computed action does seem to do what is expected: the actions are no longer out of bounds. The question now is how to reproduce that behavior in training (and whether to modify my state so I can call the other compute method, which includes that functionality).
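
For anyone reading later, a rough, dependency-free sketch of what those two helpers appear to do for a bounded Box space: clipping clamps each value to [low, high], and unsquashing linearly maps a value from [-1, 1] onto [low, high] and then clamps. The function names below are hypothetical stand-ins, and the real helpers take the action space itself rather than scalar bounds, so check the space_utils source for your Ray version before relying on this.

```python
def clip_to_bounds(action, low, high):
    """Clamp each component of the action into [low, high]."""
    return [min(max(a, low), high) for a in action]


def unsquash_to_bounds(action, low, high):
    """Map each component from [-1, 1] onto [low, high], then clamp.

    Assumes the policy output is meant to live in [-1, 1]; values
    outside that range end up pegged at a bound by the final clamp.
    """
    scaled = [low + (a + 1.0) * (high - low) / 2.0 for a in action]
    return clip_to_bounds(scaled, low, high)


# Bounds from the environment in this thread: Box(low=0, high=3, shape=(1,)).
LOW, HIGH = 0.0, 3.0

clipped_action = clip_to_bounds([-0.7], LOW, HIGH)       # out of bounds below
unsquashed_mid = unsquash_to_bounds([0.0], LOW, HIGH)    # centre of [-1, 1]
unsquashed_far = unsquash_to_bounds([60.0], LOW, HIGH)   # far outside [-1, 1]

print(clipped_action, unsquashed_mid, unsquashed_far)
```

Note the difference in interpretation: clipping treats the raw output as already being in environment units, while unsquashing treats it as a normalized value. Which one is appropriate depends on how the policy was trained.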

Thanks for pointing me in the right direction!
