ARS produces actions outside of `action_space` bounds

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

I am working with FinRL-Meta (link in reproduction project). I wanted to try using RLlib’s ARS implementation with the same codebase, but the ARS model (using compute_single_action()) is producing actions outside my environment’s defined action_space.

I have a reproduction project here, in a notebook. It’s based on my latest pull of FinRL-Meta, without any future changes.

In this case, the environment defines action_space:

self.action_space = spaces.Box(low=0,
                               shape=(len(self.assets),)) # len(self.assets) always equals 1 currently

Yet this code is (usually) producing actions below 0, in the 0 to -1 range.

Other code built on this (modified environment, etc.), where I have a -3 to 3 Box action space, sometimes sees actions as far out of bounds as ±60.

Am I misunderstanding how action_space works? Shouldn’t the model be normalizing/squashing/clipping its actions into that space before returning them to me? I would like to use the action as a confidence indicator, but when most of the results are entirely outside the space, I’m not sure how: it doesn’t matter whether the action is between 0-1, 1-2, or 2-3 when many of them are 10 or 20 or 30.
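For reference, here is a minimal pure-Python stand-in for the containment check that gym’s `Box.contains()` performs on a 1-D space. The 0-to-1 bounds are illustrative assumptions, not taken from the environment above:

```python
# Sketch of an elementwise bounds check, mirroring what gym's
# Box.contains() does for a 1-D Box. The 0..1 bounds are assumed
# for illustration only.
def in_bounds(action, low=0.0, high=1.0):
    return all(low <= a <= high for a in action)

in_bounds([0.4])    # a valid action
in_bounds([-0.7])   # the out-of-bounds case described above
```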

What I have tried

Changing space type

I have tried switching to a Discrete action space, and switching the Box from float32 to int32. In both cases, the OOB issue goes away (computed actions are integers within range, as expected), but I lose all reproducibility in my tests: they no longer produce the same results on the same input data, despite explore=False, which makes it difficult to compare trained models.

Upgrading Ray/gym

I tried upgrading Ray to 2.0.0 (and gym to 0.24), but this had no effect on the OOB issue.


I found this post and tried it on my compute_single_action() call, with no success.

Edited to add
ray = 1.12.0
gym = 0.21.0
python = 3.7

I also tried with PPO, which did respect the action_space high/low and produced repeatable results, but didn’t perform as well as ARS.

Is there a setting I haven’t found, which would force ARS to respect the action space? Or should I be updating rewards to punish the model for violating it, or something like that?

I can’t just reject the actions - when it produces them, they’re a high proportion of the actions in total. And clipping manually removes my chance to make use of confidence bands, since the majority of actions will just be pegged to high/low.

Hey @imnotpete ,

For ARS, the default config value for `clip_actions` is False.
For PPO, it is True.

You can pass clip_actions to compute_actions() (have a look at the docstring!) to override this behaviour if you want it to be true.



Ahhh, I see that now: you can pass those to Trainer.compute_actions(). I missed it because I’ve been using ARSTrainer.compute_single_action(), which calls the policy’s compute_actions(), not the parent Trainer’s.

The state format and so on appears to differ between the different calls, but reproducing space_utils.unsquash_action() and space_utils.clip_action() myself does what’s expected: no more out-of-bounds actions. The question now is how to reproduce that behavior in training (and whether to modify my state to call the other compute method that includes that functionality).
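For anyone landing here, a rough pure-Python sketch of what those two helpers do on a flat Box space (the real `space_utils` functions also handle nested Dict/Tuple spaces; the ±3 bounds are illustrative):

```python
# Hedged sketch of RLlib's space_utils behavior on a flat Box space.
# The actual helpers also handle nested/structured spaces.

def clip_action_box(action, low, high):
    # Clamp each component into [low, high].
    return [max(l, min(h, a)) for a, l, h in zip(action, low, high)]

def unsquash_action_box(action, low, high):
    # Map a raw action in [-1, 1] linearly onto [low, high].
    return [l + (a + 1.0) * (h - l) / 2.0 for a, l, h in zip(action, low, high)]

clip_action_box([60.0], [-3.0], [3.0])     # an ARS action of 60 clips to 3.0
unsquash_action_box([0.0], [-3.0], [3.0])  # midpoint of the space: 0.0
```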

Thanks for pointing me in the right direction!



A followup question: What is the right way to handle it at training time? I’m not manually calling any of the compute*() methods directly, just using the RLlib train() infrastructure.

Is there a way to make the training process respect the action space? or should I be throwing out any out-of-bounds actions during training, or manually clipping?

While training, RLlib will default to exploration. When action clipping is activated, actions are also clipped to the action space! 🙂

I set 'clip_actions': True in my params for ARSTrainer.train(), and it still produced actions outside the space.

Maybe this is the wrong question.

What is the right way to handle the out-of-bounds actions in a training environment? In my case, I’m dividing the action space into bands: I have a -3 to 3 action space, with -3 to -2 being one action, -2 to -1 another, and so on (buy/sell at the extremes and a couple of types of hold/close in between). I have been manually clipping out-of-bounds actions myself, which makes all OOB actions effectively live in those two outer bands. Given that upwards of 80-90% of generated actions are out-of-bounds (some as high as 60 or more, on my ±3 action space), this puts the great majority of generated actions in those bands.
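To make the skew concrete, here is a small sketch of the banding scheme described above. The band edges and the clip-first behavior are my reading of the post, not code from it:

```python
# Sketch of a ±3 action space split into six unit-wide bands.
# Clipping first means every OOB action lands in an outermost band,
# which is the skew described above.
def band(action, low=-3.0, high=3.0, n_bands=6):
    a = max(low, min(high, action))   # manual clip of OOB actions
    width = (high - low) / n_bands    # 1.0 here
    idx = int((a - low) / width)
    return min(idx, n_bands - 1)      # the high edge maps into the top band

band(-2.5)   # lands in the lowest band
band(60.0)   # OOB action pegged to the top band
```

With 80-90% of raw actions out of bounds, most calls to `band()` return 0 or 5, which is why the inner bands carry little signal.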

Should I be rejecting OOB actions somehow? Or penalizing them in the rewards?

This might be a bug. Can you confirm this is related to [RLlib] ARS not respecting gym Box bounds in training or testing · Issue #29259 · ray-project/ray · GitHub?

Assuming you’re talking about this GitHub issue: [RLlib] ARS not respecting gym Box bounds in training or testing · Issue #29259 · ray-project/ray · GitHub

Yes, they are related.

Yes, indeed. Got the wrong link. Thanks!
