How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I am working with FinRL-Meta (link in reproduction project). I wanted to try using RLlib’s ARS implementation with the same codebase, but the ARS model (using compute_single_action()) is producing actions outside my environment’s defined action_space.
I have a reproduction project here, in a notebook. It’s based on my latest pull of FinRL-Meta, without any future changes.
In this case, the environment defines action_space:
self.action_space = spaces.Box(low=0,
                               high=3,
                               shape=(len(self.assets),))  # len(self.assets) always equals 1 currently
Yet this code usually produces actions below 0, in the -1 to 0 range.
Other code built on this (modified environment, etc.), where I have a -3 to 3 Box action space, sometimes sees actions as far out of bounds as ±60.
Am I misunderstanding how action_space works? Shouldn’t the model be normalizing/squashing/clipping its actions into that space before returning them to me? I would like to use the action as a confidence indicator, but when most of the results are entirely outside the space, I’m not sure how: it doesn’t matter whether the action is between 0-1, 1-2, or 2-3 when many of them are 10 or 20 or 30.
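To make concrete what I mean by clipping, here is a minimal sketch of the behaviour I expected for my 0 to 3 Box. The helper name clip_to_box and the sample values are made up for illustration; this is plain Python, not RLlib's code.

```python
def clip_to_box(action, low, high):
    """Clamp each component of an action into the closed range [low, high]."""
    return [min(max(a, low), high) for a in action]


# Hypothetical raw model outputs, one value per asset (shape (1,) in my case).
raw_actions = [[-0.7], [1.4], [22.0]]

# Bounds taken from the Box(low=0, high=3) defined in my environment.
clipped = [clip_to_box(a, low=0.0, high=3.0) for a in raw_actions]
print(clipped)  # [[0.0], [1.4], [3.0]]
```

Doing this on my side would keep the environment from seeing invalid actions, but every out-of-bounds value collapses to 0 or 3, so the action stops being useful as a confidence indicator.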
What I have tried
Changing space type
I have tried switching to a Discrete action space, and switching the Box from float32 to int32. In both of those cases, the OOB issue goes away (computed actions are an integer within range, as expected), but I lose all reproducibility in my tests. The tests no longer produce the same results on the same input data, despite explore=False, which makes it difficult to compare trained models.
Upgrading Ray/gym
I tried upgrading Ray to 2.0.0 (and gym to 0.24) but this had no effect on the OOB issue.
unsquash_actions=True
I found this post and tried it on my compute_single_action() call, with no success.
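For reference, my understanding of "unsquashing" is a linear map from a normalized [-1, 1] output onto the Box bounds. The sketch below reflects that assumption only; the function name unsquash is hypothetical and this is not copied from RLlib. It shows why such a map alone would not help if the raw output is not already within [-1, 1].

```python
def unsquash(value, low, high):
    """Linearly map a value assumed to lie in [-1, 1] onto [low, high]."""
    return low + (value + 1.0) * (high - low) / 2.0


# In-range inputs land inside the Box(low=0, high=3) bounds.
print(unsquash(-1.0, 0.0, 3.0))  # 0.0
print(unsquash(0.0, 0.0, 3.0))   # 1.5
print(unsquash(1.0, 0.0, 3.0))   # 3.0

# An input outside [-1, 1] still ends up outside the Box.
print(unsquash(-1.5, 0.0, 3.0))  # -0.75
```

So if the model's raw outputs are unbounded, I would expect to need clipping as well as (or instead of) unsquashing.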
Edited to add
Versions:
ray = 1.12.0
gym = 0.21.0
python = 3.7