How severe does this issue affect your experience of using Ray?
- High: It blocks me to complete my task.
I am working with FinRL-Meta (link in reproduction project). I wanted to try RLlib’s ARS implementation with the same codebase, but the ARS model (using compute_single_action()) is producing actions outside my environment’s defined action_space.
I have a reproduction project here, in a notebook. It’s based on my latest pull of FinRL-Meta, without any further changes.
In this case, the environment defines
self.action_space = spaces.Box(low=0, high=3, shape=(len(self.assets),)) # len(self.assets) always equals 1 currently
Yet this code usually produces actions below 0, in the 0 to -1 range.
Other code built on this (modified environment, etc.), where I have a -3 to 3 Box action space, sometimes sees actions as far out of bounds as ±60.
Am I misunderstanding how action_space works? Shouldn’t the model be normalizing/squashing/clipping its actions into that space before returning them to me? I would like to use the action as a confidence indicator, but when most of the results are entirely outside the space, I’m not sure how. It doesn’t matter whether an in-range action falls between 0-1, 1-2, or 2-3 when many of the actions are 10 or 20 or 30.
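To make concrete what I mean by "clipping into that space", here is a minimal, dependency-free sketch of the behaviour I expected. The helper clip_to_box is hypothetical (my own name, not an RLlib function), and the raw action values are made-up examples in the ranges described above, not real model output:

```python
def clip_to_box(action, low, high):
    """Clamp every component of `action` into the closed interval [low, high].

    Hypothetical helper for illustration only; not part of RLlib or gym.
    Works on any iterable of numbers (a list or a 1-D numpy array).
    """
    return [min(max(float(a), low), high) for a in action]


# Made-up raw actions resembling what I see: below 0, in range, far above 3.
raw_actions = [[-0.7], [1.5], [60.0]]

# Bounds taken from the Box(low=0, high=3, ...) action space defined above.
clipped = [clip_to_box(a, low=0.0, high=3.0) for a in raw_actions]
print(clipped)  # [[0.0], [1.5], [3.0]]
```

Clipping like this keeps the actions valid for the environment, but every out-of-range value collapses onto a bound, so it would not by itself give me a usable confidence signal.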
I have tried switching to a Discrete action space, and switching the Box to int32. In both of those cases, the OOB issue goes away (computed actions are integers within range, as expected), but I lose all reproducibility in my tests. The tests no longer produce the same results on the same input data, despite explore=False, which makes it difficult to compare trained models.
I tried upgrading Ray to 2.0.0 (and gym to 0.24) but this had no effect on the OOB issue.
I found this post and tried it on my compute_single_action() call, with no success.
Edited to add
ray = 1.12.0
gym = 0.21.0
python = 3.7