How severely does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty in completing my task, but I can work around it.
Hi all. I have been working on RL problems with sb3. Given the interesting opportunities Ray has to offer, I want to try it on a PPO problem. But I find it very complex:
My env has Dict observations and Discrete actions. I standardize each key of the dict with sb3's VecNormalize wrapper. For the policy, I have a custom feature extractor for each key of the dict; I then concatenate the outputs and feed them to separate value and policy networks. (Can't wait to take advantage of attention and the other resources built into Ray!)
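To make the normalization part concrete, here is a rough, library-free sketch of what I mean by standardizing each key of the dict. It is not sb3's actual implementation and does not use any Ray API; the names `RunningStat` and `DictNormalizer`, the `eps` and `clip` defaults, and the assumption that each key holds a flat list of floats are all my own, for illustration only.

```python
import math


class RunningStat:
    """Running mean/variance of one scalar feature (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def var(self):
        # Population variance; falls back to 1.0 until two samples are seen.
        return self.m2 / self.count if self.count > 1 else 1.0


class DictNormalizer:
    """Hypothetical helper: one RunningStat per feature, grouped by dict key."""

    def __init__(self, eps=1e-8, clip=10.0):
        self.eps = eps    # keeps the division stable when variance is zero
        self.clip = clip  # normalized values are clipped to [-clip, clip]
        self.stats = {}

    def update(self, obs):
        # obs maps each key to a flat list of floats (an assumption of this sketch).
        for key, values in obs.items():
            stats = self.stats.setdefault(key, [RunningStat() for _ in values])
            for stat, v in zip(stats, values):
                stat.update(float(v))

    def normalize(self, obs):
        out = {}
        for key, values in obs.items():
            out[key] = [
                max(-self.clip,
                    min(self.clip, (float(v) - s.mean) / math.sqrt(s.var + self.eps)))
                for s, v in zip(self.stats[key], values)
            ]
        return out


# Usage: update the statistics while collecting data, then normalize.
normalizer = DictNormalizer()
for sample in [{"pos": [0.0, 10.0], "vel": [1.0]},
               {"pos": [2.0, 10.0], "vel": [3.0]}]:
    normalizer.update(sample)
normalized = normalizer.normalize({"pos": [2.0, 10.0], "vel": [1.0]})
```

In my sb3 setup this bookkeeping lives inside the wrapper, so what I am looking for is where the equivalent step would go on the Ray side.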
How can I switch to Ray? The way I work with sb3 is very different from what the Ray examples and docs show.