SVL has recently launched a new challenge for embodied, multi-task learning in home environments called BEHAVIOR. As part of this, we are recommending that users start with Ray or stable-baselines3 to get up and running quickly and to support scalable, multi-environment training.
We shipped a Ray example, but I’ve had trouble replicating the PPO performance that stable-baselines3 reaches on a point navigation task in our environment. I went through and tried to match all settings and the model architecture from stable-baselines3, but the Ray results still don’t match. I’m hoping I’m doing something obviously wrong.
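To make "matching all settings" concrete, here is a rough sketch of how the stable-baselines3 PPO defaults could be translated into RLlib's dict-style PPO config. It is illustrative only and is not the config from the example repo. The helper name `sb3_to_rllib_ppo` is hypothetical, the RLlib key names are the ones I know from the RLlib 1.x dict config, and the choices for `kl_coeff` and `vf_clip_param` are my assumption about how to mimic SB3, so please check them against the installed Ray version.

```python
# Defaults of the stable-baselines3 PPO constructor.
SB3_PPO_DEFAULTS = {
    "learning_rate": 3e-4,
    "n_steps": 2048,       # rollout steps per environment per update
    "batch_size": 64,      # minibatch size
    "n_epochs": 10,        # optimization epochs per update
    "gamma": 0.99,
    "gae_lambda": 0.95,
    "clip_range": 0.2,
    "ent_coef": 0.0,
    "vf_coef": 0.5,
    "max_grad_norm": 0.5,
}


def sb3_to_rllib_ppo(sb3, n_envs=1):
    """Translate SB3 PPO hyperparameters into RLlib 1.x-style PPO config keys.

    Hypothetical helper: only covers optimizer/loss settings, not the model.
    """
    return {
        "lr": sb3["learning_rate"],
        # SB3 collects n_steps from each of n_envs environments per update.
        "train_batch_size": sb3["n_steps"] * n_envs,
        "sgd_minibatch_size": sb3["batch_size"],
        "num_sgd_iter": sb3["n_epochs"],
        "gamma": sb3["gamma"],
        "lambda": sb3["gae_lambda"],
        "clip_param": sb3["clip_range"],
        "entropy_coeff": sb3["ent_coef"],
        "vf_loss_coeff": sb3["vf_coef"],
        "grad_clip": sb3["max_grad_norm"],
        # Assumption: SB3 has no KL penalty term, so switch RLlib's off.
        "kl_coeff": 0.0,
        # Assumption: SB3 does not clip the value function by default,
        # so make RLlib's value clipping effectively inactive.
        "vf_clip_param": float("inf"),
    }


if __name__ == "__main__":
    for key, value in sb3_to_rllib_ppo(SB3_PPO_DEFAULTS, n_envs=8).items():
        print(f"{key}: {value}")
```

Several RLlib defaults differ from these values (for example `lambda`, `clip_param`, `num_sgd_iter`, and `lr`), so any key left unset is a candidate for the gap. Differences outside these keys, such as advantage normalization or weight initialization, may also matter and are not covered by this sketch.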
Here is the example repo: I’ve dockerized everything to make the results as reproducible as possible:
The one snag is that we have to distribute the models encrypted and under a license agreement. The instructions are in the README in that repo, and it shouldn’t take more than a couple of minutes for you to get approved. Please let me know if you have any questions, or if anything doesn’t work with the example. Note that for Ray, you may have to lower or raise the CPU allocation for your training workers.
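As a starting point for that CPU adjustment, here is a minimal sketch that derives worker settings from the number of cores on the machine. The helper `worker_resources` is hypothetical and not part of the example repo; the returned keys (`num_workers`, `num_cpus_per_worker`, `num_cpus_for_driver`) are the RLlib 1.x dict-config names as I know them, so treat them as an assumption for other versions.

```python
import os


def worker_resources(total_cpus=None, cpus_per_worker=1, cpus_for_driver=1):
    """Pick a rollout worker count that fits within the available CPUs.

    Hypothetical helper: reserves CPUs for the driver, then gives each
    worker `cpus_per_worker` cores out of what remains.
    """
    total = total_cpus if total_cpus is not None else (os.cpu_count() or 1)
    # Integer division so workers never request more cores than exist.
    num_workers = max(0, (total - cpus_for_driver) // cpus_per_worker)
    return {
        "num_workers": num_workers,
        "num_cpus_per_worker": cpus_per_worker,
        "num_cpus_for_driver": cpus_for_driver,
    }


if __name__ == "__main__":
    # Merge the result into the PPO config before starting training.
    print(worker_resources())
```

With zero workers, RLlib samples in the driver process, which is slow but can be useful for checking that the environment loads at all.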