Description
I’m testing the offline algorithms that RLlib provides, especially those using D4RL datasets. However, I’ve found that nearly all of the tuned_examples provided for the CQL algorithm perform really poorly. For example:
- Hopper-bc.yaml
Halfcheetah-bc and Hopper-cql show similar results. Both the reward and loss curves seem to oscillate, which I think means the algorithm isn’t really learning anything. These examples don’t come with reference results, so I wonder whether this is normal?
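For reference, below is roughly how I’m launching these runs: a minimal Python sketch approximately equivalent to one of the YAML files, assuming Ray 2.2’s `CQLConfig`/`Tuner` APIs. The D4RL dataset name, evaluation env, and stop criterion are placeholders, not the exact values from the tuned_examples.

```python
# Minimal sketch (not the exact tuned_examples YAML): CQL on a D4RL dataset
# with Ray 2.2's config API. Dataset name, eval env, and stop criterion are
# illustrative placeholders.
import ray
from ray import air, tune
from ray.rllib.algorithms.cql import CQLConfig

ray.init()

config = (
    CQLConfig()
    .environment(env="Hopper-v2")                   # env used for evaluation rollouts
    .framework("torch")
    .offline_data(input_="d4rl.hopper-medium-v2")   # RLlib's built-in D4RL reader
    .evaluation(
        evaluation_interval=1,
        evaluation_num_workers=1,
        evaluation_duration=10,
        evaluation_config={"input": "sampler"},     # evaluate with live rollouts
    )
)

tuner = tune.Tuner(
    "CQL",
    param_space=config.to_dict(),
    run_config=air.RunConfig(stop={"training_iteration": 1000}),
)
tuner.fit()
```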
Environment
- Ray 2.2.0 (since this is the last version that supports Gym, which D4RL relies on)
- Python 3.10.13
- PyTorch 2.2.1