Poor performance of tuned examples for offline algorithms

Description

I’m testing the offline algorithms that RLlib provides, especially those that use D4RL datasets. However, I’ve found that nearly all of the tuned_examples provided for the CQL algorithm perform really poorly. For example:

  1. Hopper-bc.yaml

Halfcheetah-bc and Hopper-cql show similar results. Both the reward and loss curves seem to be oscillating, which I take to mean the algorithm isn’t really learning anything. These examples don’t come with reference results, so I wonder whether this is normal.
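For context, the sketch below shows roughly how such a run can be set up in Python with the Ray 2.2.0 config API instead of the yaml files. It is only a minimal illustration: the dataset name `hopper-medium-v0`, the `d4rl.` input prefix, and the evaluation/training-loop settings are assumptions and are not copied from the tuned_examples.

```python
# Minimal sketch of running CQL on a D4RL dataset with Ray 2.2.0 (assumptions noted inline).
import gym
import d4rl  # noqa: F401 -- importing d4rl registers its envs with gym
from ray import tune
from ray.rllib.algorithms.cql import CQLConfig

# Register the evaluation env so RLlib can build it by name.
tune.register_env("hopper-medium-v0", lambda cfg: gym.make("hopper-medium-v0"))

config = (
    CQLConfig()
    .framework("torch")
    .environment(env="hopper-medium-v0")
    # Assumption: the "d4rl.<dataset>" input string makes RLlib's offline
    # reader load the dataset through d4rl, as the tuned yamls appear to do.
    .offline_data(input_="d4rl.hopper-medium-v0")
    .evaluation(
        evaluation_interval=1,
        evaluation_duration=10,
        # Evaluate with online rollouts in the real env to get a reward curve.
        evaluation_config={"input": "sampler"},
    )
)

algo = config.build()
for i in range(200):  # number of iterations is arbitrary here
    result = algo.train()
    print(i, result.get("evaluation", {}).get("episode_reward_mean"))
```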

Environment

  • Ray 2.2.0 (since this is the last version that supports Gym, which D4RL relies on)
  • Python 3.10.13
  • PyTorch 2.2.1