Poor performance of tuned examples for offline algorithms

Description

I’m testing the offline algorithms that RLlib provides, especially those that use D4RL datasets. However, I’ve found that nearly all of the tuned_examples provided for the CQL algorithm perform really poorly. For example:

  1. Hopper-bc.yaml

Halfcheetah-bc and Hopper-cql show similar results. Both the reward and loss curves seem to be oscillating, which I take to mean the algorithm isn’t really learning anything. These examples don’t come with reference results, so I wonder whether this is normal.
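For context, the sketch below shows roughly how such a run can be set up in Python with the Ray 2.2.0 config API instead of the yaml files. It is only a minimal illustration: the dataset name `hopper-medium-v0`, the `d4rl.` input prefix, and the evaluation/training-loop settings are assumptions and are not copied from the tuned_examples.

```python
# Minimal sketch of running CQL on a D4RL dataset with Ray 2.2.0 (assumptions noted inline).
import gym
import d4rl  # noqa: F401 -- importing d4rl registers its envs with gym
from ray import tune
from ray.rllib.algorithms.cql import CQLConfig

# Register the evaluation env so RLlib can build it by name.
tune.register_env("hopper-medium-v0", lambda cfg: gym.make("hopper-medium-v0"))

config = (
    CQLConfig()
    .framework("torch")
    .environment(env="hopper-medium-v0")
    # Assumption: the "d4rl.<dataset>" input string makes RLlib's offline
    # reader load the dataset through d4rl, as the tuned yamls appear to do.
    .offline_data(input_="d4rl.hopper-medium-v0")
    .evaluation(
        evaluation_interval=1,
        evaluation_duration=10,
        # Evaluate with online rollouts in the real env to get a reward curve.
        evaluation_config={"input": "sampler"},
    )
)

algo = config.build()
for i in range(200):  # number of iterations is arbitrary here
    result = algo.train()
    print(i, result.get("evaluation", {}).get("episode_reward_mean"))
```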

Environment

  • Ray 2.2.0 (since this is the last version that supports Gym, which D4RL relies on)
  • Python 3.10.13
  • PyTorch 2.2.1