NaN in CQL training from provided example

Hello,

I am testing CQL on RLlib with a provided example. However, when I run

rllib train -f tuned_examples/cql/halfcheetah-cql.yaml

the code runs but reports only NaN values, even after hours of training. The config in the YAML file looks reasonable to me, so I am not sure what is causing this.

Thanks.

Hey @gjoliver @sven1977 @avnishn any thoughts here?
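
To help narrow this down, it may be useful to list exactly which reported values are NaN. Below is a minimal sketch using only the Python standard library. It assumes the training results are available as a nested dict. The helper name `find_nan_metrics` and the keys in `example` are hypothetical and only illustrate the idea; they are not taken from the actual run.

```python
import math


def find_nan_metrics(result, prefix=""):
    """Return the dotted paths of all float NaN values in a nested dict."""
    bad = []
    for key, value in result.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested sections of the result.
            bad.extend(find_nan_metrics(value, path))
        elif isinstance(value, float) and math.isnan(value):
            bad.append(path)
    return bad


# Hypothetical result dict, for illustration only.
example = {
    "episode_reward_mean": float("nan"),
    "timesteps_total": 1000,
    "info": {"learner": {"critic_loss": float("nan"), "actor_loss": 0.5}},
}

print(find_nan_metrics(example))
```

If only reward-related entries show up while the losses stay finite, the cause is likely different from the case where the losses themselves are NaN, so posting this list alongside the config could help.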