CQL target Q evaluation: missing entropy term?

Hi there,

I am going through the SAC and CQL implementations in RLlib and noticed a difference in how the target Q-value is computed. In SAC, the entropy term is subtracted from the target Q-value, which I think is correct. In CQL, however, the Bellman update is based on SAC, yet there is no entropy subtraction. Is there a particular reason for this?
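To make the question concrete, here is a minimal sketch of the two targets as I understand them. This is not RLlib's actual code: the function and argument names are made up for illustration, `q_next` stands for the (minimum of the twin) target Q-value at the next state and sampled action, and `logp_next` is the log-probability of that action under the current policy.

```python
def soft_td_target(reward, done, q_next, logp_next, gamma=0.99, alpha=0.2):
    """SAC-style target: r + gamma * (1 - done) * (Q'(s', a') - alpha * log pi(a'|s'))."""
    # Subtracting alpha * log pi adds the entropy bonus to the bootstrapped value.
    return reward + gamma * (1.0 - done) * (q_next - alpha * logp_next)


def plain_td_target(reward, done, q_next, gamma=0.99):
    """Target without the entropy term: r + gamma * (1 - done) * Q'(s', a')."""
    return reward + gamma * (1.0 - done) * q_next


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the gap between the two targets.
    soft = soft_td_target(reward=1.0, done=0.0, q_next=2.0, logp_next=-1.0, gamma=0.5)
    plain = plain_td_target(reward=1.0, done=0.0, q_next=2.0, gamma=0.5)
    print(soft, plain, soft - plain)
```

The two targets differ by `-gamma * (1 - done) * alpha * logp_next`, so they coincide on terminal transitions and whenever `alpha` is zero.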


Hi @captainzhao,

Check out this comment by @michaelzhiluo in an old issue.