Behavioural Cloning Algo

@mannyv thanks for your answer. Yes, each action consists of two values. That’s correct.
The expert is actually another trained RLlib TD3 agent.
How would I get the actions_logp for each step in that case? Or does it not matter, since you said it is not actually used for training BC?

I am trying to pretrain my agent, somewhat like here: