Behavior Cloning vs Monotonic Advantage Re-Weighted Imitation Learning

Severity: None (just asking a question out of curiosity).

I want to use RL to train agents to act in 3D games (Unity). Right now I'm reading the documentation to get a high-level understanding of what I should know and which steps need to be done.

Right now I don't quite understand what the beta coefficient does. As per the documentation, if it is zero, the algorithm will not care about rewards at all: it just imitates the logged behavior (plain comparison, evaluated with Importance Sampling). If it is one, it turns into full MARWIL, and the Direct Method with its Q-learning-style model comes into play.
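For what it's worth, my current mental model is that beta only scales the per-sample weight on the imitation loss, roughly exp(beta * advantage). Here is a minimal sketch of that weighting (my own illustration, not RLlib's actual implementation; the normalization detail is an assumption based on my reading of the docs):

```python
import numpy as np

def marwil_weight(advantages: np.ndarray, beta: float) -> np.ndarray:
    """Per-sample weight on the imitation (log-prob) term in MARWIL.

    beta = 0 -> exp(0) = 1 for every sample: plain behavior cloning,
                rewards/advantages are ignored entirely.
    beta = 1 -> full exponential advantage weighting: actions that did
                better than average are imitated much more strongly.
    """
    # Normalize advantages so exp() stays numerically stable; RLlib
    # keeps a moving average of squared advantages for this purpose
    # (assumption based on the docs).
    adv = advantages / (np.sqrt(np.mean(advantages ** 2)) + 1e-8)
    return np.exp(beta * adv)
```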

If I got it right, is BC nothing more than the critic in actor-critic-like methods?

If so, I still don't quite understand what the beta coefficient does. What if it is, say, 0.75 or 0.25? What difference would that make?
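If my mental model above is right, intermediate values would just interpolate the strength of the advantage weighting. A quick check with the sketch above and some made-up advantage values (hypothetical numbers, just to see the interpolation):

```python
adv = np.array([-2.0, 0.0, 2.0])  # hypothetical advantage values
for beta in (0.0, 0.25, 0.75, 1.0):
    print(f"beta={beta}: {marwil_weight(adv, beta).round(2)}")
# beta=0.0:  [1. 1. 1.]       -> pure BC, every action imitated equally
# beta=0.25: [0.74 1.   1.36] -> mild tilt toward higher-advantage actions
# beta=0.75: [0.4  1.   2.51] -> strong tilt
# beta=1.0:  [0.29 1.   3.4 ] -> full MARWIL weighting
```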

Any help and/or guidance would be really appreciated, because it doesn't seem like I will ever get it on my own :wink:

P.S. Links:

  • Algos description from ray/rllib docs
  • Working With Offline Data