Behavior Cloning vs Monotonic Advantage Re-Weighted Imitation Learning

Severity: None (just asking a question out of curiosity).

I want to use RL to train agents to act in 3D games (Unity). Right now I'm reading the documentation to get a high-level understanding of what I should know and which steps need to be done.

Right now I don't quite understand what the beta coefficient does. As per the documentation, if it is zero, the algorithm will not care about rewards at all: it just imitates the logged behavior (plain comparison, evaluated with Importance Sampling). If it is one, it turns into full MARWIL, and the Direct Method with its Q-learning-style model comes into play.
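For what it's worth, my current mental model is that beta only scales the per-sample weight on the imitation loss, roughly exp(beta * advantage). Here is a minimal sketch of that weighting (my own illustration, not RLlib's actual implementation; the normalization detail is an assumption based on my reading of the docs):

```python
import numpy as np

def marwil_weight(advantages: np.ndarray, beta: float) -> np.ndarray:
    """Per-sample weight on the imitation (log-prob) term in MARWIL.

    beta = 0 -> exp(0) = 1 for every sample: plain behavior cloning,
                rewards/advantages are ignored entirely.
    beta = 1 -> full exponential advantage weighting: actions that did
                better than average are imitated much more strongly.
    """
    # Normalize advantages so exp() stays numerically stable; RLlib
    # keeps a moving average of squared advantages for this purpose
    # (assumption based on the docs).
    adv = advantages / (np.sqrt(np.mean(advantages ** 2)) + 1e-8)
    return np.exp(beta * adv)
```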

If I got it right, is BC nothing more than the critic in actor-critic-like methods?

If so, I still don't quite understand what the beta coefficient does. What if it is, say, 0.75 or 0.25? What difference would that make?
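If my mental model above is right, intermediate values would just interpolate the strength of the advantage weighting. A quick check with the sketch above and some made-up advantage values (hypothetical numbers, just to see the interpolation):

```python
adv = np.array([-2.0, 0.0, 2.0])  # hypothetical advantage values
for beta in (0.0, 0.25, 0.75, 1.0):
    print(f"beta={beta}: {marwil_weight(adv, beta).round(2)}")
# beta=0.0:  [1. 1. 1.]       -> pure BC, every action imitated equally
# beta=0.25: [0.74 1.   1.36] -> mild tilt toward higher-advantage actions
# beta=0.75: [0.4  1.   2.51] -> strong tilt
# beta=1.0:  [0.29 1.   3.4 ] -> full MARWIL weighting
```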

Any help and/or guidance would be really appreciated, because it doesn't seem like I will ever get it on my own :wink:

P.S. Links:

  • Algos description from ray/rllib docs
  • Working With Offline Data