Hello,
I am trying to figure out whether I can run RLlib with the framework=tf2 config setting while still getting lazy (graph/traced) evaluation of the policies. Can I?
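For context, this is roughly the kind of setup I mean (PPO on CartPole is just a stand-in example, not my actual workload):

```python
import ray
from ray import tune

ray.init()

config = {
    "env": "CartPole-v0",
    "framework": "tf2",   # use the native-TF2 ("eager") policies
    "num_workers": 2,
}

# Short run just to illustrate the config I am asking about.
tune.run("PPO", config=config, stop={"training_iteration": 10})
```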
So far I have a few observations about the different policy kinds:
- framework=tf2 is not the default framework.
- tf2 mode runs the “eager” policies (the ones created via the “as_eager” API) in plain eager mode, which is slower. As far as I can tell, it never builds concrete functions and does nothing to prevent retracing churn on the same policy (see the small tf.function sketch after this list for what I mean).
- the lazy (static-graph) policies do not work in tf2 mode at all because of compatibility issues.
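To be explicit about terminology, this toy snippet is what I mean by lazy/concrete evaluation in TF2 (the forward body is just a placeholder, not RLlib code):

```python
import tensorflow as tf

@tf.function
def forward(obs):
    # Placeholder for a policy's forward pass.
    return obs * 2.0

# Trace once into a graph for this input signature; later calls with the
# same signature reuse the traced graph instead of re-running eagerly.
concrete = forward.get_concrete_function(tf.TensorSpec([None, 4], tf.float32))
print(concrete(tf.zeros([1, 4])))
```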
Is there any way (or any upcoming direction) to train with concrete (lazy) functions in TF2 mode? If there is, how do I do that?
Right now it seems there is no point in training with native TF2 at all: the most efficient policies are still the compatibility-mode (static-graph) ones.
Thank you.