Eager vs. lazy policies and TF2 training

I am trying to figure out whether I can run RLlib with the framework=tf2 config while still getting lazy (graph-traced) evaluation of the policies. Is that possible?

So far I have made the following observations about the different policy kinds:

  • framework=tf2 mode is not the default.
  • tf2 mode runs "eager" policies, as created by the "as_eager" API, in eager (and therefore slower) mode. It does not seem to create concrete functions anywhere, or take any care to prevent tracing churn on the same policy.
  • Lazy policies do not work in tf2 mode at all because of compatibility issues.
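To illustrate the "tracing churn" concern above, here is a toy, pure-Python sketch of how tf.function-style trace caching behaves (this is not RLlib or TensorFlow code; the decorator and the `policy_step` function are hypothetical stand-ins). A traced function is compiled once per input signature and the cached trace is reused; if signatures keep changing, every call pays the expensive tracing step again:

```python
from typing import Any, Callable, Dict, Tuple

def traced(fn: Callable) -> Callable:
    """Toy model of tf.function: build (trace) a function once per input
    signature, then reuse the cached trace on subsequent calls."""
    cache: Dict[Tuple[str, ...], Callable] = {}

    def wrapper(*args):
        # Real TF keys traces on dtypes/shapes; types are a stand-in here.
        signature = tuple(type(a).__name__ for a in args)
        if signature not in cache:
            # In real TF this is the expensive tracing step; if signatures
            # keep changing, the policy retraces on every call
            # ("tracing churn"), erasing the speedup of lazy execution.
            cache[signature] = fn
            wrapper.trace_count += 1
        return cache[signature](*args)

    wrapper.trace_count = 0
    return wrapper

@traced
def policy_step(obs):
    return obs * 2

policy_step(1.0)   # first call: traces
policy_step(2.0)   # same signature: cached trace reused
policy_step("a")   # new signature: retraces
```

After these three calls, `policy_step.trace_count` is 2, not 3: the second float call reused the cached trace. A policy that pins its input signatures traces once and then runs lazily.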

Are there any suggestions for (or an upcoming direction toward) training with concrete (lazy) functions in TF2 mode? If there is a way, how can I do that?

Because right now it seems there is no point in training with native TF2 at all: the most efficient policies are still the compatibility-mode ones.

Thank you.

Update: it seems I missed that there is an eager_tracing config parameter, which appears to enable tf.function decoration of all eager-policy methods. That should guarantee lazy execution of eager policies in framework=tfe mode. Am I right? Is that the recommended way to do TF2 training?
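For concreteness, here is a minimal sketch of the config I have in mind (key names follow RLlib's common trainer config dict as of 1.3; the PPOTrainer usage at the bottom is an assumption and is shown only as comments, not executed here):

```python
# Trainer config dict in RLlib 1.3 style.
config = {
    "env": "CartPole-v0",
    "framework": "tfe",     # TF2 eager policies
    "eager_tracing": True,  # wrap eager-policy methods in tf.function
    "num_workers": 2,
}

# Assumed usage (requires ray[rllib] and a TF2 install):
# from ray.rllib.agents.ppo import PPOTrainer
# trainer = PPOTrainer(config=config)
# result = trainer.train()
```

The question below is essentially whether this combination is a safe drop-in replacement for the legacy TF1 policies.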

Also: how well does eager_tracing cover the existing algorithms, and has it been benchmarked against the legacy TF1 compatibility policies? In other words, how safe do you feel it is to switch to this mode (framework=tfe, eager_tracing=True) from the legacy TF1 policies in RLlib 1.3? Thank you.