Hi Ray Team,
My team is exploring the feasibility of using the policy serving pattern for large-scale inference. We noticed that the parameter training_enabled
in external_env._ExternalEnvEpisode()
may be a hint toward achieving this by reverse-engineering inference from training. However, from the API alone we can’t trace how training_enabled
is used during training to determine whether policy updates are disabled. We searched the entire Ray repo for this keyword but still found no clue.
Could you shed some light on where and how training_enabled
is used? Any insights on performing large-scale inference would be helpful as well.
Thank you,
Heng