[RLlib] How does setting `training_enabled` to False in external_env._ExternalEnvEpisode trigger no training?

Hi Ray Team,

My team is currently exploring the feasibility of using the policy serving pattern to conduct large-scale inference. We noticed that the `training_enabled` parameter in `external_env._ExternalEnvEpisode()` may be a way to achieve this by repurposing the training setup for inference only. But looking at the API, we can't trace how `training_enabled` is used during training to determine whether policy updates are disabled. We have searched the entire Ray repo for the keyword, but still don't have any clue.

Would you mind shedding some light on where and how `training_enabled` is used? Some insights on performing large-scale inference would be helpful as well.
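For reference, here is a minimal sketch of how we are setting the flag from the client side. It assumes a `PolicyServerInput` is already listening (the address is a placeholder, and the observation is a stand-in for our own environment's data):

```python
from ray.rllib.env.policy_client import PolicyClient

# Assumes a PolicyServerInput is already listening at this address
# (placeholder host/port).
client = PolicyClient("http://localhost:9900", inference_mode="remote")

# We expected training_enabled=False to make the server skip policy
# updates for this episode's experiences.
episode_id = client.start_episode(training_enabled=False)

obs = [0.0, 0.0, 0.0, 0.0]  # placeholder observation for our env
action = client.get_action(episode_id, obs)
client.log_returns(episode_id, reward=0.0)
client.end_episode(episode_id, obs)
```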

Thank you,
Heng

Actually, you are right. I think setting this does nothing except publish this information in the "infos" dict returned by the `step` method on the server side. So the server then still has to respect that information, which afaik it doesn't do (it simply ignores it).
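To illustrate: the episode's send logic only tags the per-step info dict, roughly like this (a paraphrased sketch, not the exact RLlib source):

```python
# Paraphrased sketch (not the exact RLlib source) of how the flag
# travels: the episode only tags the info dict that the server-side
# env returns from step(); nothing downstream checks it.

def build_step_item(obs, reward, done, info, training_enabled):
    """Mimics what _ExternalEnvEpisode queues up for the server's step()."""
    item = {"obs": obs, "reward": reward, "done": done, "info": dict(info)}
    if not training_enabled:
        # The only effect of training_enabled=False today:
        item["info"]["training_enabled"] = False
    return item

item = build_step_item(obs=[0.0], reward=1.0, done=False, info={},
                       training_enabled=False)
assert item["info"] == {"training_enabled": False}
```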

This will require a fix on the PolicyServer side.
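What the fix would have to do, roughly: filter the tagged timesteps out of the sample batch before it reaches the learner. A purely hypothetical sketch on a plain dict-of-columns batch (none of these names exist in RLlib):

```python
# Hypothetical sketch of the missing PolicyServer-side fix: drop any
# timestep whose info dict carries training_enabled=False before the
# batch reaches the learner. `batch` is assumed to be a dict of
# equal-length columns, one of which is "infos".

def drop_non_training_steps(batch):
    keep = [
        i for i, info in enumerate(batch["infos"])
        if info.get("training_enabled", True)  # train by default
    ]
    return {key: [col[i] for i in keep] for key, col in batch.items()}

batch = {
    "obs": [[0.0], [1.0], [2.0]],
    "actions": [0, 1, 0],
    "infos": [{}, {"training_enabled": False}, {}],
}
filtered = drop_non_training_steps(batch)
assert len(filtered["obs"]) == 2  # the tagged step was dropped
```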

Thanks for raising this @heng2j! 🙂

No problem @sven1977. Glad we were able to bring this to the team's attention. I will file an official GitHub issue in the repo.

Hi @sven1977, here is the bug report for this issue
