RLlib external environment setup for a turn-based game

Yep! I think so for the most part - you can read more about it in the external-env docs: ray/doc/source/rllib/external-envs.rst on the releases/2.47.1 branch of the ray-project/ray repo on GitHub.

Regarding weight versioning and off-policy data: RLlib’s external environment setup (via RLlink) supports both on-policy and off-policy data collection. RLlib can train on these off-policy samples, though on-policy algorithms like PPO may see some degradation if the lag between the weights that generated the samples and the weights currently being trained is large.
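
To make that lag measurable, one common pattern is to stamp each batch of episodes with the version of the weights the client was acting under, then filter (or down-weight) stale batches before training. This is just a sketch of the bookkeeping, not an RLlib API - `TrajectoryBatch`, `VersionedCollector`, and `max_lag` are all hypothetical names:

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class TrajectoryBatch:
    # Version of the policy weights the client was using when it
    # collected these episodes.
    weights_version: int
    episodes: List[Any] = field(default_factory=list)


class VersionedCollector:
    """Client-side buffer that stamps every batch with the current
    weights version, so samples from old and new weights never mix."""

    def __init__(self) -> None:
        self.weights_version = 0
        self._episodes: List[Any] = []

    def add_episode(self, episode: Any) -> None:
        self._episodes.append(episode)

    def flush(self) -> TrajectoryBatch:
        # Ship everything collected so far under the current version.
        batch = TrajectoryBatch(self.weights_version, self._episodes)
        self._episodes = []
        return batch

    def set_weights(self, new_version: int) -> None:
        # Bump the version whenever the trainer pushes new weights;
        # flush first so the old batch stays tagged with the old version.
        self.weights_version = new_version


def usable_batches(batches: List[TrajectoryBatch],
                   current_version: int,
                   max_lag: int = 2) -> List[TrajectoryBatch]:
    """Learner-side filter: keep only batches whose policy lag is small
    enough for a near-on-policy algorithm like PPO to tolerate."""
    return [b for b in batches
            if current_version - b.weights_version <= max_lag]
```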

For high scalability (100+ games), batching trajectories and occasionally updating weights on the client is standard practice, and RLlib is designed to handle such asynchronous, parallel data ingestion (this discussion might be helpful even if it is a bit old).
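
For what that client loop could look like, here is a minimal sketch under those assumptions: many games run concurrently, finished episodes are queued and shipped in batches, and weights are only refreshed every few batches instead of every step. `Connection`, `play_one_episode`, and the tuning constants are placeholders, not RLlib APIs - the real transport would be your RLlink connection:

```python
import threading
from queue import Queue
from typing import Any, List

NUM_GAMES = 128          # assumption: number of concurrent games
EPISODES_PER_BATCH = 16  # assumption: episodes to buffer before each send
PULL_WEIGHTS_EVERY = 4   # assumption: refresh weights every N batches


class Connection:
    """Stand-in for whatever transport talks to RLlib (e.g. an RLlink socket)."""
    def send_batch(self, batch: List[Any]) -> None: ...
    def fetch_weights(self) -> None: ...


episode_queue: "Queue[Any]" = Queue()


def play_one_episode(policy: Any) -> dict:
    # Placeholder for the turn-based game loop: alternate turns, query the
    # local policy copy for actions, record (obs, action, reward) per step.
    return {"transitions": []}


def game_worker(policy: Any) -> None:
    # Each worker plays games back to back; many workers run concurrently.
    while True:
        episode_queue.put(play_one_episode(policy))


def sender_loop(conn: Connection) -> None:
    # Drain finished episodes, ship them in batches, and only sync weights
    # occasionally; per-step weight pulls would not scale to 100+ games.
    batches_sent = 0
    while True:
        batch = [episode_queue.get() for _ in range(EPISODES_PER_BATCH)]
        conn.send_batch(batch)
        batches_sent += 1
        if batches_sent % PULL_WEIGHTS_EVERY == 0:
            conn.fetch_weights()


if __name__ == "__main__":
    conn, policy = Connection(), object()  # stubs for the sketch
    for _ in range(NUM_GAMES):
        threading.Thread(target=game_worker, args=(policy,), daemon=True).start()
    sender_loop(conn)
```

Decoupling the game workers from the single sender thread this way keeps a slow game from stalling the batch pipeline, and it matches the batched, occasionally-synced pattern described above.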