Step-by-step way to interact with an environment and update an agent

I am trying to train two separate agents in two different environments within the same training loop. The output of one environment's env.step must be fed into the other environment before I call compute_single_action for that environment's agent. Currently, in RLlib, everything seems to be encapsulated behind a .train() method, with very little opportunity for customization during training.

The crux of the problem is that I cannot find a good example in the current RLlib version that shows how to perform the following steps explicitly (roughly sketched in code right after the list).

  1. Set up an environment
  2. Set up the RLlib PPO agent
  3. In a for loop with an iteration budget:
  • choose an action based on the current state of the environment
  • collect the resulting transitions in some form of RLlib buffer class, if one exists
  • every so many steps of the budget, perform a PPO agent update with the buffer
  4. Evaluate the agent at regular intervals.
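
In rough code, this is the loop I have in mind (a sketch against the Ray 2.x old API stack and gymnasium; CartPole, the interval sizes, and the evaluation setup are just placeholders for my real setup):

```python
import gymnasium as gym
from ray.rllib.algorithms.ppo import PPOConfig

# 1. Set up an environment that I step myself.
env = gym.make("CartPole-v1")

# 2. Set up the RLlib PPO agent (plus an evaluation worker so that
#    algo.evaluate() can be called by hand later).
algo = (
    PPOConfig()
    .environment("CartPole-v1")
    .evaluation(evaluation_num_workers=1)
    .build()
)

obs, _ = env.reset()
for step in range(1, 10_001):  # 3. iteration budget
    # 3a. Choose an action based on the current state of the environment.
    action = algo.compute_single_action(obs, explore=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()

    # 3b/3c. Missing piece: I would like to push (obs, action, reward, ...)
    # into some RLlib buffer here and periodically run a PPO update on it.
    # algo.train() does NOT do that -- it gathers its own samples through
    # its rollout workers and never sees the transitions produced above.
    if step % 1_000 == 0:
        algo.train()

    # 4. Evaluate at regular intervals.
    if step % 5_000 == 0:
        print(algo.evaluate())
```

The commented part in the middle is exactly what I cannot express: algo.train() collects its own samples internally, so the transitions I generate by stepping the environment myself never reach the PPO update.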

Sorry if I don't understand your question correctly, but does the hierarchical training example help with your training workflow? It involves one high-level agent computing actions that are then passed to low-level agents: ray/hierarchical_training.py at master · ray-project/ray · GitHub
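
The core trick in that example is just a multi-agent env in which the high-level agent's action shows up in the low-level agent's observation. Here is a stripped-down sketch of that pattern (Ray 2.x old API stack with gymnasium; all names, spaces, rewards, and the episode length are made-up placeholders, not taken from the linked file):

```python
import numpy as np
from gymnasium.spaces import Box, Dict, Discrete
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class HighLowEnv(MultiAgentEnv):
    """Toy env: the high-level action becomes the low-level observation."""

    def __init__(self, config=None):
        self._agent_ids = {"high_level", "low_level"}
        # Spaces as Dicts keyed by agent ID (RLlib's preferred format).
        self._obs_space_in_preferred_format = True
        self.observation_space = Dict({
            "high_level": Box(0.0, 1.0, (1,), np.float32),
            "low_level": Box(0.0, 3.0, (1,), np.float32),
        })
        self._action_space_in_preferred_format = True
        self.action_space = Dict({
            "high_level": Discrete(4),
            "low_level": Discrete(2),
        })
        self._t = 0
        super().__init__()

    def reset(self, *, seed=None, options=None):
        self._t = 0
        # Only the high-level agent observes (and therefore acts) first.
        return {"high_level": np.array([0.0], np.float32)}, {}

    def step(self, action_dict):
        self._t += 1
        if "high_level" in action_dict:
            # Feed the high-level action to the low-level agent as its observation.
            obs = {"low_level": np.array([action_dict["high_level"]], np.float32)}
        else:
            # The low-level agent acted; hand control back to the high-level agent.
            obs = {"high_level": np.array([0.0], np.float32)}
        rewards = {aid: 0.0 for aid in action_dict}  # placeholder rewards
        terminateds = {"__all__": self._t >= 20}
        truncateds = {"__all__": False}
        return obs, rewards, terminateds, truncateds, {}
```

The two agent IDs are then mapped to two separate policies via the multi_agent() part of the config (policies plus a policy_mapping_fn), which is how both agents end up training in the same loop.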

Or ray/centralized_critic.py at master · ray-project/ray · GitHub for sharing observations between agents.

To implement a custom training workflow, you'll have to override the algorithm's training_step method: ray/algorithm.py at master · ray-project/ray · GitHub
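
Very roughly, such an override looks like the minimal sketch below. This assumes the Ray 2.x old API stack; the utility functions (synchronous_parallel_sample, train_one_step) and the self.workers attribute have moved or been renamed between releases, so treat it as a starting point rather than a drop-in solution. Your custom two-environment stepping logic would replace the sampling part of the method.

```python
from ray.rllib.algorithms.ppo import PPO, PPOConfig
from ray.rllib.execution.rollout_ops import synchronous_parallel_sample
from ray.rllib.execution.train_ops import train_one_step
from ray.rllib.utils.metrics import NUM_AGENT_STEPS_SAMPLED, NUM_ENV_STEPS_SAMPLED


class CustomWorkflowPPO(PPO):
    def training_step(self):
        # 1) Collect experiences from the rollout workers. This is the hook
        #    where you could instead step your two environments by hand and
        #    assemble the train batch (your "buffer") yourself.
        train_batch = synchronous_parallel_sample(
            worker_set=self.workers,
            max_env_steps=self.config.train_batch_size,
        )
        train_batch = train_batch.as_multi_agent()
        self._counters[NUM_ENV_STEPS_SAMPLED] += train_batch.env_steps()
        self._counters[NUM_AGENT_STEPS_SAMPLED] += train_batch.agent_steps()

        # 2) Run one update on the collected batch.
        results = train_one_step(self, train_batch)

        # 3) Push the updated weights back to the rollout workers.
        self.workers.sync_weights()
        return results


algo = CustomWorkflowPPO(config=PPOConfig().environment("CartPole-v1"))
for _ in range(5):
    results = algo.train()  # drives CustomWorkflowPPO.training_step under the hood
```

Note that this sketch leaves out pieces of PPO's real training_step (e.g. the KL-coefficient update), so check the version of ppo.py you are running against before building on it.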