ExternalMultiAgentEnv dynamics

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hello, I’m currently using RLlib to manage a hierarchical system of agents. At this step I need to collect observations from an external source and use that data in training.
I’ve implemented a custom ExternalMultiAgentEnv with all the necessary methods: run(), get_action(), start_episode(), end_episode(), and log_returns().
In particular, I interact with the environment through an external script. In one call of the script I need to start the episode, get the action, and apply it; only in the following call can I compute the reward, log the returns, and end the episode.
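To make the interaction concrete, here is a minimal sketch of the env shape I mean. The _StubBase class is only a stand-in so the snippet runs without Ray installed; in real code the class would subclass ray.rllib.env.external_multi_agent_env.ExternalMultiAgentEnv, whose base class provides start_episode(), get_action(), log_returns(), and end_episode(). Agent names and rewards are placeholders.

```python
import uuid

class _StubBase:
    """Stand-in that records calls the way ExternalMultiAgentEnv would."""
    def __init__(self):
        self.logged = []  # (episode_id, reward_dict) pairs

    def start_episode(self, episode_id=None):
        return episode_id or uuid.uuid4().hex

    def get_action(self, episode_id, obs_dict):
        # The real base class queries the current policy; here: dummy actions.
        return {agent: 0 for agent in obs_dict}

    def log_returns(self, episode_id, reward_dict, info=None):
        self.logged.append((episode_id, reward_dict))

    def end_episode(self, episode_id, obs_dict):
        pass

class HierarchicalExternalEnv(_StubBase):
    def run(self):
        # run() is the entry point RLlib drives when sampling from the env.
        for _ in range(2):
            eid = self.start_episode()
            obs = {"high_level": 0.0, "low_level": 0.0}  # placeholder obs
            actions = self.get_action(eid, obs)
            # ... the action is applied outside, and rewards computed ...
            self.log_returns(eid, {agent: 1.0 for agent in actions})
            self.end_episode(eid, obs)

env = HierarchicalExternalEnv()
env.run()
```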

In the following scheme, each iteration of the for loop represents one call of the external script. This shows what I need to implement:

for i in range(episode_number):
    ... receive new data (needed to compute the reward)

    if i > 0:
        rewards = env.compute_cost(env._data)
        env.log_returns(env.episode_id, rewards, env._get_info())

        # end the episode started in the previous call
        env.end_episode(env.episode_id, env._state)

    env.start_episode()
    env.get_action()
    ... apply the action outside the script
      
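Since each loop iteration is a separate invocation of the external script, the episode_id (and last observation) from call i must survive until call i+1, which closes that episode out. The following runnable sketch shows that bookkeeping in plain Python; the file-based store, the _FakeEnv stand-in, and the single-agent key "agent" are all illustrative assumptions, not Ray API.

```python
import json
import os
import tempfile

# Illustrative assumption: persist cross-call state in a temp file.
STATE_FILE = os.path.join(tempfile.gettempdir(), "episode_state.json")

def script_call(env, new_data):
    """One invocation of the external script."""
    if os.path.exists(STATE_FILE):
        # A previous call started an episode: log its returns and end it.
        with open(STATE_FILE) as f:
            prev = json.load(f)
        rewards = env.compute_cost(new_data)
        env.log_returns(prev["episode_id"], rewards)
        env.end_episode(prev["episode_id"], prev["obs"])
        os.remove(STATE_FILE)

    # Start the next episode and hand back an action to apply outside.
    eid = env.start_episode()
    action = env.get_action(eid, new_data)
    with open(STATE_FILE, "w") as f:
        json.dump({"episode_id": eid, "obs": new_data}, f)
    return action

class _FakeEnv:
    """Stand-in for the custom ExternalMultiAgentEnv, just for the demo."""
    def __init__(self):
        self.logged = []
        self._n = 0

    def start_episode(self):
        self._n += 1
        return "ep-%d" % self._n

    def get_action(self, episode_id, obs):
        return {"agent": 0}

    def compute_cost(self, data):
        return {"agent": -data["agent"]}

    def log_returns(self, episode_id, rewards, info=None):
        self.logged.append((episode_id, rewards))

    def end_episode(self, episode_id, obs):
        pass

# Simulate two successive script invocations.
if os.path.exists(STATE_FILE):
    os.remove(STATE_FILE)
env = _FakeEnv()
script_call(env, {"agent": 1.0})  # call 0: starts "ep-1", nothing logged yet
script_call(env, {"agent": 2.0})  # call 1: logs returns for "ep-1", starts "ep-2"
os.remove(STATE_FILE)  # clean up after the demo
```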

I tried to use trainer.train() as the documentation suggests, but this way I cannot separate get_action() from log_returns() across two successive calls; it requires a full iteration to be completed within a single call.

I need to know how to implement this while still training the system of agents.