Trouble implementing concurrent code alongside an RLlib environment

How severe does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

The Problem Explanation -
I have a working reinforcement learning project in Python where I run three methods using the multiprocessing library.

  1. One process runs the Omnet++ simulation (which simulates a 5G network) in a bash subprocess that is restarted whenever its duration ends. The Omnet++ simulation needs to connect to an external websocket server (see the second process below) to work.
  2. Another process runs an asynchronous websocket server, which receives information from another websocket inside the simulation. This process accumulates data up to a certain point and then produces what we could call the observation for the RL agent. This is also the moment when the action is supposed to be taken, but I had to pass this observation through a queue to the RL environment for the reason specified below.
  3. The last process trains or tests a reinforcement learning agent from the stable_baselines3 library. The agent's actions essentially overwrite files that are frequently read by the Omnet++ simulation, to make live adjustments.
    The reason for using separate processes is that both the websocket server and the Omnet++ simulation need to block code execution (they run for long durations), so the agent's training code was made independent of them. To provide the agent with the information from the websocket server process, I am using a multiprocessing queue.
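For context, my current setup looks roughly like the sketch below (the function bodies are placeholders for my actual simulation, websocket, and training code, which I can share if needed):

```python
import multiprocessing as mp
import time

def run_simulation(stop_event):
    # Stand-in for the Omnet++ runner: restart the bash subprocess
    # each time a run's duration ends, until told to stop.
    while not stop_event.is_set():
        time.sleep(0.01)  # real code: subprocess.run([...]) per sim run

def run_websocket_server(obs_queue, stop_event):
    # Stand-in for the asyncio websocket server: accumulate data from
    # the simulation, then emit one observation into the queue.
    obs_queue.put({"obs": [0.0]})  # real code builds this from sim messages

def train_agent(obs_queue):
    # Stand-in for the stable_baselines3 side: the custom env's step()
    # blocks on this queue and then writes the action files.
    return obs_queue.get(timeout=5)

if __name__ == "__main__":
    stop = mp.Event()
    q = mp.Queue()
    sim = mp.Process(target=run_simulation, args=(stop,))
    ws = mp.Process(target=run_websocket_server, args=(q, stop))
    sim.start()
    ws.start()
    obs = train_agent(q)
    stop.set()
    sim.join()
    ws.join()
```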

Now I am switching to RLlib because I want to move to multiple agents, which are not supported in the stable_baselines3 library. While researching how to build the environment was relatively easy, getting the simulation and the websocket server working with RLlib has turned out to be difficult. I am open to completely restructuring the code, but I have no clear idea how the whole system should be designed. Ultimately I want multiple PCs running such systems (Simulation-Websocket-MultiAgentRL), with the policies trained from all of those PCs. As such, I need your help in defining a clear structure for a solution that could accommodate the current implementations of the simulation and the websocket code (note that additional code can be added to both, as they have looping structures).
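To make the question concrete, the decoupling I am aiming for looks roughly like this producer/consumer shape (shown with plain threads and queue.Queue so the snippet runs standalone; in the Ray version I would expect the producer to be an @ray.remote actor and the queue to be ray.util.queue.Queue):

```python
import queue
import threading

def websocket_producer(obs_queue, n_obs):
    # Stand-in for the websocket server: each "accumulated batch"
    # of simulation data becomes one observation for the agents.
    for i in range(n_obs):
        obs_queue.put({"agent_0": [float(i)]})
    obs_queue.put(None)  # sentinel: the simulation run ended

def env_consumer(obs_queue):
    # Stand-in for the env side: block until the next observation
    # arrives, exactly as env.step() would need to.
    collected = []
    while True:
        obs = obs_queue.get()
        if obs is None:
            break
        collected.append(obs)
    return collected

q = queue.Queue()
producer = threading.Thread(target=websocket_producer, args=(q, 3))
producer.start()
observations = env_consumer(q)
producer.join()
```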

Things I have tried -

  1. Implementing a MultiAgentEnv environment
  2. Defining a Ray node consisting of (@ray.remote) classes for each of the methods (suggested in a post on this forum)
  3. Changing the queue to Ray's version
  4. Importing multiprocessing inside a Ray actor and running the processes concurrently (suggested in a post on this forum)
  5. A Kafka queue
  6. Multiple combinations of the above

Things I haven’t implemented but I believe could be interesting to my problem -

  1. Defining the environment as ExternalMultiAgentEnv instead of MultiAgentEnv. While I understand the concept of an external environment, the documentation page for this topic doesn't clearly specify how training the agent differs from the generic case.
  2. Defining a Policy Server and Client for the Simulation-Websocket combo, but I currently don't see how this can be implemented, since both the simulation and websocket methods are blocking and cannot be changed much.
    If you believe any of these could actually solve my problem, please point me to example code; it would be a great help.
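To show what I mean by "blocking" in option 2, here is the kind of client-side episode loop I imagine, based on the PolicyClient method names used in the RLlib client/server examples (run_external_episode, StubClient, and the reward/observation callbacks are my own placeholder names; since no policy server runs here, a stub with the same interface stands in for the real client):

```python
def run_external_episode(client, next_observation, apply_action, reward_fn, n_steps):
    """Drive one episode against a policy server.

    `client` is assumed to expose the PolicyClient-style interface
    (start_episode / get_action / log_returns / end_episode).
    `next_observation` blocks until the websocket server yields an
    observation; `apply_action` writes the files Omnet++ reads;
    `reward_fn` is a hypothetical reward computed from an observation.
    """
    episode_id = client.start_episode(training_enabled=True)
    obs = next_observation()
    for _ in range(n_steps):
        action = client.get_action(episode_id, obs)
        apply_action(action)
        obs = next_observation()
        client.log_returns(episode_id, reward_fn(obs))
    client.end_episode(episode_id, obs)


class StubClient:
    """Stand-in that records the call sequence, since no server runs here."""

    def __init__(self):
        self.calls = []

    def start_episode(self, training_enabled=True):
        self.calls.append("start")
        return "episode_0"

    def get_action(self, episode_id, obs):
        self.calls.append("get_action")
        return {"agent_0": 0}

    def log_returns(self, episode_id, reward):
        self.calls.append("log_returns")

    def end_episode(self, episode_id, obs):
        self.calls.append("end")
```

My hope is that with a real client object, only this loop would need to know about RLlib, and the simulation and websocket code could stay blocking, but I don't know if that is the intended usage.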

I am new to Ray, but I have been reading the documentation and forum posts constantly for the last two weeks. Please let me know if you need more information or explanation. Thank you!