Handling complex computations in Env

I developed a custom hierarchical multi-agent environment, using RLlib's MultiAgentEnv class as a starting point.

It is built around a fairly sizeable data store (the state) that incorporates different data formats, from which the agents pull their observations. Depending on the agents' actions, this data store/state has to be updated with some computationally demanding operations. These calculations also rely heavily on other Python modules.
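To make the setup concrete, here is a minimal sketch of the pattern described above: a central store holds the shared state, each agent's observation is a view into it, and `step()` applies all actions to the store before handing back per-agent dicts. All names (`DataStore`, `SharedStateEnv`) and the toy state layout are illustrative, not RLlib APIs.

```python
# Hypothetical sketch: a MultiAgentEnv-style env whose step() pulls
# observations from a shared data store and updates that store with
# the (here trivial, in reality demanding) per-step computations.

class DataStore:
    """Central state the agents observe; stands in for the real store."""
    def __init__(self):
        self.state = {"grid": [[0.0] * 4 for _ in range(4)], "tick": 0}

    def observation_for(self, agent_id):
        # Each agent sees only a slice of the shared state.
        return {"tick": self.state["tick"],
                "row": self.state["grid"][agent_id]}

    def apply_actions(self, actions):
        # The demanding computations (possibly in other modules)
        # would happen here; this toy version just adds the action.
        for agent_id, action in actions.items():
            row = self.state["grid"][agent_id]
            self.state["grid"][agent_id] = [v + action for v in row]
        self.state["tick"] += 1


class SharedStateEnv:
    """MultiAgentEnv-style interface: dicts keyed by agent id."""
    def __init__(self, num_agents=2):
        self.agent_ids = list(range(num_agents))
        self.store = DataStore()

    def reset(self):
        self.store = DataStore()
        return {aid: self.store.observation_for(aid)
                for aid in self.agent_ids}

    def step(self, action_dict):
        self.store.apply_actions(action_dict)
        obs = {aid: self.store.observation_for(aid)
               for aid in self.agent_ids}
        rewards = {aid: float(sum(o["row"])) for aid, o in obs.items()}
        dones = {"__all__": self.store.state["tick"] >= 10}
        return obs, rewards, dones, {}
```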

Does anyone have a recommendation on whether this is best run as an external environment, or should I put everything in the env class and let RLlib step it? I also wonder which option is better performance-wise: does Ray accelerate only the learning part, or also the env steps, when GPUs are used? Thanks in advance!

Hi @Blubberblub ,

if you have an external environment that performs such operations efficiently, my preferred way would be to keep it external (e.g. if you have some kind of distributed database that runs such operations in parallel, or some graphical simulation software that can calculate new coordinates in space, or the like).

Otherwise, I would take a look at the Trainer config values `remote_worker_envs` and `remote_env_batch_wait_ms`.
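For reference, these two keys go into the usual dict-style Trainer config; the values below are only illustrative, and the comments reflect my reading of what they do:

```python
# Sketch of the relevant Trainer config entries; tune the values
# for your own workload.
config = {
    "num_workers": 2,
    # Step each env in its own Ray actor, so slow env.step() calls
    # run in parallel instead of blocking the rollout worker:
    "remote_worker_envs": True,
    # How long a worker waits to batch results from its remote envs
    # before polling again; 0 means continue as soon as at least
    # one env is ready:
    "remote_env_batch_wait_ms": 10,
}
```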

Hope this helps

Thanks @Lars_Simon_Zehnder for the advice!

1 Like

Ray only accelerates training with GPUs. If you want to speed up the step process, you need to design GPU acceleration into your environment yourself. In addition to using an external environment, you might also consider RLlib's client-server architecture to further speed up data generation across multiple nodes.

1 Like

Thanks for the additional info @rusu24edward! :slightly_smiling_face: I will have a look into this. There are some raytracing calculations happening as part of my environment, and I'm still looking into how to implement them in a better way. I started from simple Gym examples and added custom modules that use numba to speed up the computations. This worked for a basic implementation, but now I'm looking into triggering these computations in between steps, so I need to optimize the environment even further. However, these questions are not ML-specific, so I won't bother you with them here. :grinning_face_with_smiling_eyes: Thanks for all the suggestions!
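For anyone landing here later, the numba approach mentioned above can look like the following sketch: a hot per-step kernel (here a toy ray-sphere intersection test, since the actual raytracing code isn't shown in this thread) JIT-compiled with `@njit`, with a plain-Python fallback so the snippet still runs where numba is not installed.

```python
# Illustrative sketch: JIT-compiling a hot per-step computation with
# numba, falling back to pure Python if numba is unavailable.
import math

try:
    from numba import njit
except ImportError:
    def njit(func):          # no-op fallback decorator
        return func

@njit
def ray_hits_sphere(ox, oy, oz, dx, dy, dz, cx, cy, cz, r):
    # Solve |o + t*d - c|^2 = r^2 for t; a hit exists if the
    # discriminant is non-negative and the nearest t lies ahead.
    lx, ly, lz = ox - cx, oy - cy, oz - cz
    a = dx * dx + dy * dy + dz * dz
    b = 2.0 * (dx * lx + dy * ly + dz * lz)
    c = lx * lx + ly * ly + lz * lz - r * r
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return False
    t = (-b - math.sqrt(disc)) / (2.0 * a)
    return t >= 0.0
```

Keeping such kernels as free functions over plain scalars/arrays (rather than methods on the env) is what makes them easy to hand to numba.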