I am trying to train a huge number of agents in the Battle environment in a decentralized manner. The experiments are extremely slow. From what I read in the docs:
- If the environment is slow and cannot be replicated (e.g., since it requires interaction with physical systems), then you should use a sample-efficient off-policy algorithm such as DQN or SAC. These algorithms default to `num_workers: 0` for single-process operation. Make sure to set `num_gpus: 1` if you want to use a GPU. Consider also batch RL training with the offline data API.
- If the environment is fast and the model is small (most models for RL are), use time-efficient algorithms such as PPO, IMPALA, or APEX. These can be scaled by increasing `num_workers` to add rollout workers. It may also make sense to enable vectorization for inference. Make sure to set `num_gpus: 1` if you want to use a GPU. If the learner becomes a bottleneck, multiple GPUs can be used for learning by setting `num_gpus > 1`.
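As a rough sketch of the two cases above, the settings could be written as plain Python dicts. Only the keys `num_workers` and `num_gpus` come from the quoted docs; the variable names and the worker count of 4 are illustrative assumptions, and nothing here calls RLlib itself.

```python
# Case 1: slow environment that cannot be replicated (e.g. DQN or SAC).
slow_env_config = {
    "num_workers": 0,  # single-process operation (the default for these algorithms)
    "num_gpus": 1,     # only if a GPU is available
}

# Case 2: fast environment, small model (e.g. PPO, IMPALA, or APEX).
fast_env_config = {
    "num_workers": 4,  # illustrative value; each extra rollout worker samples in parallel
    "num_gpus": 1,     # values above 1 use several GPUs if the learner is the bottleneck
}

for name, cfg in [("slow env", slow_env_config), ("fast env", fast_env_config)]:
    print(name, cfg)
```

Dicts like these would then be merged into the full trainer config alongside the environment and multi-agent settings.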
I am not sure which of these categories my experiments fall into. There are 120 agents in the environment, and increasing the number of rollout workers makes matters worse: a huge amount of memory is needed to hold all the policies, so they have to be offloaded to storage (RLlib does this automatically by creating 120*num_workers policy files in the location specified by policy_map_cache). On the other hand, setting num_workers to 0 and using DQN does not help either; the experiments often get stuck.
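To make the scaling concrete, here is a small sketch of the 120*num_workers arithmetic described above. The helper name `policy_file_count` is hypothetical, and it assumes one policy per agent with every rollout worker keeping its own copy; whether a local worker adds further copies is not covered.

```python
def policy_file_count(num_agents: int, num_workers: int) -> int:
    """Number of cached policy files: one per agent, per rollout worker."""
    return num_agents * num_workers


NUM_AGENTS = 120  # agents in the Battle environment, as stated above

# Show how the number of policy files grows with the number of rollout workers.
for num_workers in (1, 2, 4, 8):
    print(num_workers, policy_file_count(NUM_AGENTS, num_workers))
```

The count grows linearly with the number of workers, which is why adding workers increases the memory and storage pressure instead of relieving it.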
I am wondering if there is any way to optimize the runs (in particular, the amount of memory needed). I have also played around with “num_sgd_iter”, “rollout_fragment_length”, “train_batch_size”, and “sgd_minibatch_size”, but I’m not sure whether they had any effect. Using the parallelized version of the environment did not help either. With the AEC version, 2 iterations took around 30 minutes for 120 agents with 180GB of memory.