I’m running multi-agent experiments with large numbers of policies (currently 10 – 100).
I noticed that when I have many policies, e.g. 30, initialization of my trainable is very slow (more than 5 minutes). I am running this on what I think is good hardware (128 cores at 2.4 GHz; I set num_workers to 50, but I’m not sure whether this affects init).
Are there any obvious pitfalls that I am missing?
Is this expected? I’m surprised that having many policies has such a large effect on initialization; I assumed it would amount to simply drawing more random weights.
Is there any way to speed this up?
I have created a very minimal example that illustrates the problem for both DQN and PPO:
import time

import gym
import ray
from ray.rllib.agents.dqn import DQNTrainer
from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.examples.env.multi_agent import MultiAgentCartPole
from ray.tune import register_env

ray.init()

num_policies = 30

# Simple environment with 30 independent cartpole entities.
register_env("multi_agent_cartpole",
             lambda _: MultiAgentCartPole({"num_agents": 30}))

# Dummy single-agent env, only used to read off the obs/action spaces.
single_dummy_env = gym.make("CartPole-v0")
obs_space = single_dummy_env.observation_space
act_space = single_dummy_env.action_space

# One policy per agent index; agent ids are mapped to policy ids via str().
policies = {str(i): (None, obs_space, act_space, {}) for i in range(num_policies)}
policy_mapping_fn = str

start = time.time()
ppo_trainer = PPOTrainer(
    env="multi_agent_cartpole",
    config={
        "multiagent": {
            "policies": policies,
            "policy_mapping_fn": policy_mapping_fn,
            "policies_to_train": None,  # All
        },
    })
print(f"PPO init took {time.time() - start} seconds")

start = time.time()
dqn_trainer = DQNTrainer(
    env="multi_agent_cartpole",
    config={
        "multiagent": {
            "policies": policies,
            "policy_mapping_fn": policy_mapping_fn,
            "policies_to_train": None,  # All
        },
    })
print(f"DQN init took {time.time() - start} seconds")
Here is part of an example output:
2021-09-06 09:11:46,320 INFO trainable.py:109 -- Trainable.setup took 476.444 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
2021-09-06 09:11:46,321 WARNING util.py:55 -- Install gputil for GPU system monitoring.
PPO init took 476.4660232067108 seconds
2021-09-06 09:11:46,652 WARNING deprecation.py:34 -- DeprecationWarning: `SampleBatch['is_training']` has been deprecated. Use `SampleBatch.is_training` instead. This will raise an error in the future!
2021-09-06 09:23:35,170 INFO trainable.py:109 -- Trainable.setup took 708.833 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
2021-09-06 09:23:35,171 WARNING util.py:55 -- Install gputil for GPU system monitoring.
DQN init took 708.8500621318817 seconds
Hey @PavelC, great question and interesting find. I can’t really reproduce these extreme numbers. On my Mac, it takes on the order of one to two minutes per Trainer (PPO and DQN) to create the 30 policies:
PPO init took 96.228423833847046 seconds
DQN init took 125.47070574760437 seconds
This is with your above example and 10 workers (not 50!), but it is still significantly less than 5 minutes, so I’m not sure what’s going on. Keep in mind that in the tf + multi-agent case, RLlib creates a separate graph and session for each policy, so that policies can be added (and removed) on the fly.
I’ve experienced slow initialization primarily as a function of the number of workers, not so much the number of policies. @PavelC, have you tried initializing with fewer workers?
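For reference, a minimal sketch (based on the reproduction script above) of lowering the worker count via the standard num_workers config key; the value of 10 is just illustrative:

start = time.time()
ppo_trainer = PPOTrainer(
    env="multi_agent_cartpole",
    config={
        "num_workers": 10,  # fewer rollout workers than the original 50
        "multiagent": {
            "policies": policies,
            "policy_mapping_fn": policy_mapping_fn,
            "policies_to_train": None,
        },
    })
print(f"PPO init with 10 workers took {time.time() - start} seconds")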
Sorry for the late reply; I had some work to do with smaller numbers of policies.
Thank you for the responses @sven1977 and @rusu24edward. I see now that it is not unreasonable for initialization time to grow roughly linearly with the number of policies and workers, if that determines how many separate graphs are created.
Now I ran some experiments:
| num_policies | num_workers | init time (seconds) | machine |
|---|---|---|---|
| 10 | 10 | 80 | a |
| 10 | 50 | 98 | a |
| 30 | 10 | 688 | a |
| 30 | 50 | 878 | a |
| 30 | 10 | 43 | b |
| 30 | 50 | 58 | b |
As you can see, even when I reduce the number of workers I still get huge init times on machine a, while with a low num_policies a large number of workers does not affect the init time much.
The third row should match what you ran on your Mac, yet my init time is far larger. The machine I used has more than 100 cores, which I assume your Mac doesn’t.
I then ran the same experiments on a different machine with fewer but more powerful CPUs and got the results marked as machine b in the table. Init times are very reasonable there.
My guess is that initialization is not actually done in parallel, so a machine with a few strong cores initializes faster than a machine with hundreds of relatively old, mostly idle cores.
These new experiments were run with Ray 1.7.
In case someone wants to run this again, I made slight changes to the original test script: slow_rllib_init.py
OK, even with the stronger CPUs, initialization takes much longer again beyond a certain threshold:
| num_policies | num_workers | seconds (tf) | seconds (tfe) |
|---|---|---|---|
| 30 | 10 | 43 | |
| 50 | 10 | 71 | |
| 100 | 10 | 139 | |
| 110 | 10 | 641 | 47 |
| 125 | 10 | 791 | |
| 150 | 10 | 999 | |
| 200 | 10 | 1385 | 99 |
The important part is the large jump in initialization time between 100 and 110 policies. I’m not sure what’s going on there; I do see more than one CPU being used. Could this be a cache problem, where at a certain size things no longer fit into the cache and main memory has to be used?
As you can also see from the table, using tfe seems to solve the problem, which makes sense given that eager mode does not build a separate static graph per policy.
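For reference, a minimal sketch of the framework switch in the config of the original script (on the Ray 1.x API, "tfe" selects TF eager execution; everything else is unchanged):

ppo_trainer = PPOTrainer(
    env="multi_agent_cartpole",
    config={
        "framework": "tfe",  # TF eager instead of the static-graph "tf"
        "multiagent": {
            "policies": policies,
            "policy_mapping_fn": policy_mapping_fn,
            "policies_to_train": None,
        },
    })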
OK, sorry for all the spam, but I found another way to work around this problem while still using framework: "tf" (as opposed to "tfe").
When the policies are initialized sequentially instead of all at once, the problem seems to be at least reduced.
I.e., I did this:
# The trainer was created with only policy '0' in its config; the
# remaining policies are then added one by one.
for i in range(1, num_policies):
    _ = ppo_trainer.add_policy(
        policy_id=str(i),
        policy_cls=type(ppo_trainer.get_policy('0')),
    )
That is, the trainer is created with a single policy and the rest are added afterwards, instead of all policies being listed in the config from the start.
(see also the updated gist)
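For anyone who wants to try this without the gist, here is a minimal end-to-end sketch of the workaround (my own reconstruction, not necessarily identical to the gist; it assumes the same obs_space, act_space, policy_mapping_fn, and num_policies as in the original script):

# Sketch: create the trainer with a single policy "0", then add the
# remaining policies one by one via add_policy().
start = time.time()
ppo_trainer = PPOTrainer(
    env="multi_agent_cartpole",
    config={
        "multiagent": {
            "policies": {"0": (None, obs_space, act_space, {})},
            "policy_mapping_fn": policy_mapping_fn,
            "policies_to_train": None,
        },
    })
for i in range(1, num_policies):
    ppo_trainer.add_policy(
        policy_id=str(i),
        policy_cls=type(ppo_trainer.get_policy("0")),
    )
print(f"Sequential PPO init took {time.time() - start} seconds")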
I have found the reason why increasing the number of policies beyond 100 causes a slowdown in initialization (as well as a ~10x slowdown when actually running the training):
Be sure to set the config parameter
"multiagent": {
    "policy_map_capacity": num_policies,  # default: 100
}
to an appropriate value, i.e. at least as high as the number of policies, if these policies are all frequently used. The default value of 100 is what caused the problems in my case.
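A minimal sketch of how this plugs into the example config from above (the capacity value is illustrative; it only needs to cover the number of policies that are actually in use):

ppo_trainer = PPOTrainer(
    env="multi_agent_cartpole",
    config={
        "multiagent": {
            "policies": policies,
            "policy_mapping_fn": policy_mapping_fn,
            "policies_to_train": None,
            # Keep all policies in the in-memory policy map so that none of
            # them get swapped out during training (default capacity: 100).
            "policy_map_capacity": num_policies,
        },
    })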