Hello everybody,
I’ve set up a custom multi-agent config in which two policies share most of a NN model (only the input/output layers are individual). There are two agents [1, 2], each using one of the policies [hoist1, hoist2]. After the first training step (train_batch_size: 256) I get the following error:
```
AssertionError: (defaultdict(<class 'dict'>, {'hoist1': {'learner_stats': {'cur_kl_coeff': 0.20000000298023224, 'cur_lr': 4.999999873689376e-05, 'total_loss': 0.07336341, 'policy_loss': -0.19851266, 'vf_loss': 0.25689963, 'vf_explained_var': -0.059160233, 'kl': 0.07488224, 'entropy': 1.2394269, 'entropy_coeff': 0.0}}, 'hoist2': {'learner_stats': {}}}), 'hoist2')
```
The `learner_stats` of at least one policy (here: hoist2) are empty. Here is the complete stack trace:
```
Traceback (most recent call last):
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "c:\Users\user\.vscode\extensions\ms-python.python-2021.6.944021595\pythonFiles\lib\python\debugpy\__main__.py", line 45, in <module>
    cli.main()
  File "c:\Users\user\.vscode\extensions\ms-python.python-2021.6.944021595\pythonFiles\lib\python\debugpy/...\debugpy\server\cli.py", line 444, in main
    run()
  File "c:\Users\user\.vscode\extensions\ms-python.python-2021.6.944021595\pythonFiles\lib\python\debugpy/...\debugpy\server\cli.py", line 285, in run_file
    runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "c:\Users\user\Desktop\KI_Galv\galvcon\marl\main\policyTrainer.py", line 274, in <module>
    result = trainer.train()
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\rllib\agents\trainer.py", line 608, in train
    raise e
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\rllib\agents\trainer.py", line 594, in train
    result = Trainable.train(self)
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\tune\trainable.py", line 232, in train
    result = self.step()
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\rllib\agents\trainer_template.py", line 173, in step
    res = next(self.train_exec_impl)
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\util\iter.py", line 756, in __next__
    return next(self.built_iterator)
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\util\iter.py", line 783, in apply_foreach
    for item in it:
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\util\iter.py", line 783, in apply_foreach
    for item in it:
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\util\iter.py", line 843, in apply_filter
    for item in it:
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\util\iter.py", line 843, in apply_filter
    for item in it:
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\util\iter.py", line 791, in apply_foreach
    result = fn(item)
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\rllib\agents\ppo\ppo.py", line 206, in __call__
    self.workers.local_worker().foreach_trainable_policy(update)
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 1051, in foreach_trainable_policy
    return [
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 1052, in <listcomp>
    func(policy, pid, **kwargs)
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\rllib\agents\ppo\ppo.py", line 198, in update
    assert kl is not None, (fetches, pi_id)
AssertionError: (defaultdict(<class 'dict'>, {'hoist1': {'learner_stats': {'cur_kl_coeff': 0.20000000298023224, 'cur_lr': 4.999999873689376e-05, 'total_loss': 0.07336341, 'policy_loss': -0.19851266, 'vf_loss': 0.25689963, 'vf_explained_var': -0.059160233, 'kl': 0.07488224, 'entropy': 1.2394269, 'entropy_coeff': 0.0}}, 'hoist2': {'learner_stats': {}}}), 'hoist2')
```
So far, I haven’t been able to find the cause of this error. Any ideas what could trigger it?
Note: it can also happen that the `learner_stats` of one or even both policies are empty dicts, e.g.:
```
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\rllib\agents\ppo\ppo.py", line 198, in update
    assert kl is not None, (fetches, pi_id)
AssertionError: (defaultdict(<class 'dict'>, {'hoist1': {'learner_stats': {}}, 'hoist2': {'learner_stats': {}}}), 'hoist1')
```
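For context, here is a minimal sketch of the multi-agent part of the setup I mean (the observation/action spaces are placeholders and the shared custom model is left out, so this is not my full config):

```python
# Sketch of the multi-agent config: agent IDs 1 and 2 are mapped to the
# two policies hoist1/hoist2. Spaces below are placeholders; the real
# config uses the env's spaces and a custom model that shares all layers
# except input/output between the two policies.
obs_space = None  # replace with the env's observation space
act_space = None  # replace with the env's action space

config = {
    "train_batch_size": 256,
    "multiagent": {
        # One policy per hoist; `None` as the first tuple entry means
        # "use the trainer's default policy class".
        "policies": {
            "hoist1": (None, obs_space, act_space, {}),
            "hoist2": (None, obs_space, act_space, {}),
        },
        # Agent 1 -> hoist1, agent 2 -> hoist2.
        "policy_mapping_fn": lambda agent_id: f"hoist{agent_id}",
    },
}
```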