AssertionError in ppo.py: KL is None, learner stats of at least one policy are empty

Hello everybody,

I’ve set up a custom multiagent config with two policies that share most of their NN model (only the input/output layers are separate). There are two agents [1, 2], each using one of the policies [hoist1, hoist2]. After the first training step (train_batch_size: 256) I get the following error:

AssertionError: (defaultdict(<class 'dict'>, {'hoist1': {'learner_stats': {'cur_kl_coeff': 0.20000000298023224, 'cur_lr': 4.999999873689376e-05, 'total_loss': 0.07336341, 'policy_loss': -0.19851266, 'vf_loss': 0.25689963, 'vf_explained_var': -0.059160233, 'kl': 0.07488224, 'entropy': 1.2394269, 'entropy_coeff': 0.0}}, 'hoist2': {'learner_stats': {}}}), 'hoist2')
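For context, the multiagent part of my config looks roughly like the sketch below (the policy IDs match the error output; the env name, observation/action spaces, and the shared-model details are placeholders):

```python
# Rough sketch of the multiagent setup described above (policy IDs match
# the error output; env name, spaces, and shared-model details are placeholders).
from gym.spaces import Box, Discrete
import numpy as np

obs_space = Box(-np.inf, np.inf, shape=(10,))   # placeholder observation space
act_space = Discrete(4)                         # placeholder action space

config = {
    "env": "HoistEnv-v0",        # placeholder multi-agent env name
    "train_batch_size": 256,
    "multiagent": {
        # Two PPO policies, one per agent; the shared NN layers live in a
        # custom model (not shown here).
        "policies": {
            "hoist1": (None, obs_space, act_space, {}),
            "hoist2": (None, obs_space, act_space, {}),
        },
        # Agent 1 -> "hoist1", agent 2 -> "hoist2"
        "policy_mapping_fn": lambda agent_id: f"hoist{agent_id}",
    },
}
```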

The learner_stats of at least one policy (here: hoist2) are empty. Here is the complete stack trace:

Traceback (most recent call last):
File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "c:\Users\user\.vscode\extensions\ms-python.python-2021.6.944021595\pythonFiles\lib\python\debugpy\__main__.py", line 45, in <module>
cli.main()
File "c:\Users\user\.vscode\extensions\ms-python.python-2021.6.944021595\pythonFiles\lib\python\debugpy/…\debugpy\server\cli.py", line 444, in main
run()
File "c:\Users\user\.vscode\extensions\ms-python.python-2021.6.944021595\pythonFiles\lib\python\debugpy/…\debugpy\server\cli.py", line 285, in run_file
runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 265, in run_path
return _run_module_code(code, init_globals, run_name,
File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "c:\Users\user\Desktop\KI_Galv\galvcon\marl\main\policyTrainer.py", line 274, in <module>
result = trainer.train()
File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\rllib\agents\trainer.py", line 608, in train
raise e
File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\rllib\agents\trainer.py", line 594, in train
result = Trainable.train(self)
File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\tune\trainable.py", line 232, in train
result = self.step()
File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\rllib\agents\trainer_template.py", line 173, in step
res = next(self.train_exec_impl)
File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\util\iter.py", line 756, in __next__
return next(self.built_iterator)
File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\util\iter.py", line 783, in apply_foreach
for item in it:
File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\util\iter.py", line 783, in apply_foreach
for item in it:
File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\util\iter.py", line 843, in apply_filter
for item in it:
File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\util\iter.py", line 843, in apply_filter
for item in it:
File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\util\iter.py", line 791, in apply_foreach
result = fn(item)
File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\rllib\agents\ppo\ppo.py", line 206, in __call__
self.workers.local_worker().foreach_trainable_policy(update)
File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 1051, in foreach_trainable_policy
return [
File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 1052, in <listcomp>
func(policy, pid, **kwargs)
File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\rllib\agents\ppo\ppo.py", line 198, in update
assert kl is not None, (fetches, pi_id)
AssertionError: (defaultdict(<class 'dict'>, {'hoist1': {'learner_stats': {'cur_kl_coeff': 0.20000000298023224, 'cur_lr': 4.999999873689376e-05, 'total_loss': 0.07336341, 'policy_loss': -0.19851266, 'vf_loss': 0.25689963, 'vf_explained_var': -0.059160233, 'kl': 0.07488224, 'entropy': 1.2394269, 'entropy_coeff': 0.0}}, 'hoist2': {'learner_stats': {}}}), 'hoist2')

So far, I haven’t found what is causing this error. Any ideas?

Note: It can happen that the learner_stats of one or even both policies are empty dicts!

File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\rllib\agents\ppo\ppo.py", line 198, in update
assert kl is not None, (fetches, pi_id)
AssertionError: (defaultdict(<class 'dict'>, {'hoist1': {'learner_stats': {}}, 'hoist2': {'learner_stats': {}}}), 'hoist1')

Do you use a custom execution plan?
Is this related?

@arturn

I’ve found the problem causing this error. For testing purposes I set the config variable train_batch_size to 128 and left the PPO config variable sgd_minibatch_size at its default (128). As a result, the sample batch of each policy contains fewer samples than sgd_minibatch_size (i.e. < 128), so in the method do_minibatch_sgd (in sgd.py) no minibatches can be generated. This ultimately leads to empty learner_stats, and kl is None.
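Roughly what happens numerically (the even split of the train batch across the two agents is an assumption on my side):

```python
# Why do_minibatch_sgd produces no minibatches in this setup
# (the even 50/50 split across the two agents is an assumption).
train_batch_size = 128
sgd_minibatch_size = 128       # RLlib default for PPO
num_agents = 2

samples_per_policy = train_batch_size // num_agents        # ~64 samples per policy
num_minibatches = samples_per_policy // sgd_minibatch_size
print(num_minibatches)  # 0 -> minibatches() yields nothing, learner_stats
                        # stays an empty dict and kl ends up being None
```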
I would recommend a check that the size of the sample batch is at least sgd_minibatch_size, e.g.:

import random

from ray.rllib.policy.sample_batch import MultiAgentBatch


def minibatches(samples, sgd_minibatch_size, shuffle=True):
    """Return a generator yielding minibatches from a sample batch.

    Args:
        samples (SampleBatch): batch of samples to split up.
        sgd_minibatch_size (int): size of minibatches to return.
        shuffle (bool): whether to shuffle the order of the generated
            minibatches.

    Returns:
        generator that returns mini-SampleBatches of size sgd_minibatch_size.
    """
    if not sgd_minibatch_size:
        yield samples
        return

    if isinstance(samples, MultiAgentBatch):
        raise NotImplementedError(
            "Minibatching not implemented for multi-agent in simple mode")

    if "state_in_0" not in samples and "state_out_0" not in samples:
        samples.shuffle()

    # New check: fail fast if the batch is too small to produce even a single
    # minibatch (otherwise the loops below yield nothing and learner_stats
    # stays empty).
    assert len(samples) >= sgd_minibatch_size, \
        "Size of SampleBatch {} should be at least sgd_minibatch_size {}!" \
        .format(len(samples), sgd_minibatch_size)

    all_slices = samples._get_slice_indices(sgd_minibatch_size)
    data_slices, state_slices = all_slices

    if len(state_slices) == 0:
        if shuffle:
            random.shuffle(data_slices)
        for i, j in data_slices:
            yield samples.slice(i, j)
    else:
        all_slices = list(zip(data_slices, state_slices))
        if shuffle:
            # Make sure to shuffle data and states while linked together.
            random.shuffle(all_slices)
        for (i, j), (si, sj) in all_slices:
            yield samples.slice(i, j, si, sj)
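A hypothetical quick check of the new assertion (import paths as in Ray 1.x; the tiny batch stands in for a per-policy batch that holds only 64 samples):

```python
import numpy as np
from ray.rllib.policy.sample_batch import SampleBatch

# A per-policy batch with only 64 samples, fewer than sgd_minibatch_size=128.
small_batch = SampleBatch(
    obs=np.zeros((64, 4), dtype=np.float32),
    actions=np.zeros(64, dtype=np.int32),
)

try:
    next(minibatches(small_batch, sgd_minibatch_size=128))
except AssertionError as e:
    print(e)  # Size of SampleBatch 64 should be at least sgd_minibatch_size 128!
```

With this check in place, a misconfiguration like the one above fails immediately with a clear message instead of silently producing empty learner_stats.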

Hey @klausk55, sounds good. Could you maybe create a PR for your suggested fix? Thanks :slight_smile:

@sven1977 here is the PR