Ray Tune with multi-agent APPO

Hi everyone,

I defined a custom environment and trained a multi-agent PPO with Ray Tune. It trained fine without any errors. However, when I change PPO to APPO or IMPALA (both in the config and in the Tune trainable), I get the error below after training for one or two episodes.

ERROR tune_controller.py:1374 -- Trial task failed for trial APPO_multiAgent_env_77c02_00000
Traceback (most recent call last):
  File "/python3.8/site-packages/ray/air/execution/_internal/event_manager.py", line 110, in resolve_future
    result = ray.get(future)
  File "python3.8/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "python3.8/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "python3.8/site-packages/ray/_private/worker.py", line 2624, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RuntimeError): ray::APPO.train() (pid=11259, ip=127.0.0.1, actor_id=fe74a48b015ee2ab21d5abb001000000, repr=APPO)
  File "python3.8/site-packages/ray/tune/trainable/trainable.py", line 342, in train
    raise skipped from exception_cause(skipped)
  File "python3.8/site-packages/ray/tune/trainable/trainable.py", line 339, in train
    result = self.step()
  File "python3.8/site-packages/ray/rllib/algorithms/algorithm.py", line 852, in step
    results, train_iter_ctx = self._run_one_training_iteration()
  File "python3.8/site-packages/ray/rllib/algorithms/algorithm.py", line 3042, in _run_one_training_iteration
    results = self.training_step()
  File "python3.8/site-packages/ray/rllib/algorithms/appo/appo.py", line 363, in training_step
    train_results = super().training_step()
  File "python3.8/site-packages/ray/rllib/algorithms/impala/impala.py", line 735, in training_step
    train_results = self.learn_on_processed_samples()
  File "python3.8/site-packages/ray/rllib/algorithms/impala/impala.py", line 953, in learn_on_processed_samples
    result = self.learner_group.update(
  File "python3.8/site-packages/ray/rllib/core/learner/learner_group.py", line 186, in update
    self._learner.update(
  File "python3.8/site-packages/ray/rllib/core/learner/learner.py", line 1303, in update
    ) = self._update(nested_tensor_minibatch)
  File "python3.8/site-packages/ray/rllib/core/learner/torch/torch_learner.py", line 365, in _update
    return self._possibly_compiled_update(batch)
  File "python3.8/site-packages/ray/rllib/core/learner/torch/torch_learner.py", line 123, in _uncompiled_update
    loss_per_module = self.compute_loss(fwd_out=fwd_out, batch=batch)
  File "python3.8/site-packages/ray/rllib/core/learner/learner.py", line 1023, in compute_loss
    loss = self.compute_loss_for_module(
  File "python3.8/site-packages/ray/rllib/algorithms/appo/torch/appo_torch_learner.py", line 62, in compute_loss_for_module
    behaviour_actions_logp_time_major = make_time_major(
  File "python3.8/site-packages/ray/rllib/algorithms/impala/torch/vtrace_torch_v2.py", line 48, in make_time_major
    rs = torch.reshape(tensor, [B, T] + list(tensor.shape[1:]))
RuntimeError: shape '[9, 50]' is invalid for input of size 499
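
For what it's worth, 9 * 50 = 450, so the [9, 50] time-major reshape needs 450 elements, but the tensor apparently has 499. The failing call can be reproduced in isolation (a minimal sketch with a dummy tensor; B, T, and the size are just taken from the error message):

import torch

# make_time_major tries to fold the batch into [B, T] = [9, 50], i.e. 450 rows,
# but the incoming tensor has 499 entries, so the reshape cannot succeed.
tensor = torch.zeros(499)
B, T = 9, 50
rs = torch.reshape(tensor, [B, T] + list(tensor.shape[1:]))
# -> RuntimeError: shape '[9, 50]' is invalid for input of size 499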

This is the code I’m running:

from ray import air, tune
from ray.rllib.algorithms.appo import APPOConfig
from ray.rllib.core.rl_module.marl_module import MultiAgentRLModuleSpec
from ray.rllib.core.rl_module.rl_module import SingleAgentRLModuleSpec

# "my_env" is registered elsewhere; policies and policy_mapping_fn are defined for it.
config = (
    APPOConfig()
    .environment("my_env")
    .experimental(_enable_new_api_stack=True)
    .rollouts(num_rollout_workers=0, enable_connectors=True)
    .framework("torch")
    .rl_module(
        rl_module_spec=MultiAgentRLModuleSpec(
            module_specs={p: SingleAgentRLModuleSpec() for p in policies},
        ),
    )
    .multi_agent(
        policies=policies,
        policy_mapping_fn=policy_mapping_fn,
    )
)

results = tune.Tuner(
    "APPO",
    param_space=config.to_dict(),
    run_config=air.RunConfig(stop={"training_iteration": 15}, verbose=1),
).fit()

I have no idea why this happens; any help would be appreciated.

P.S. I also tried SAC, which raises a NotImplementedError from get_default_rl_module_spec inside get_marl_module_spec.


@prs Great catch! We have been rewriting APPO for the new API stack over the past few weeks to improve performance when scaling out.

I would suggest staying on the old stack for now until the new APPO is released.
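
If it helps, here is a minimal sketch of what staying on the old stack could look like with your setup (same env, policies, and policy_mapping_fn as above; simply not opting into the new API stack should fall back to the old stack at this Ray version):

from ray.rllib.algorithms.appo import APPOConfig

config = (
    APPOConfig()
    .environment("my_env")
    .framework("torch")
    .rollouts(num_rollout_workers=0)
    .multi_agent(
        policies=policies,
        policy_mapping_fn=policy_mapping_fn,
    )
    # No .experimental(_enable_new_api_stack=True) and no .rl_module(...) here,
    # so the default (old) stack is used.
)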


Are there any examples or documentation on converting PPO to APPO? I want to use the train() method.

Hi @Sheshan_Kutty, great question! Yes, there are: we tuned both algorithms to play Atari Pong, so from there you can see the differences in hyperparameters:

We have also made many performance improvements to APPO over the last few weeks, so it should scale quite nicely.
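
If you want to drive APPO with train() instead of Tune, a minimal sketch (assuming the same APPOConfig object as above; the exact result keys can vary between Ray versions and stacks):

# Build the algorithm from the config and step it manually.
algo = config.build()
for i in range(15):
    result = algo.train()  # one training iteration
    print(i, result.get("episode_reward_mean"))
algo.stop()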

Thank you. What about Parameter Server and Experience Sharing, should those still be supported? Are there any implementation adjustment details for integrating with these newer versions?

When you said to keep to the old stack, which version of RLlib were you referring to?