Ray Tune with multi-agent APPO

Hi everyone,

I defined a custom environment and trained a multi-agent PPO with Ray Tune. It trained fine without any errors. However, when I change PPO to APPO or IMPALA (both in the config and in the Tune trainable), I get the error below after training for one or two episodes.

ERROR tune_controller.py:1374 -- Trial task failed for trial APPO_multiAgent_env_77c02_00000
Traceback (most recent call last):
  File "/python3.8/site-packages/ray/air/execution/_internal/event_manager.py", line 110, in resolve_future
    result = ray.get(future)
  File "python3.8/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "python3.8/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "python3.8/site-packages/ray/_private/worker.py", line 2624, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RuntimeError): ray::APPO.train() (pid=11259, ip=127.0.0.1, actor_id=fe74a48b015ee2ab21d5abb001000000, repr=APPO)
  File "python3.8/site-packages/ray/tune/trainable/trainable.py", line 342, in train
    raise skipped from exception_cause(skipped)
  File "python3.8/site-packages/ray/tune/trainable/trainable.py", line 339, in train
    result = self.step()
  File "python3.8/site-packages/ray/rllib/algorithms/algorithm.py", line 852, in step
    results, train_iter_ctx = self._run_one_training_iteration()
  File "python3.8/site-packages/ray/rllib/algorithms/algorithm.py", line 3042, in _run_one_training_iteration
    results = self.training_step()
  File "python3.8/site-packages/ray/rllib/algorithms/appo/appo.py", line 363, in training_step
    train_results = super().training_step()
  File "python3.8/site-packages/ray/rllib/algorithms/impala/impala.py", line 735, in training_step
    train_results = self.learn_on_processed_samples()
  File "python3.8/site-packages/ray/rllib/algorithms/impala/impala.py", line 953, in learn_on_processed_samples
    result = self.learner_group.update(
  File "python3.8/site-packages/ray/rllib/core/learner/learner_group.py", line 186, in update
    self._learner.update(
  File "python3.8/site-packages/ray/rllib/core/learner/learner.py", line 1303, in update
    ) = self._update(nested_tensor_minibatch)
  File "python3.8/site-packages/ray/rllib/core/learner/torch/torch_learner.py", line 365, in _update
    return self._possibly_compiled_update(batch)
  File "python3.8/site-packages/ray/rllib/core/learner/torch/torch_learner.py", line 123, in _uncompiled_update
    loss_per_module = self.compute_loss(fwd_out=fwd_out, batch=batch)
  File "python3.8/site-packages/ray/rllib/core/learner/learner.py", line 1023, in compute_loss
    loss = self.compute_loss_for_module(
  File "python3.8/site-packages/ray/rllib/algorithms/appo/torch/appo_torch_learner.py", line 62, in compute_loss_for_module
    behaviour_actions_logp_time_major = make_time_major(
  File "python3.8/site-packages/ray/rllib/algorithms/impala/torch/vtrace_torch_v2.py", line 48, in make_time_major
    rs = torch.reshape(tensor, [B, T] + list(tensor.shape[1:]))
RuntimeError: shape '[9, 50]' is invalid for input of size 499
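
For what it's worth, 9 * 50 = 450, so the [9, 50] time-major reshape needs 450 elements, but the tensor apparently has 499. The failing call can be reproduced in isolation (a minimal sketch with a dummy tensor; B, T, and the size are just taken from the error message):

import torch

# make_time_major tries to fold the batch into [B, T] = [9, 50], i.e. 450 rows,
# but the incoming tensor has 499 entries, so the reshape cannot succeed.
tensor = torch.zeros(499)
B, T = 9, 50
rs = torch.reshape(tensor, [B, T] + list(tensor.shape[1:]))
# -> RuntimeError: shape '[9, 50]' is invalid for input of size 499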

This is the code I’m running:

from ray import air, tune
from ray.rllib.algorithms.appo import APPOConfig
from ray.rllib.core.rl_module.marl_module import MultiAgentRLModuleSpec
from ray.rllib.core.rl_module.rl_module import SingleAgentRLModuleSpec

# "my_env" is registered elsewhere; policies and policy_mapping_fn are defined for it.
config = (
    APPOConfig()
    .environment("my_env")
    .experimental(_enable_new_api_stack=True)
    .rollouts(num_rollout_workers=0, enable_connectors=True)
    .framework("torch")
    .rl_module(
        rl_module_spec=MultiAgentRLModuleSpec(
            module_specs={p: SingleAgentRLModuleSpec() for p in policies},
        ),
    )
    .multi_agent(
        policies=policies,
        policy_mapping_fn=policy_mapping_fn,
    )
)

results = tune.Tuner(
    "APPO",
    param_space=config.to_dict(),
    run_config=air.RunConfig(stop={"training_iteration": 15}, verbose=1),
).fit()

I have no idea why this happens; any help would be appreciated.

P.S. I also tried SAC, which raises a NotImplementedError from get_default_rl_module_spec inside get_marl_module_spec.


@prs Great catch! We have been rewriting APPO for the new API stack over the past few weeks to improve performance when scaling out.

I would suggest staying on the old stack for now until the new APPO is released.
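
If it helps, here is a minimal sketch of what staying on the old stack could look like with your setup (same env, policies, and policy_mapping_fn as above; simply not opting into the new API stack should fall back to the old stack at this Ray version):

from ray.rllib.algorithms.appo import APPOConfig

config = (
    APPOConfig()
    .environment("my_env")
    .framework("torch")
    .rollouts(num_rollout_workers=0)
    .multi_agent(
        policies=policies,
        policy_mapping_fn=policy_mapping_fn,
    )
    # No .experimental(_enable_new_api_stack=True) and no .rl_module(...) here,
    # so the default (old) stack is used.
)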


Are there any examples or documentation on converting PPO to APPO? I want to use the train() method.

Hi @Sheshan_Kutty, great question! Yes, there are: we tuned both algorithms to play Atari Pong, so from there you can see the differences in hyperparameters:

We have also made many performance improvements to APPO over the last few weeks, so it should scale quite nicely.
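
If you want to drive APPO with train() instead of Tune, a minimal sketch (assuming the same APPOConfig object as above; the exact result keys can vary between Ray versions and stacks):

# Build the algorithm from the config and step it manually.
algo = config.build()
for i in range(15):
    result = algo.train()  # one training iteration
    print(i, result.get("episode_reward_mean"))
algo.stop()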

Thank you. What about Parameter Server and Experience Sharing, should those still be supported? Are there any implementation adjustment details for integrating with these newer versions?

When you said to keep to the old stack, which version of RLlib were you referring to?