Proper way of setting up a turn-based, action-masked multi-agent PPO

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

I have been able to use the example that describes how to run an action-masked PPO agent: ray/rllib/examples/rl_module/action_masking_rlm.py at 9693fa855f9a9c2a738b2f26b294eb17282f43df · ray-project/ray · GitHub

and I used ray/rllib/examples/multi_agent_and_self_play/self_play_league_based_with_open_spiel.py at master · ray-project/ray · GitHub
to create a multi-agent environment.

I have encountered the following issue:
I have not been able to replicate the turn-based nature of the OpenSpiel example. In that example it is possible to return an observation dictionary that contains only the key of the current player. If I attempt to do the same, I get a crash when episodes are fetched from the MultiAgentEpisode data structure. It seems to me that this is because it assumes every player takes actions simultaneously.
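To make the shape of the problem concrete, here is a minimal, hypothetical sketch (plain Python, not subclassing RLlib's MultiAgentEnv, and with made-up agent names and observations) of the turn-based step/reset contract I am trying to reproduce: only the agent whose turn it is appears in the returned observation dict, as in the OpenSpiel self-play example.

```python
class TurnBasedEnvSketch:
    """Toy two-player turn-based env mimicking the MultiAgentEnv dict API."""

    def __init__(self):
        self.agents = ["player_0", "player_1"]
        self.current = 0  # index of the agent whose turn it is
        self.t = 0

    def reset(self):
        self.current = 0
        self.t = 0
        # Only the starting player receives an observation.
        return {self.agents[self.current]: self._obs()}, {}

    def step(self, action_dict):
        # Only the current agent is expected to have acted.
        assert set(action_dict) == {self.agents[self.current]}
        self.t += 1
        self.current = 1 - self.current  # pass the turn to the other player
        done = self.t >= 4
        # The returned obs dict again contains only the (new) current agent;
        # this is the pattern that triggers the IndexError below for me.
        obs = {self.agents[self.current]: self._obs()}
        rewards = {agent: 0.0 for agent in action_dict}
        return obs, rewards, {"__all__": done}, {"__all__": False}, {}

    def _obs(self):
        return [self.t]
```

With this pattern, each env step's observation dict contains exactly one agent key, alternating between players.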

  File "/home/massimo/Documents/ray/example.py", line 481, in <module>
    model.train()
  File "/home/massimo/Documents/ray/.venv/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 334, in train
    raise skipped from exception_cause(skipped)
  File "/home/massimo/Documents/ray/.venv/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 331, in train
    result = self.step()
  File "/home/massimo/Documents/ray/.venv/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 849, in step
    results, train_iter_ctx = self._run_one_training_iteration()
  File "/home/massimo/Documents/ray/.venv/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 3194, in _run_one_training_iteration
    results = self.training_step()
  File "/home/massimo/Documents/ray/.venv/lib/python3.10/site-packages/ray/rllib/algorithms/ppo/ppo.py", line 406, in training_step
    return self._training_step_new_api_stack()
  File "/home/massimo/Documents/ray/.venv/lib/python3.10/site-packages/ray/rllib/algorithms/ppo/ppo.py", line 418, in _training_step_new_api_stack
    episodes = synchronous_parallel_sample(
  File "/home/massimo/Documents/ray/.venv/lib/python3.10/site-packages/ray/rllib/execution/rollout_ops.py", line 88, in synchronous_parallel_sample
    sampled_data = worker_set.foreach_worker(
  File "/home/massimo/Documents/ray/.venv/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py", line 771, in foreach_worker
    handle_remote_call_result_errors(remote_results, self._ignore_worker_failures)
  File "/home/massimo/Documents/ray/.venv/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py", line 78, in handle_remote_call_result_errors
    raise r.get()
ray.exceptions.RayTaskError(IndexError): ray::MultiAgentEnvRunner.apply() (pid=741129, ip=192.168.1.29, actor_id=3f1365f5963cafdf655d67ff01000000, repr=<ray.rllib.env.multi_agent_env_runner.MultiAgentEnvRunner object at 0x72ebc01e0850>)
  File "/home/massimo/Documents/ray/.venv/lib/python3.10/site-packages/ray/rllib/utils/actor_manager.py", line 189, in apply
    raise e
  File "/home/massimo/Documents/ray/.venv/lib/python3.10/site-packages/ray/rllib/utils/actor_manager.py", line 178, in apply
    return func(self, *args, **kwargs)
  File "/home/massimo/Documents/ray/.venv/lib/python3.10/site-packages/ray/rllib/execution/rollout_ops.py", line 89, in <lambda>
    lambda w: w.sample(), local_worker=False, healthy_only=True
  File "/home/massimo/Documents/ray/.venv/lib/python3.10/site-packages/ray/rllib/env/multi_agent_env_runner.py", line 137, in sample
    return self._sample_timesteps(
  File "/home/massimo/Documents/ray/.venv/lib/python3.10/site-packages/ray/rllib/env/multi_agent_env_runner.py", line 227, in _sample_timesteps
    to_module = self._cached_to_module or self._env_to_module(
  File "/home/massimo/Documents/ray/.venv/lib/python3.10/site-packages/ray/rllib/connectors/env_to_module/env_to_module_pipeline.py", line 25, in __call__
    return super().__call__(
  File "/home/massimo/Documents/ray/.venv/lib/python3.10/site-packages/ray/rllib/connectors/connector_pipeline_v2.py", line 68, in __call__
    data = connector(
  File "/home/massimo/Documents/ray/.venv/lib/python3.10/site-packages/ray/rllib/connectors/common/add_observations_from_episodes_to_batch.py", line 111, in __call__
    for sa_episode in self.single_agent_episode_iterator(
  File "/home/massimo/Documents/ray/.venv/lib/python3.10/site-packages/ray/rllib/connectors/connector_v2.py", line 278, in single_agent_episode_iterator
    episode.get_agents_that_stepped()
  File "/home/massimo/Documents/ray/.venv/lib/python3.10/site-packages/ray/rllib/env/multi_agent_episode.py", line 1423, in get_agents_that_stepped
    return set(self.get_observations(-1).keys())
  File "/home/massimo/Documents/ray/.venv/lib/python3.10/site-packages/ray/rllib/env/multi_agent_episode.py", line 935, in get_observations
    return self._get(
  File "/home/massimo/Documents/ray/.venv/lib/python3.10/site-packages/ray/rllib/env/multi_agent_episode.py", line 1674, in _get
    return self._get_data_by_env_steps(**kwargs)
  File "/home/massimo/Documents/ray/.venv/lib/python3.10/site-packages/ray/rllib/env/multi_agent_episode.py", line 1880, in _get_data_by_env_steps
    agent_indices = self.env_t_to_agent_t[agent_id].get(
  File "/home/massimo/Documents/ray/.venv/lib/python3.10/site-packages/ray/rllib/env/utils/infinite_lookback_buffer.py", line 158, in get
    data = self._get_int_index(
  File "/home/massimo/Documents/ray/.venv/lib/python3.10/site-packages/ray/rllib/env/utils/infinite_lookback_buffer.py", line 468, in _get_int_index
    raise e
  File "/home/massimo/Documents/ray/.venv/lib/python3.10/site-packages/ray/rllib/env/utils/infinite_lookback_buffer.py", line 455, in _get_int_index
    data = data_to_use[idx]
IndexError: list index out of range

If I let all agents execute actions on every step, and then simply ignore the actions of those whose turn it is not, the issue goes away. Is there a proper way to address this?
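For reference, this is roughly the workaround I mean, sketched in plain Python with hypothetical names: every agent receives an observation at every step, and off-turn agents get an action mask that only permits a no-op action (assumed here to be index 0), so their actions can be safely ignored by the env.

```python
NUM_ACTIONS = 3  # assumed action space size for this sketch


def masked_obs(board, is_my_turn):
    """Build one agent's dict observation with an action mask."""
    if is_my_turn:
        mask = [1] * NUM_ACTIONS              # all real actions legal
    else:
        mask = [1] + [0] * (NUM_ACTIONS - 1)  # only the no-op is legal
    return {"observations": board, "action_mask": mask}


def build_obs_dict(board, agents, current_agent):
    # Unlike the turn-based version, this dict contains *every* agent on
    # every step, which is what avoids the IndexError in MultiAgentEpisode.
    return {agent: masked_obs(board, agent == current_agent) for agent in agents}
```

This works, but it wastes inference on off-turn agents and pollutes their episodes with no-op transitions, which is why I am asking whether there is a supported way to get true turn-based sampling on the new API stack.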