How to update to the newest API version

I'm having some difficulty moving to the newest version of RLlib. I used to have this:

rllib_config = {
    "env_config": exp_run_config["env"],
    "framework": train_config["framework"],
    "multiagent": multiagent_config,
    "num_workers": train_config["num_workers"],
    "num_gpus": train_config["num_gpus"],
    "num_envs_per_worker": train_config["num_envs"] // train_config["num_workers"],
    "train_batch_size": train_config["train_batch_size"],
}

rllib_trainer = A2CTrainer(
    env=EnvWrapper,
    config=rllib_config,
)

I've now changed it to something that looks like this:

rllib_trainer = (PPOConfig()
  .environment(
      env=EnvWrapper,
      env_config=config_rllib["env_config"]
  )
  .framework(config_rllib["framework"])
  .resources(
      num_gpus=config_rllib["num_gpus"],
      num_cpus_for_main_process=config_rllib["num_workers"],
      placement_strategy=config_rllib["placement_strategy"],
  )
  .training(
      train_batch_size=config_rllib["train_batch_size"]
  )
  #.env_runners(num_env_runners=1)
  .multi_agent(**{
      "policies": config_rllib["multiagent"]["policies"],
      "policy_mapping_fn": config_rllib["multiagent"]["policy_mapping_fn"],
      "policies_to_train": config_rllib["multiagent"]["policies_to_train"]
  })
  .build())

and my EnvWrapper looks like this:

class EnvWrapper(MultiAgentEnv):  
    def __init__(self, env_config=None):
        super().__init__()
        env_config_copy = env_config.copy()        
        assert isinstance(env_config_copy, dict)
        self.env = import_class_from_path("Test", os.path.join(source_dir, "test"))(
            **env_config_copy
        )

        self.action_space = self.env.action_space

        self.observation_space = recursive_obs_dict_to_spaces_dict(self.env.reset())

    def reset(self):
        obs = self.env.reset()
        # Convert lists to numpy arrays in the observation dict
        obs_processed = recursive_list_to_np_array(obs)
        # Return observation and empty info dict per agent
        return obs_processed, {agent_id: {} for agent_id in obs_processed}

    def step(self, actions=None):
        assert actions is not None
        assert isinstance(actions, dict)
        obs, rewards, done, info = self.env.step(actions)
        
        # Process observations
        obs_processed = recursive_list_to_np_array(obs)
        
        # Create truncated dict (same structure as dones)
        truncated = {agent_id: False for agent_id in done}
        if "__all__" in done:
            truncated["__all__"] = False
        
        return obs_processed, rewards, done, truncated, info

This currently throws a whole lot of errors that look like this:

AttributeError: 'dict' object has no attribute 'spaces'
2024-12-13 17:24:13,391 ERROR multi_agent_env_runner.py:858 -- Your environment (<EnvWrapper<rllib-multi-agent-env-v0>>) does not abide to the new gymnasium-style API!
From Ray 2.3 on, RLlib only supports the new (gym>=0.26 or gymnasium) Env APIs.
In particular, the `reset()` method seems to be faulty.
Learn more about the most important changes here:
https://github.com/openai/gym and here: https://github.com/Farama-Foundation/Gymnasium

In order to fix this problem, do the following:

1) Run `pip install gymnasium` on your command line.
2) Change all your import statements in your code from
   `import gym` -> `import gymnasium as gym` OR
   `from gym.spaces import Discrete` -> `from gymnasium.spaces import Discrete`

For your custom (single agent) gym.Env classes:
3.1) Either wrap your old Env class via the provided `from gymnasium.wrappers import
     EnvCompatibility` wrapper class.
3.2) Alternatively to 3.1:
 - Change your `reset()` method to have the call signature 'def reset(self, *,
   seed=None, options=None)'
 - Return an additional info dict (empty dict should be fine) from your `reset()`
   method.
 - Return an additional `truncated` flag from your `step()` method (between `done` and
   `info`). This flag should indicate, whether the episode was terminated prematurely
   due to some time constraint or other kind of horizon setting.

For your custom RLlib `MultiAgentEnv` classes:
4.1) Either wrap your old MultiAgentEnv via the provided
     `from ray.rllib.env.wrappers.multi_agent_env_compatibility import
     MultiAgentEnvCompatibility` wrapper class.
4.2) Alternatively to 4.1:
 - Change your `reset()` method to have the call signature
   'def reset(self, *, seed=None, options=None)'
 - Return an additional per-agent info dict (empty dict should be fine) from your
   `reset()` method.
 - Rename `dones` into `terminateds` and only set this to True, if the episode is really
   done (as opposed to has been terminated prematurely due to some horizon/time-limit
   setting).
 - Return an additional `truncateds` per-agent dictionary flag from your `step()`
   method, including the `__all__` key (100% analogous to your `dones/terminateds`
   per-agent dict).
   Return this new `truncateds` dict between `dones/terminateds` and `infos`. This
   flag should indicate, whether the episode (for some agent or all agents) was
   terminated prematurely due to some time constraint or other kind of horizon setting.
Traceback (most recent call last):
  File "/ray/rllib/utils/pre_checks/env.py", line 46, in check_multiagent_environments
    obs_and_infos = env.reset(seed=42, options={})
TypeError: EnvWrapper.reset() got an unexpected keyword argument 'seed'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/ray/rllib/env/multi_agent_env_runner.py", line 856, in make_env
    check_multiagent_environments(self.env.unwrapped)
  File "/ray/rllib/utils/pre_checks/env.py", line 48, in check_multiagent_environments
    raise ValueError(
ValueError: Your environment (<EnvWrapper<rllib-multi-agent-env-v0>>) does not abide to the new gymnasium-style API!
From Ray 2.3 on, RLlib only supports the new (gym>=0.26 or gymnasium) Env APIs.
In particular, the `reset()` method seems to be faulty.
Learn more about the most important changes here:
https://github.com/openai/gym and here: https://github.com/Farama-Foundation/Gymnasium

I’m not entirely sure where to start fixing this!

Hi @greenleaf

Welcome to the forum.

This is the specific error you are getting:

TypeError: EnvWrapper.reset() got an unexpected keyword argument 'seed'

You should be able to fix this one by following the first recommendation under 4.2 for custom `MultiAgentEnv` classes: change your `reset()` method to the new gymnasium-style call signature `def reset(self, *, seed=None, options=None)`.
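Here is a minimal sketch of what that change could look like in your EnvWrapper, assuming the wrapped self.env.reset() does not take a seed itself (the new keyword arguments are simply accepted and ignored):

def reset(self, *, seed=None, options=None):
    # Accept the new gymnasium-style keyword arguments; the wrapped
    # env's reset() is assumed not to use them, so they are ignored here.
    obs = self.env.reset()
    # Convert lists to numpy arrays in the observation dict
    obs_processed = recursive_list_to_np_array(obs)
    # Return (obs, infos): one (possibly empty) info dict per agent
    return obs_processed, {agent_id: {} for agent_id in obs_processed}

Your step() method already returns the extra truncateds dict between the terminateds and infos, so the reset() signature is the main thing this pre-check is complaining about.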

I thought it would have been something far more complex given that long message. Thank you.