I'm having some difficulty moving to the newest version of RLlib. I used to have this:
rllib_config = {
    "env_config": exp_run_config["env"],
    "framework": train_config["framework"],
    "multiagent": multiagent_config,
    "num_workers": train_config["num_workers"],
    "num_gpus": train_config["num_gpus"],
    "num_envs_per_worker": train_config["num_envs"] // train_config["num_workers"],
    "train_batch_size": train_config["train_batch_size"],
}
rllib_trainer = A2CTrainer(
    env=EnvWrapper,
    config=rllib_config,
)
Now I changed it to something that looks like this:
rllib_trainer = (
    PPOConfig()
    .environment(
        env=EnvWrapper,
        env_config=config_rllib["env_config"],
    )
    .framework(config_rllib["framework"])
    .resources(
        num_gpus=config_rllib["num_gpus"],
        num_cpus_for_main_process=config_rllib["num_workers"],
        placement_strategy=config_rllib["placement_strategy"],
    )
    .training(
        train_batch_size=config_rllib["train_batch_size"],
    )
    # .env_runners(num_env_runners=1)
    .multi_agent(
        policies=config_rllib["multiagent"]["policies"],
        policy_mapping_fn=config_rllib["multiagent"]["policy_mapping_fn"],
        policies_to_train=config_rllib["multiagent"]["policies_to_train"],
    )
    .build()
)
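The commented-out .env_runners(...) line is where I think the old "num_workers" and "num_envs_per_worker" settings are supposed to go now. My guess at the mapping, from reading the new AlgorithmConfig API (parameter names unverified by me), would be something like:

    # Unverified guess: old "num_workers" maps to num_env_runners, and
    # old "num_envs_per_worker" would map to num_envs_per_env_runner.
    .env_runners(
        num_env_runners=config_rllib["num_workers"],
    )

For now I've left that line commented out while I sort out the environment errors below.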
My EnvWrapper looks like this:
class EnvWrapper(MultiAgentEnv):
    def __init__(self, env_config=None):
        super().__init__()
        env_config_copy = env_config.copy()
        assert isinstance(env_config_copy, dict)
        self.env = import_class_from_path("Test", os.path.join(source_dir, "test"))(
            **env_config_copy
        )
        self.action_space = self.env.action_space
        self.observation_space = recursive_obs_dict_to_spaces_dict(self.env.reset())

    def reset(self):
        obs = self.env.reset()
        # Convert lists to numpy arrays in the observation dict
        obs_processed = recursive_list_to_np_array(obs)
        # Return observation and empty info dict per agent
        return obs_processed, {agent_id: {} for agent_id in obs_processed}

    def step(self, actions=None):
        assert actions is not None
        assert isinstance(actions, dict)
        obs, rewards, done, info = self.env.step(actions)
        # Process observations
        obs_processed = recursive_list_to_np_array(obs)
        # Create truncated dict (same structure as dones)
        truncated = {agent_id: False for agent_id in done}
        if "__all__" in done:
            truncated["__all__"] = False
        return obs_processed, rewards, done, truncated, info
This currently throws a whole lot of errors that look like this:
AttributeError: 'dict' object has no attribute 'spaces'
2024-12-13 17:24:13,391 ERROR multi_agent_env_runner.py:858 -- Your environment (<EnvWrapper<rllib-multi-agent-env-v0>>) does not abide to the new gymnasium-style API!
From Ray 2.3 on, RLlib only supports the new (gym>=0.26 or gymnasium) Env APIs.
In particular, the `reset()` method seems to be faulty.
Learn more about the most important changes here:
https://github.com/openai/gym and here: https://github.com/Farama-Foundation/Gymnasium
In order to fix this problem, do the following:
1) Run `pip install gymnasium` on your command line.
2) Change all your import statements in your code from
`import gym` -> `import gymnasium as gym` OR
`from gym.spaces import Discrete` -> `from gymnasium.spaces import Discrete`
For your custom (single agent) gym.Env classes:
3.1) Either wrap your old Env class via the provided `from gymnasium.wrappers import
EnvCompatibility` wrapper class.
3.2) Alternatively to 3.1:
- Change your `reset()` method to have the call signature 'def reset(self, *,
seed=None, options=None)'
- Return an additional info dict (empty dict should be fine) from your `reset()`
method.
- Return an additional `truncated` flag from your `step()` method (between `done` and
`info`). This flag should indicate, whether the episode was terminated prematurely
due to some time constraint or other kind of horizon setting.
For your custom RLlib `MultiAgentEnv` classes:
4.1) Either wrap your old MultiAgentEnv via the provided
`from ray.rllib.env.wrappers.multi_agent_env_compatibility import
MultiAgentEnvCompatibility` wrapper class.
4.2) Alternatively to 4.1:
- Change your `reset()` method to have the call signature
'def reset(self, *, seed=None, options=None)'
- Return an additional per-agent info dict (empty dict should be fine) from your
`reset()` method.
- Rename `dones` into `terminateds` and only set this to True, if the episode is really
done (as opposed to has been terminated prematurely due to some horizon/time-limit
setting).
- Return an additional `truncateds` per-agent dictionary flag from your `step()`
method, including the `__all__` key (100% analogous to your `dones/terminateds`
per-agent dict).
Return this new `truncateds` dict between `dones/terminateds` and `infos`. This
flag should indicate, whether the episode (for some agent or all agents) was
terminated prematurely due to some time constraint or other kind of horizon setting.
Traceback (most recent call last):
File "/ray/rllib/utils/pre_checks/env.py", line 46, in check_multiagent_environments
obs_and_infos = env.reset(seed=42, options={})
TypeError: EnvWrapper.reset() got an unexpected keyword argument 'seed'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/ray/rllib/env/multi_agent_env_runner.py", line 856, in make_env
check_multiagent_environments(self.env.unwrapped)
File "/ray/rllib/utils/pre_checks/env.py", line 48, in check_multiagent_environments
raise ValueError(
ValueError: Your environment (<EnvWrapper<rllib-multi-agent-env-v0>>) does not abide to the new gymnasium-style API!
I’m not entirely sure where to start fixing this!
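Going by point 4.2 of the error message (and the TypeError about the unexpected seed keyword), my best guess is that reset() and step() in the wrapper need to change roughly as below. This is untested and only my reading of the error text; I'm also not sure whether the "AttributeError: 'dict' object has no attribute 'spaces'" means that self.observation_space has to be a gymnasium.spaces.Dict rather than a plain dict.

# Inside EnvWrapper -- my untested attempt at the gymnasium-style API:
def reset(self, *, seed=None, options=None):
    # Accept seed/options even though the underlying Test env ignores them.
    obs = self.env.reset()
    obs_processed = recursive_list_to_np_array(obs)
    infos = {agent_id: {} for agent_id in obs_processed}
    return obs_processed, infos

def step(self, actions=None):
    assert isinstance(actions, dict)
    obs, rewards, done, info = self.env.step(actions)
    obs_processed = recursive_list_to_np_array(obs)
    # Per point 4.2: "dones" become "terminateds", plus a separate
    # "truncateds" dict that also carries the "__all__" key.
    terminateds = dict(done)
    truncateds = {agent_id: False for agent_id in done}
    truncateds["__all__"] = False
    return obs_processed, rewards, terminateds, truncateds, info

Is this the right direction, or is there something else about the new API stack that I'm missing?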