1. Severity of the issue:
High: Completely blocks me.
2. Environment:
- Ray version: 2.45.0
- Python version: 3.9
- OS: macOS
3. What happened vs. what you expected:
- Expected: After calling tune.register_env("double_dunk_v3", env_creator) and then config.build_algo(), RLlib should recognize my custom PettingZoo env and launch training. Overall, I expect a normal PPO training loop on my DoubleDunk-v3 environment without crashes.
- Actual: Immediately after starting, training crashes with the following error:
EnvError: The env string you provided ('double_dunk_v3') is:
a) Not a supported or -installed environment.
b) Not a tune-registered environment creator.
c) Not a valid env class string.
Try one of the following:
a) For Atari support: `pip install gym[atari] autorom[accept-rom-license]`.
For PyBullet support: `pip install pybullet`.
b) To register your custom env, do `from ray import tune;
tune.register_env('[name]', lambda cfg: [return env obj from here using cfg])`.
Then in your config, do `config.environment(env='[name]').
c) Make sure you provide a fully qualified classpath, e.g.:
`ray.rllib.examples.envs.classes.repeat_after_me_env.RepeatAfterMeEnv`
I have tried all of the suggested steps above, but none of them resolved the issue.
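Concretely, this is how I applied suggestion (b) from the error message: register the creator under a name and pass that exact string to the config. This is just a minimal sketch; the registration call, env string, and env_creator are the same ones used in the full script below.

from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig

tune.register_env("double_dunk_v3", env_creator)
config = PPOConfig().environment(env="double_dunk_v3")

As far as I can tell this matches the suggested pattern exactly, yet the EnvError above is still raised.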
This is my code for reproducibility:
import ray
import supersuit as ss
from pettingzoo.atari import double_dunk_v3
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.env.wrappers.pettingzoo_env import PettingZooEnv
from ray.tune.registry import register_env


def env_creator(config):
    # Standard Atari preprocessing via SuperSuit wrappers.
    repeat_p = config.get("repeat_action_probability", 0.25)
    frame_skip = config.get("frame_skip", 4)
    stack = config.get("stack", 4)
    env = double_dunk_v3.parallel_env()
    env = ss.max_observation_v0(env, 2)
    env = ss.sticky_actions_v0(env, repeat_action_probability=repeat_p)
    env = ss.frame_skip_v0(env, frame_skip)
    env = ss.color_reduction_v0(env, mode="full")
    env = ss.resize_v1(env, 84, 84)
    env = ss.frame_stack_v1(env, stack)
    return PettingZooEnv(env)


env_name = "double_dunk_v3"
register_env(env_name, env_creator)

ray.shutdown()
ray.init()

# Instantiate the env once just to read the (shared) observation/action spaces.
env_temp = env_creator({})
obs_space = env_temp.observation_space
act_space = env_temp.action_space

config = (
    PPOConfig()
    .environment(env_name)
    .framework("torch")
    .env_runners(num_env_runners=0)
    .training(lr=2e-4, train_batch_size_per_learner=2000, num_epochs=10, gamma=0.9)
    .multi_agent(
        policies={"self_play": (None, obs_space, act_space, {})},
        policy_mapping_fn=lambda agent_id, **kw: "self_play",
    )
)

algorithm = config.build_algo()
for i in range(10):
    result = algorithm.train()