Unable to train on PettingZoo Atari "double_dunk" environment

1. Severity of the issue:
High: Completely blocks me.

2. Environment:

  • Ray version: 2.45.0
  • Python version: 3.9
  • OS: macOS

3. What happened vs. what you expected:

  • Expected: After tune.register_env("double_dunk_v3", env_creator) and config.build_algo(), RLlib should recognize the registered PettingZoo env and run a normal PPO training loop on my DoubleDunk-v3 environment without crashes.
  • Actual: Training crashes immediately after starting, with the following error:
EnvError: The env string you provided ('double_dunk_v3') is:
a) Not a supported or -installed environment.
b) Not a tune-registered environment creator.
c) Not a valid env class string.

Try one of the following:
a) For Atari support: `pip install gym[atari] autorom[accept-rom-license]`.
   For PyBullet support: `pip install pybullet`.
b) To register your custom env, do `from ray import tune;
   tune.register_env('[name]', lambda cfg: [return env obj from here using cfg])`.
   Then in your config, do `config.environment(env='[name]').
c) Make sure you provide a fully qualified classpath, e.g.:
   `ray.rllib.examples.envs.classes.repeat_after_me_env.RepeatAfterMeEnv`

I have tried all of the suggested steps: the Atari extras are installed, the env is registered exactly as in (b), and (c) does not apply since env_creator is a function rather than an env class. None of this resolved the issue. For reference, this is the registration call, matching suggestion (b):
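
from ray import tune

# Registered exactly as the error message suggests; the same EnvError persists.
tune.register_env("double_dunk_v3", env_creator)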

Here is my full script for reproducing the issue (imports included):

import ray
import supersuit as ss
from pettingzoo.atari import double_dunk_v3
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.env.wrappers.pettingzoo_env import ParallelPettingZooEnv
from ray.tune.registry import register_env


def env_creator(config):
    # Standard Atari preprocessing via SuperSuit wrappers.
    repeat_p   = config.get("repeat_action_probability", 0.25)
    frame_skip = config.get("frame_skip", 4)
    stack      = config.get("stack", 4)

    env = double_dunk_v3.parallel_env()
    env = ss.max_observation_v0(env, 2)
    env = ss.sticky_actions_v0(env, repeat_action_probability=repeat_p)
    env = ss.frame_skip_v0(env, frame_skip)
    env = ss.color_reduction_v0(env, mode="full")
    env = ss.resize_v1(env, 84, 84)
    env = ss.frame_stack_v1(env, stack)
    # parallel_env() needs the parallel wrapper (ParallelPettingZooEnv);
    # PettingZooEnv is for the AEC API.
    return ParallelPettingZooEnv(env)

env_name = "double_dunk_v3"
register_env(env_name, env_creator)

ray.shutdown()
ray.init()

env_temp = env_creator({})
obs_space = env_temp.observation_space
act_space = env_temp.action_space
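
# Sanity check: the creator instantiates fine outside RLlib, so the
# wrappers themselves do not seem to be the problem.
print("obs space:", obs_space)
print("act space:", act_space)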

config = (
    PPOConfig()
    .environment(env_name)
    .framework("torch")
    .env_runners(num_env_runners=0)
    .training(lr=2e-4, train_batch_size_per_learner=2000, num_epochs=10, gamma=0.9)
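    # Both agents map to a single shared policy (naive self-play).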
    .multi_agent(
        policies={"self_play": (None, obs_space, act_space, {})},
        policy_mapping_fn=lambda agent_id, *args, **kwargs: "self_play"
    )
)

# The EnvError shown above is raised as soon as the algorithm is built.
algorithm = config.build_algo()
for i in range(10):
    result = algorithm.train()
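
As an extra diagnostic, I also checked whether the creator actually lands in Tune's global registry. This pokes at an internal API (my assumption that this is the right place to look), so treat it as a debugging aid only:

from ray.tune.registry import ENV_CREATOR, _global_registry

# Is the creator stored under the exact string that RLlib later looks up?
print(_global_registry.contains(ENV_CREATOR, "double_dunk_v3"))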