Registering a Custom `CartPole-v1` Environment with RLlib and Running It via the Command Line

Dear @Lars_Simon_Zehnder,

I initially installed the package by following the documentation. However, it didn’t seem to work, possibly due to the version of Python I was using at the time. Now, I can successfully train my model. The first option you provided worked for me.

After carefully reviewing the configurations, I updated the `cartpole1-ppo.yaml` file as shown below:

cartpole-ppo:
    env: custom_cartpole_env.CustomCartPole
    run: PPO
    stop:
        episode_reward_mean: 150
        timesteps_total: 100000
    config:
        # Works for both torch and tf.
        framework: torch
        gamma: 0.99
        lr: 0.0003
        num_workers: 1
        observation_filter: MeanStdFilter
        num_sgd_iter: 6
        vf_loss_coeff: 0.01
        model:
            fcnet_hiddens: [32]
            fcnet_activation: linear
            vf_share_layers: true
        enable_connectors: true
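
For reference, a rough Python-API equivalent of this YAML might look like the sketch below. It is untested and only a sketch: it assumes the Ray 2.x `AlgorithmConfig`/`Tuner` APIs and passes the env class from `custom_cartpole_env.py` (shown further down) directly, instead of the module-path string:

# Sketch: rough Python-API equivalent of cartpole1-ppo.yaml.
# Assumes Ray 2.x (AlgorithmConfig / air.RunConfig); the YAML above is
# what I actually ran.
from ray import air, tune
from ray.rllib.algorithms.ppo import PPOConfig

from custom_cartpole_env import CustomCartPole

config = (
    PPOConfig()
    .environment(env=CustomCartPole)  # env class instead of the string
    .framework("torch")
    .rollouts(
        num_rollout_workers=1,
        observation_filter="MeanStdFilter",
        enable_connectors=True,
    )
    .training(
        gamma=0.99,
        lr=0.0003,
        num_sgd_iter=6,
        vf_loss_coeff=0.01,
        model={
            "fcnet_hiddens": [32],
            "fcnet_activation": "linear",
            "vf_share_layers": True,
        },
    )
)

tune.Tuner(
    "PPO",
    param_space=config.to_dict(),
    run_config=air.RunConfig(
        stop={"episode_reward_mean": 150, "timesteps_total": 100000},
    ),
).fit()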

The `custom_cartpole_env.py` file contains the following lines:

import gymnasium as gym

class CustomCartPole(gym.Env):
    """Thin wrapper around the built-in CartPole-v1 environment."""

    def __init__(self, env_config=None):
        self.env = gym.make('CartPole-v1')
        self.action_space = self.env.action_space
        self.observation_space = self.env.observation_space

    def reset(self, *, seed=None, options=None):
        # Forward seed and options so seeded resets work as gymnasium expects.
        return self.env.reset(seed=seed, options=options)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return obs, reward, terminated, truncated, info
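
As a quick sanity check (my own snippet, separate from the training run), the wrapper can be exercised locally to confirm it follows the gymnasium reset/step API:

from custom_cartpole_env import CustomCartPole

env = CustomCartPole()

# A seeded reset should return (obs, info) and a valid observation.
obs, info = env.reset(seed=42)
assert env.observation_space.contains(obs)

# One random step should return the 5-tuple gymnasium expects.
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
print(obs, reward, terminated, truncated)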

Training works well with the command `rllib train file cartpole1-ppo.yaml`, but I run into issues during evaluation. The error message is as follows:

EnvError: The env string you provided ('custom_cartpole_env.CustomCartPole') is:
a) Not a supported/installed environment.
b) Not a tune-registered environment creator.
c) Not a valid env class string.
...

Training completes with the expected summary, but running the suggested `rllib evaluate` command then produces the error above:

2023-04-06 12:16:39,770 INFO tune.py:798 -- Total run time: 62.62 seconds (62.36 seconds for the tuning loop).

Your training finished.
Best available checkpoint for each trial:
  /home/../ray_results/cartpole-ppo/PPO_custom_cartpole_env.CustomCartPole_a3565_00000_0_2023-04-06_12-15-37/checkpoint_000007

You can now evaluate your trained algorithm from any checkpoint, e.g. by running:
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│   rllib evaluate /home/../ray_results/cartpole-ppo/PPO_custom_cartpole_env.CustomCartPole_a3565_00000_0_2023-04-06_12-15-37/checkpoint_000007 --algo PPO       │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
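
As a side note, evaluating through the Python API instead of the CLI might sidestep the env-string resolution. Here is a minimal, untested sketch, assuming `Algorithm.from_checkpoint` is available in this Ray version and that registering the env under the exact string used during training lets tune resolve it (the checkpoint path is a placeholder):

from ray.tune.registry import register_env
from ray.rllib.algorithms.algorithm import Algorithm

from custom_cartpole_env import CustomCartPole

# Register under the exact env string recorded in the checkpoint's config,
# so tune can resolve it (assumption: this matches what training stored).
register_env("custom_cartpole_env.CustomCartPole",
             lambda env_config: CustomCartPole(env_config))

# Placeholder path -- substitute the real checkpoint directory.
algo = Algorithm.from_checkpoint("/path/to/checkpoint_000007")

# Roll out one episode greedily with the restored policy.
env = CustomCartPole()
obs, info = env.reset(seed=0)
terminated = truncated = False
total_reward = 0.0
while not (terminated or truncated):
    action = algo.compute_single_action(obs, explore=False)
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
print(f"episode reward: {total_reward}")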