Registering Custom Environment for `CartPole-v1` with RLlib and Running via Command Line

Hello everyone,

I am trying to train a PPO agent with a custom environment, CartPole1-v1. I have created the custom environment, but I am having trouble registering it with Ray RLlib. I have tried multiple approaches, but I keep encountering errors. I would like to train the agent using the rllib train file command with a config file. Here is the configuration I want to use:

cartpole-ppo:
    env: CartPole1-v1
    run: PPO
    stop:
        episode_reward_mean: 150
        timesteps_total: 100000
    config:
        # Works for both torch and tf.
        framework: torch
        gamma: 0.99
        lr: 0.0003
        num_workers: 1
        observation_filter: MeanStdFilter
        num_sgd_iter: 6
        vf_loss_coeff: 0.01
        model:
            fcnet_hiddens: [32]
            fcnet_activation: linear
            vf_share_layers: true
        enable_connectors: true

Could anyone provide guidance on how to properly register the CartPole1-v1 custom environment with Ray RLlib and run the training using the rllib train file command with the above config file? Any help would be greatly appreciated!

Thank you!

I am trying to register and train a custom environment using the rllib train file command and a configuration file. My custom environment, CustomCartPole, wraps the 'CartPole-v1' environment from Gym. I have registered the environment with the string name "CartPole1-v1" as shown in the code below:

import os
import gymnasium as gym
import ray
from ray import tune
from ray.tune.registry import register_env

class CustomCartPole(gym.Env):
    def __init__(self, env_config):
        self.env = gym.make('CartPole-v1')
        self.action_space = self.env.action_space
        self.observation_space = self.env.observation_space

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return obs, reward, done, info

def custom_cartpole_creator(env_config):
    return CustomCartPole(env_config)


register_env("CartPole1-v1", custom_cartpole_creator)

Is there anything wrong with the way I have registered the custom environment? Will the rllib train file command work with this custom environment when I specify "CartPole1-v1" in my configuration file? If not, what changes do I need to make to ensure compatibility with the rllib train file command?

Hi @Hars, it is hard to say what exactly your error is. Using register_env is the right way to do it here.

For the record: you do not need to import register_env from tune.registry directly, as it is also exposed in ray.tune's __init__.py via

from ray.tune import register_env

The way you registered your environment with Tune appears to be correct. This code runs on my side (Ray 2.2.0, Python 3.9.4):

import gym
from ray import air, tune
from ray.tune import register_env
from ray.rllib.algorithms.ppo.ppo import PPOConfig
class CustomCartPole(gym.Env):
    def __init__(self, env_config):
        self.env = gym.make('CartPole-v1')
        self.action_space = self.env.action_space
        self.observation_space = self.env.observation_space

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return obs, reward, done, info

def custom_cartpole_creator(env_config):
    return CustomCartPole(env_config)


register_env("CartPole1-v1", custom_cartpole_creator)

config = (
    PPOConfig()
    .environment(env="CartPole1-v1")
)
stop = {
    "training_iteration": 2
}
tuner = tune.Tuner(
    "PPO",
    param_space=config.to_dict(),
    run_config=air.RunConfig(
        stop=stop,
        verbose=1,
        local_dir="~/ray_results/TestDIscuss",
        checkpoint_config=air.CheckpointConfig(
            checkpoint_frequency=10,
            checkpoint_at_end=True,
        )
    )
)
tuner.fit()

Hi @Lars_Simon_Zehnder, initially your code didn't work on my system, possibly because I had installed ray-rllib with pip instead of conda. After installing with conda, your program works properly on my system, but the actual question remains the same, as I want to use an external config file.

If I try to run it using os.system(f'rllib train file cartpole1-ppo.yaml'), it does not work. I am wondering whether it is possible to do this.

@Hars, I am also using pip to install it. How did you install it? Did you follow the documentation?

python -m pip install "ray[tune,rllib]"

should do the trick.

If you want to run an RLlib algorithm via a YAML file, you need to either

  • Define the environment in a file that can be imported from the working directory where you call your program (see here),
  • or register the environment via tune.register_env() and then run the Python script from the command line via python myscript.py (see here); a sketch of this second option follows below.

Otherwise RLlib does not know where to find your environment class definition.
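
For the second option, a minimal sketch could look like this, assuming the environment class lives in a file named custom_cartpole_env.py next to the script and the YAML file above is saved as cartpole1-ppo.yaml. Note that tune.run_experiments() expects the env inside the config dict, so the top-level env key is moved there first (roughly what rllib train file does internally):

import yaml

from ray import tune
from ray.tune import register_env

# Hypothetical module name; the environment class definition would live in
# custom_cartpole_env.py in the working directory.
from custom_cartpole_env import CustomCartPole

register_env("CartPole1-v1", lambda env_config: CustomCartPole(env_config))

# Read the same experiment spec that `rllib train file` would read.
with open("cartpole1-ppo.yaml") as f:
    experiments = yaml.safe_load(f)

# Tune's experiment spec has no top-level `env` key, so move it into `config`.
for experiment in experiments.values():
    experiment["config"]["env"] = experiment.pop("env")

tune.run_experiments(experiments)

Running python myscript.py from the directory that contains both files should then start the same training that the YAML config describes.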


Dear @Lars_Simon_Zehnder,

I initially installed the package by following the documentation. However, it didn't seem to work, possibly due to the version of Python I was using at the time. Now, I can successfully train my model. The first option you provided worked for me.

After carefully reviewing the configurations, I updated the cartpole1-ppo.yaml file as shown below:

cartpole-ppo:
    env: custom_cartpole_env.CustomCartPole
    run: PPO
    stop:
        episode_reward_mean: 150
        timesteps_total: 100000
    config:
        # Works for both torch and tf.
        framework: torch
        gamma: 0.99
        lr: 0.0003
        num_workers: 1
        observation_filter: MeanStdFilter
        num_sgd_iter: 6
        vf_loss_coeff: 0.01
        model:
            fcnet_hiddens: [32]
            fcnet_activation: linear
            vf_share_layers: true
        enable_connectors: true

The custom_cartpole_env.py file contains the following lines:

import gymnasium as gym

class CustomCartPole(gym.Env):
    def __init__(self, env_config = None):
        self.env = gym.make('CartPole-v1')
        self.action_space = self.env.action_space
        self.observation_space = self.env.observation_space

    def reset(self, *, seed=None, options=None):
        # Forward seed/options to the wrapped env (new gymnasium reset API).
        return self.env.reset(seed=seed, options=options)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return obs, reward, terminated, truncated, info

Training works well using the command rllib train file cartpole1-ppo.yaml, but I encounter issues during evaluation. The error message is as follows:

EnvError: The env string you provided ('custom_cartpole_env.CustomCartPole') is:
a) Not a supported/installed environment.
b) Not a tune-registered environment creator.
c) Not a valid env class string.
...

Upon completing the training, I receive the expected message, but the above error occurs during evaluation:

2023-04-06 12:16:39,770 INFO tune.py:798 -- Total run time: 62.62 seconds (62.36 seconds for the tuning loop).

Your training finished.
Best available checkpoint for each trial:
  /home/../ray_results/cartpole-ppo/PPO_custom_cartpole_env.CustomCartPole_a3565_00000_0_2023-04-06_12-15-37/checkpoint_000007

You can now evaluate your trained algorithm from any checkpoint, e.g. by running:
rllib evaluate /home/../ray_results/cartpole-ppo/PPO_custom_cartpole_env.CustomCartPole_a3565_00000_0_2023-04-06_12-15-37/checkpoint_000007 --algo PPO

Hi @Hars, good to hear that training runs. For evaluation, could you try to also provide the environment via the --env option (similar to the --algo option)? You can pass the path to the class there.
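
For example, something along these lines (using the checkpoint path from your output, and assuming custom_cartpole_env.CustomCartPole is importable from the directory you run the command in):

rllib evaluate /home/../ray_results/cartpole-ppo/PPO_custom_cartpole_env.CustomCartPole_a3565_00000_0_2023-04-06_12-15-37/checkpoint_000007 --algo PPO --env custom_cartpole_env.CustomCartPole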

Hi @Lars_Simon_Zehnder, even when I pass the --env option, I'm still encountering the same error. Can you please provide some guidance on how to resolve this issue? Additionally, I've opened a separate topic to discuss this issue in more detail. Any assistance would be greatly appreciated.

FYI: the evaluation issue is now discussed in a separate topic: Custom Environment Training Works, But Evaluation Fails