Custom Environment Training Works, But Evaluation Fails

Hello everyone,

This question is a continuation of an earlier one; you can find the full details in the linked thread.

I’m currently working on a reinforcement learning project using RLlib and have successfully set up a custom environment for training. The training process runs as expected and the agent seems to learn from the environment. However, when I try to evaluate the trained agent using the same custom environment, the evaluation fails.

The agent interacts with the environment without issues during training, but the evaluation run errors out.

Could you please provide some guidance on potential reasons for this problem and suggest steps to troubleshoot and resolve it? Additionally, are there any specific points I should pay attention to when setting up custom environments for evaluation?

Command for training: rllib train file cartpole-ppo.yaml

File: cartpole-ppo.yaml

cartpole-ppo:
    env: custom_cartpole_env.CustomCartPole
    run: PPO
    stop:
        episode_reward_mean: 150
        timesteps_total: 100000
    config:
        # Works for both torch and tf.
        framework: torch
        gamma: 0.99
        lr: 0.0003
        num_workers: 1
        observation_filter: MeanStdFilter
        num_sgd_iter: 6
        vf_loss_coeff: 0.01
        model:
            fcnet_hiddens: [32]
            fcnet_activation: linear
            vf_share_layers: true
        enable_connectors: true

File: custom_cartpole_env.py

import gymnasium as gym

class CustomCartPole(gym.Env):
    """Minimal wrapper around the built-in CartPole-v1 environment."""

    def __init__(self, env_config=None):
        self.env = gym.make('CartPole-v1')
        self.action_space = self.env.action_space
        self.observation_space = self.env.observation_space

    def reset(self, *, seed=None, options=None):
        # Forward seed/options to the wrapped env (gymnasium reset API).
        return self.env.reset(seed=seed, options=options)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return obs, reward, terminated, truncated, info
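
For what it’s worth, the wrapper behaves normally when used standalone (a quick sanity check, assuming gymnasium is installed and you run it from the folder containing custom_cartpole_env.py):

# Standalone sanity check of the wrapper, outside of RLlib.
from custom_cartpole_env import CustomCartPole

env = CustomCartPole()
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
print(obs, reward, terminated, truncated)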

Thank you in advance for your help and support. Looking forward to your suggestions.

Hi @Hars,

Can you share the full error and stack trace you are getting?


Hi @mannyv

After training, the output ends with:

2023-04-08 17:00:58,513 INFO tune.py:798 -- Total run time: 61.07 seconds (60.93 seconds for the tuning loop).

Your training finished.
Best available checkpoint for each trial:
  /home/user/ray_results/cartpole-ppo/PPO_custom_cartpole_env.CustomCartPole_b0e39_00000_0_2023-04-08_16-59-57/checkpoint_000007

You can now evaluate your trained algorithm from any checkpoint, e.g. by running:
╭─────────────────────────────────────────────────────────────────────────────────────╮
│   rllib evaluate                                                                    │
│ /home/user/ray_results/cartpole-ppo/PPO_custom_cartpole_env.CustomCartPole_b0e │
│ 39_00000_0_2023-04-08_16-59-57/checkpoint_000007 --algo PPO                         │
╰─────────────────────────────────────────────────────────────────────────────────────╯

Then, if I execute the command

$ rllib evaluate /home/user/ray_results/cartpole-ppo/PPO_custom_cartpole_env.CustomCartPole_b0e39_00000_0_2023-04-08_16-59-57/checkpoint_000007 --algo PPO

The trimmed output is shown below:

2023-04-08 17:02:49,336	INFO worker.py:1544 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 
2023-04-08 17:02:50,930	WARNING deprecation.py:50 -- DeprecationWarning: `algo = Algorithm(env='custom_cartpole_env.CustomCartPole', ...)` has been deprecated. Use `algo = AlgorithmConfig().environment('custom_cartpole_env.CustomCartPole').build()` instead. This will raise an error in the future!
2023-04-08 17:02:50,948	INFO algorithm.py:506 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=73211) *** SIGSEGV received at time=1680953574 on cpu 47 ***
(pid=73211) PC: @     0x7fb0bc3d95a0  (unknown)  _dl_allocate_tls_init
(pid=73211)     @     0x7fb0bc1ba630  1028800600  (unknown)
(pid=73211)     @ ... and at least 1 more frames
(pid=73211) [2023-04-08 17:02:54,167 E 73211 73229] logging.cc:361: *** SIGSEGV received at time=1680953574 on cpu 47 ***
(pid=73211) [2023-04-08 17:02:54,167 E 73211 73229] logging.cc:361: PC: @     0x7fb0bc3d95a0  (unknown)  _dl_allocate_tls_init
(pid=73211) [2023-04-08 17:02:54,167 E 73211 73229] logging.cc:361:     @     0x7fb0bc1ba630  1028800600  (unknown)
(pid=73211) [2023-04-08 17:02:54,167 E 73211 73229] logging.cc:361:     @ ... and at least 1 more frames
(pid=73211) Fatal Python error: Segmentation fault
(pid=73211) 
2023-04-08 17:02:57,276	WARNING worker.py:1866 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffffabc1076057fbabe7e9d2b94301000000 Worker ID: 13ec98e591e562f946cb3abfa2066a0ec2fcfda8be8f617d22b03bd2 Node ID: ea4bb7d4fce2a98ada712073d305ecaf9739df225853965315761bd8 Worker IP address: xxx.xx.xx.xx Worker port: 38306 Worker PID: 73211 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
(RolloutWorker pid=74331) 2023-04-08 17:03:02,872	WARNING env.py:156 -- Your env doesn't have a .spec.max_episode_steps attribute. Your horizon will default to infinity, and your environment will not be reset.
(RolloutWorker pid=74331) 2023-04-08 17:03:02,872	WARNING env.py:166 -- Your env reset() method appears to take 'seed' or 'return_info' arguments. Note that these are not yet supported in RLlib. Seeding will take place using 'env.seed()' and the info dict will not be returned from reset.
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/user/anaconda3/envs/game/lib/python3.9/site-packages/ray/rllib/algorithms/algorithm.p │
│ y:2219 in env_creator_from_classpath                                                             │
│                                                                                                  │
│   2216 │   │   │   │                                                                             │
│   2217 │   │   │   │   def env_creator_from_classpath(env_context):                              │
│   2218 │   │   │   │   │   try:                                                                  │
│ ❱ 2219 │   │   │   │   │   │   env_obj = from_config(env_specifier, env_context)                 │
│   2220 │   │   │   │   │   except ValueError:                                                    │
│   2221 │   │   │   │   │   │   raise EnvError(                                                   │
│   2222 │   │   │   │   │   │   │   ERR_MSG_INVALID_ENV_DESCRIPTOR.format(env_specifier)          │
│                                                                                                  │
│ ╭─────────────────────── locals ───────────────────────╮                                         │
│ │   env_context = {}                                   │                                         │
│ │ env_specifier = 'custom_cartpole_env.CustomCartPole' │                                         │
│ ╰──────────────────────────────────────────────────────╯                                         │
│                                                                                                  │
│ /home/user/anaconda3/envs/game/lib/python3.9/site-packages/ray/rllib/utils/from_config.py:1 │
│ 65 in from_config                                                                                │
│                                                                                                  │
│   162 │   │   │   │   if isinstance(cls, str):                                                   │
│   163 │   │   │   │   │   # Module found, but doesn't have the specified                         │
│   164 │   │   │   │   │   # c'tor/function.                                                      │
│ ❱ 165 │   │   │   │   │   raise ValueError(                                                      │
│   166 │   │   │   │   │   │   f"Full classpath specifier ({type_}) must be a valid "             │
│   167 │   │   │   │   │   │   "full [module].[class] string! E.g.: "                             │
│   168 │   │   │   │   │   │   "`my.cool.module.MyCoolClass`."                                    │
│                                                                                                  │
│ ╭─────────────────────── locals ───────────────────────╮                                         │
│ │           cls = 'custom_cartpole_env.CustomCartPole' │                                         │
│ │        config = {}                                   │                                         │
│ │   constructor = None                                 │                                         │
│ │     ctor_args = []                                   │                                         │
│ │   ctor_kwargs = {}                                   │                                         │
│ │ function_name = 'CustomCartPole'                     │                                         │
│ │        kwargs = {}                                   │                                         │
│ │   module_name = 'custom_cartpole_env'                │                                         │
│ │           obj = 'custom_cartpole_env.CustomCartPole' │                                         │
│ │         type_ = 'custom_cartpole_env.CustomCartPole' │                                         │
│ ╰──────────────────────────────────────────────────────╯                                         │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: Full classpath specifier (custom_cartpole_env.CustomCartPole) must be a valid full [module].[class] string! E.g.: `my.cool.module.MyCoolClass`.

During handling of the above exception, another exception occurred:

╭─────────────────────────────── Traceback (most recent call last) ................................................................................
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /home/user/anaconda3/envs/game/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set. │
│ py:260 in _setup                                                                                 │
│                                                                                                  │
│   257 │   │                                                                                      │
│   258 │   │   # Create a local worker, if needed.                                                │
│   259 │   │   if local_worker:                                                                   │
│ ❱ 260 │   │   │   self._local_worker = self._make_worker(                                        │
│   261 │   │   │   │   cls=RolloutWorker,                                                         │
│   262 │   │   │   │   env_creator=self._env_creator,                                             │
│   263 │   │   │   │   validate_env=validate_env,                                                 │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │                config = <ray.rllib.algorithms.ppo.ppo.PPOConfig object at 0x7f7182e48b20>    │ │
│ │ local_tf_session_args = {                                                                    │ │
│ │                         │   'intra_op_parallelism_threads': 8,                               │ │
│ │                         │   'inter_op_parallelism_threads': 8,                               │ │
│ │                         │   'gpu_options': {'allow_growth': True},                           │ │
│ │                         │   'log_device_placement': False,                                   │ │
│ │                         │   'device_count': {'CPU': 1},                                      │ │
│ │                         │   'allow_soft_placement': True                                     │ │
│ │                         }                                                                    │ │
│ │          local_worker = True                                                                 │ │
│ │           num_workers = 1                                                                    │ │
│ │                  self = <ray.rllib.evaluation.worker_set.WorkerSet object at 0x7f7182cd8e50> │ │
│ │                spaces = None                                                                 │ │
│ │          validate_env = <function Algorithm.validate_env at 0x7f7182f6d940>                  │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /home/user/anaconda3/envs/game/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set. │
│ py:946 in _make_worker                                                                           │
│                                                                                                  │
│   943 │   │   │   Dict[PolicyID, Tuple[gym.spaces.Space, gym.spaces.Space]]                      │
│   944 │   │   ] = None,                                                                          │
│   945 │   ) -> Union[RolloutWorker, ActorHandle]:                                                │
│ ❱ 946 │   │   worker = cls(                                                                      │
│   947 │   │   │   env_creator=env_creator,                                                       │
│   948 │   │   │   validate_env=validate_env,                                                     │
│   949 │   │   │   default_policy_class=self._policy_class,                                       │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │              cls = <class 'ray.rllib.evaluation.rollout_worker.RolloutWorker'>               │ │
│ │           config = <ray.rllib.algorithms.ppo.ppo.PPOConfig object at 0x7f7182bfdd00>         │ │
│ │      env_creator = <function                                                                 │ │
│ │                    Algorithm._get_env_id_and_creator.<locals>.env_creator_from_classpath at  │ │
│ │                    0x7f7182ce53a0>                                                           │ │
│ │      num_workers = 1                                                                         │ │
│ │ recreated_worker = False                                                                     │ │
│ │             self = <ray.rllib.evaluation.worker_set.WorkerSet object at 0x7f7182cd8e50>      │ │
│ │           spaces = None                                                                      │ │
│ │     validate_env = <function Algorithm.validate_env at 0x7f7182f6d940>                       │ │
│ │     worker_index = 0                                                                         │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /home/user/anaconda3/envs/game/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_wor │
│ ker.py:607 in __init__                                                                           │
│                                                                                                  │
│    604 │   │   │   and not self.config.create_env_on_local_worker                                │
│    605 │   │   ):                                                                                │
│    606 │   │   │   # Run the `env_creator` function passing the EnvContext.                      │
│ ❱  607 │   │   │   self.env = env_creator(copy.deepcopy(self.env_context))                       │
│    608 │   │                                                                                     │
│    609 │   │   clip_rewards = self.config.clip_rewards                                           │
│    610                                                                                           │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │                    AlgorithmConfig = <class                                                  │ │
│ │                                      'ray.rllib.algorithms.algorithm_config.AlgorithmConfig… │ │
│ │                         batch_mode = -1                                                      │ │
│ │                          callbacks = -1                                                      │ │
│ │                       clip_actions = -1                                                      │ │
│ │                       clip_rewards = -1                                                      │ │
│ │              compress_observations = -1                                                      │ │
│ │                             config = <ray.rllib.algorithms.ppo.ppo.PPOConfig object at       │ │
│ │                                      0x7f7182bfdd00>                                         │ │
│ │ configured_rollout_fragment_length = 4000                                                    │ │
│ │                     count_steps_by = -1                                                      │ │
│ │                     dataset_shards = None                                                    │ │
│ │               default_policy_class = <class                                                  │ │
│ │                                      'ray.rllib.algorithms.ppo.ppo_torch_policy.PPOTorchPol… │ │
│ │               disable_env_checking = -1                                                      │ │
│ │                         env_config = -1                                                      │ │
│ │                        env_context = {}                                                      │ │
│ │                        env_creator = <function                                               │ │
│ │                                      Algorithm._get_env_id_and_creator.<locals>.env_creator… │ │
│ │                                      at 0x7f7182ce53a0>                                      │ │
│ │                    episode_horizon = -1                                                      │ │
│ │              extra_python_environs = -1                                                      │ │
│ │                       fake_sampler = -1                                                      │ │
│ │                       gen_rollouts = <function RolloutWorker.__init__.<locals>.gen_rollouts  │ │
│ │                                      at 0x7f7182c57e50>                                      │ │
│ │                      input_creator = -1                                                      │ │
│ │                            log_dir = '/home/user/ray_results/PPO_custom_cartpole_env.C… │ │
│ │                          log_level = -1                                                      │ │
│ │                       model_config = -1                                                      │ │
│ │                     no_done_at_end = -1                                                      │ │
│ │                  normalize_actions = -1                                                      │ │
│ │                           num_envs = -1                                                      │ │
│ │                        num_workers = 1                                                       │ │
│ │                     observation_fn = -1                                                      │ │
│ │                     output_creator = -1                                                      │ │
│ │                  policies_to_train = -1                                                      │ │
│ │                             policy = -1                                                      │ │
│ │                      policy_config = -1                                                      │ │
│ │                  policy_mapping_fn = -1                                                      │ │
│ │                        policy_spec = -1                                                      │ │
│ │                  preprocessor_pref = -1                                                      │ │
│ │                   recreated_worker = False                                                   │ │
│ │           remote_env_batch_wait_ms = -1                                                      │ │
│ │                 remote_worker_envs = -1                                                      │ │
│ │            rollout_fragment_length = -1                                                      │ │
│ │                       sample_async = -1                                                      │ │
│ │                               seed = -1                                                      │ │
│ │                               self = <ray.rllib.evaluation.rollout_worker.RolloutWorker      │ │
│ │                                      object at 0x7f7182bfdd30>                               │ │
│ │                       soft_horizon = -1                                                      │ │
│ │                             spaces = None                                                    │ │
│ │                 tf_session_creator = -1                                                      │ │
│ │                       validate_env = <function Algorithm.validate_env at 0x7f7182f6d940>     │ │
│ │                       worker_index = 0                                                       │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /home/user/anaconda3/envs/game/lib/python3.9/site-packages/ray/rllib/algorithms/algorithm.p │
│ y:2221 in env_creator_from_classpath                                                             │
│                                                                                                  │
│   2218 │   │   │   │   │   try:                                                                  │
│   2219 │   │   │   │   │   │   env_obj = from_config(env_specifier, env_context)                 │
│   2220 │   │   │   │   │   except ValueError:                                                    │
│ ❱ 2221 │   │   │   │   │   │   raise EnvError(                                                   │
│   2222 │   │   │   │   │   │   │   ERR_MSG_INVALID_ENV_DESCRIPTOR.format(env_specifier)          │
│   2223 │   │   │   │   │   │   )                                                                 │
│   2224 │   │   │   │   │   return env_obj                                                        │
│                                                                                                  │
│ ╭─────────────────────── locals ───────────────────────╮                                         │
│ │   env_context = {}                                   │                                         │
│ │ env_specifier = 'custom_cartpole_env.CustomCartPole' │                                         │
│ ╰──────────────────────────────────────────────────────╯                                         │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
EnvError: The env string you provided ('custom_cartpole_env.CustomCartPole') is:
a) Not a supported/installed environment.
b) Not a tune-registered environment creator.
c) Not a valid env class string.

Try one of the following:
a) For Atari support: `pip install gym[atari] autorom[accept-rom-license]`.
   For VizDoom support: Install VizDoom
   (https://github.com/mwydmuch/ViZDoom/blob/master/doc/Building.md) and
   `pip install vizdoomgym`.
   For PyBullet support: `pip install pybullet`.
b) To register your custom env, do `from ray import tune;
   tune.register('[name]', lambda cfg: [return env obj from here using cfg])`.
   Then in your config, do `config['env'] = [name]`.
c) Make sure you provide a fully qualified classpath, e.g.:
   `ray.rllib.examples.env.repeat_after_me_env.RepeatAfterMeEnv`

Hi @mannyv,

To reproduce the error, please follow these steps:

  1. Create a new folder and place two files in it: cartpole-ppo.yaml and custom_cartpole_env.py. Ensure that both files contain the content shared previously.
  2. Open a terminal window, navigate to the folder containing the two files, and execute the following command:
     rllib train file cartpole-ppo.yaml
  3. Once the training is complete, evaluate the checkpoint using the rllib evaluate command printed at the end of the training output.

Please let me know if you need any further clarification or encounter any issues while following these steps.

@Hars, could you show what your command looks like? It should be something like:

rllib evaluate /home/user/ray_results/cartpole-ppo/PPO_custom_cartpole_env.CustomCartPole_b0e39_00000_0_2023-04-08_16-59-57/checkpoint_000007 \
   --algo PPO \
   --env custom_cartpole_env.CustomCartPole
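
If the CLI still cannot resolve the classpath, a workaround is to register the env and evaluate from a Python script instead. Below is a minimal sketch, assuming Ray 2.x and that custom_cartpole_env.py is importable from your working directory; the checkpoint path is a placeholder:

# Sketch: register the env under the same string it was trained with, then
# restore the algorithm from the checkpoint and roll out one episode.
from ray import tune
from ray.rllib.algorithms.algorithm import Algorithm
from custom_cartpole_env import CustomCartPole

# Registering under the exact training string should make RLlib pick up the
# registered creator instead of attempting the classpath import.
tune.register_env(
    "custom_cartpole_env.CustomCartPole",
    lambda cfg: CustomCartPole(cfg),
)

algo = Algorithm.from_checkpoint(
    "/home/user/ray_results/cartpole-ppo/.../checkpoint_000007"  # placeholder
)

# Roll out one episode with the restored policy.
env = CustomCartPole()
obs, info = env.reset()
terminated = truncated = False
total_reward = 0.0
while not (terminated or truncated):
    action = algo.compute_single_action(obs, explore=False)
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
print("episode reward:", total_reward)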

@Lars_Simon_Zehnder After training, I am getting the following results:

2023-04-15 06:14:47,866	INFO tune.py:798 -- Total run time: 46.47 seconds (46.23 seconds for the tuning loop).

Your training finished.
Best available checkpoint for each trial:
  /home/user/ray_results/cartpole-ppo/PPO_custom_cartpole_env.CustomCartPole_9d616_00000_0_2023-04-15_06-14-01/checkpoint_000007

You can now evaluate your trained algorithm from any checkpoint, e.g. by running:
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│   rllib evaluate /home/user/ray_results/cartpole-ppo/PPO_custom_cartpole_env.CustomCartPole_9d616_00000_0_2023-04-15_06-14-01/checkpoint_000007 --algo PPO   │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

After that, I executed the following command:

rllib evaluate /home/user/ray_results/cartpole-ppo/PPO_custom_cartpole_env.CustomCartPole_9d616_00000_0_2023-04-15_06-14-01/checkpoint_000007 --algo PPO --env custom_cartpole_env.CustomCartPole

However, the error remains the same.

Hello everyone,

I’m following up on my previous question regarding the evaluation of a custom environment in RLlib. I haven’t been able to resolve the issue yet, and I would appreciate any further guidance or suggestions.

Just to recap, I’ve successfully set up a custom environment for training using RLlib. The training process runs as expected and the agent seems to learn from the environment. However, when I try to evaluate the trained agent using the same custom environment, the evaluation still fails with the EnvError shown above.

I’ve shared the relevant code snippets and command lines in my previous messages. Despite following the suggested steps, I’m still experiencing the same error. If anyone has any insight into the issue or can provide alternative solutions, I would greatly appreciate it.

Thank you in advance for your help!

Hi, did you ever solve this? I’m following along in a book, with a simple environment: maze_gym_env.GymEnvironment. When I run

rllib evaluate /home/nawal/ray_results/default_5e69e990f7db4f21b279738bee4d9b60/DQN_maze_gym_env.GymEnvironment_7d6d7_00000_0_2024-02-21_09-49-22/checkpoint_000000 --algo DQN --env maze_gym_env.Environment

I get the error:
EnvError: The env string you provided ('maze_gym_env.GymEnvironment') is:
a) Not a supported/installed environment.
b) Not a tune-registered environment creator.
c) Not a valid env class string.

Thanks,

Nawal