Registering a custom environment and running PPOTrainer on it does not work

It blocks me from completing my task.

I’m trying to run the PPO algorithm on my custom gym environment (I’m new to RL). First I wrote a gym env for my robotic dog; you can see it here:

import gym
from gym import error, spaces, utils
from gym.utils import seeding
import numpy as np
import random
from gym_dog.envs.mujoco import mujoco_env
from ray.rllib.agents.ppo import PPOTrainer



class DogEnv2(mujoco_env.MujocoEnv, utils.EzPickle, gym.Env):

    def __init__(self):
        mujoco_env.MujocoEnv.__init__(self, "./go1/xml/go1.xml", 5)
        utils.EzPickle.__init__(self)

    def step(self, action):
        xposbefore = self.sim.data.qpos[0]
        self.do_simulation(action, self.frame_skip)
        xposafter = self.sim.data.qpos[0]
        ob = self._get_obs()
        reward_ctrl = -0.1 * np.square(action).sum()
        reward_run = (xposafter - xposbefore) / self.dt
        reward = reward_ctrl + reward_run
        done = False
        return ob, reward, done, dict(reward_run=reward_run, reward_ctrl=reward_ctrl)

    def _get_obs(self):
        return np.concatenate(
            [
                self.sim.data.qpos.flat[1:],
                self.sim.data.qvel.flat,
            ]
        )

    def reset_model(self):
        return self._get_obs()

    def viewer_setup(self):
        self.viewer.cam.distance = self.model.stat.extent * 0.5

Then I wrote my main file like this:

import numpy as np
import gym
import mujoco_py
import random
import sys

sys.path.insert(1, '/home/mosaic-challenge/shirelle_ws/gym_shirelle/gym-dog')
import gym_dog
from gym_dog.envs.dog_env_2 import DogEnv2
import ray
from ray.rllib.agents import ppo
import tensorflow as tf
from ray.tune.registry import register_env
from ray import tune
from ray.tune.logger import pretty_print
import time


def create_my_env():
    import gym
    # from gym_dog.envs.dog_env_2 import DogEnv2
    env = gym.make('dog-v2')
    return env

ray.init()
env_creator = lambda config: create_my_env()
register_env('DogEnv2', env_creator)
ppo_config = ppo.DEFAULT_CONFIG.copy()
trainer = ppo.PPOTrainer(config=ppo_config, env="DogEnv2")
for _ in range(10):
    result = trainer.train()
    print(pretty_print(result))

ray.shutdown()

I’m getting this error: raise error.UnregisteredEnv('No registered env with id: {}'.format(id))
What is wrong with my registration?

I also tried the following:

ray.init()
env_creator = lambda config: create_my_env()
register_env('DogEnv2', env_creator)

tune.run(
        "PPO",
        stop={"episode_reward_mean": 200},
        config={
            "env": env_creator,
            "num_workers": 1,
            },
        )

and then I’m getting this error:

2022-05-15 09:54:40,617	INFO trial_runner.py:803 -- starting PPO_<function <lambda> at 0x7f43ecac4550>_e473d_00000
== Status ==
Current time: 2022-05-15 09:54:42 (running for 00:00:01.94)
Memory usage on this node: 8.1/62.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 2.0/16 CPUs, 0/1 GPUs, 0.0/33.81 GiB heap, 0.0/16.9 GiB objects
Result logdir: /home/mosaic-challenge/ray_results/PPO
Number of trials: 1/1 (1 RUNNING)
+-------------------------------------------------------+----------+-------+
| Trial name                                            | status   | loc   |
|-------------------------------------------------------+----------+-------|
| PPO_<function <lambda> at 0x7f43ecac4550>_e473d_00000 | RUNNING  |       |
+-------------------------------------------------------+----------+-------+


2022-05-15 09:54:42,410	ERROR trial_runner.py:876 -- Trial PPO_<function <lambda> at 0x7f43ecac4550>_e473d_00000: Error processing event.
NoneType: None
Result for PPO_<function <lambda> at 0x7f43ecac4550>_e473d_00000:
  trial_id: e473d_00000
  
== Status ==
Current time: 2022-05-15 09:54:42 (running for 00:00:01.95)
Memory usage on this node: 8.1/62.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/16 CPUs, 0/1 GPUs, 0.0/33.81 GiB heap, 0.0/16.9 GiB objects
Result logdir: /home/mosaic-challenge/ray_results/PPO
Number of trials: 1/1 (1 ERROR)
+-------------------------------------------------------+----------+-------+
| Trial name                                            | status   | loc   |
|-------------------------------------------------------+----------+-------|
| PPO_<function <lambda> at 0x7f43ecac4550>_e473d_00000 | ERROR    |       |
+-------------------------------------------------------+----------+-------+
Number of errored trials: 1
+-------------------------------------------------------+--------------+------------------------------------------------------------------------------------------------------------------------------+
| Trial name                                            |   # failures | error file                                                                                                                   |
|-------------------------------------------------------+--------------+------------------------------------------------------------------------------------------------------------------------------|
| PPO_<function <lambda> at 0x7f43ecac4550>_e473d_00000 |            1 | /home/mosaic-challenge/ray_results/PPO/PPO_<function <lambda> at 0x7f43ecac4550>_e473d_00000_0_2022-05-15_09-54-40/error.txt |
+-------------------------------------------------------+--------------+------------------------------------------------------------------------------------------------------------------------------+

2022-05-15 09:54:42,412	ERROR ray_trial_executor.py:102 -- An exception occurred when trying to stop the Ray actor:Traceback (most recent call last):
  File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/tune/ray_trial_executor.py", line 93, in post_stop_cleanup
    ray.get(future, timeout=0)
  File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/worker.py", line 1811, in get
    raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::PPOTrainer.__init__() (pid=29183, ip=132.72.112.217, repr=PPOTrainer)
  File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 767, in __init__
    self._env_id: Optional[str] = self._register_if_needed(
  File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 2810, in _register_if_needed
    raise ValueError(
ValueError: <function <lambda> at 0x7f3fbd2ba310> is an invalid env specification. You can specify a custom env as either a class (e.g., YourEnvCls) or a registered env id (e.g., "your_env").

(PPOTrainer pid=29183) 2022-05-15 09:54:42,407	ERROR worker.py:449 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::PPOTrainer.__init__() (pid=29183, ip=132.72.112.217, repr=PPOTrainer)
(PPOTrainer pid=29183)   File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 767, in __init__
(PPOTrainer pid=29183)     self._env_id: Optional[str] = self._register_if_needed(
(PPOTrainer pid=29183)   File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 2810, in _register_if_needed
(PPOTrainer pid=29183)     raise ValueError(
(PPOTrainer pid=29183) ValueError: <function <lambda> at 0x7f3fbd2ba310> is an invalid env specification. You can specify a custom env as either a class (e.g., YourEnvCls) or a registered env id (e.g., "your_env").
Traceback (most recent call last):
  File "main.py", line 44, in <module>
    tune.run(
  File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/tune/tune.py", line 695, in run
    raise TuneError("Trials did not complete", incomplete_trials)
ray.tune.error.TuneError: ('Trials did not complete', [PPO_<function <lambda> at 0x7f43ecac4550>_e473d_00000])

Maybe try:

    def env_creator(env_config={}):
        return DogEnv2() 


    register_env("my_env", env_creator)
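
A slightly fuller sketch of the same idea, under a few assumptions: RLlib calls a registered creator with a single dict-like config argument, so the creator should accept it rather than discard it, and the "env" entry in the config should be the registered name (a string) or a class, not the creator function. ToyEnv below is a hypothetical stand-in for DogEnv2 so the snippet runs without MuJoCo or Ray; the commented lines show where register_env and the string id would go.

```python
class ToyEnv:
    """Hypothetical stand-in for DogEnv2 (no MuJoCo needed)."""

    def __init__(self, frame_skip=5):
        self.frame_skip = frame_skip


def env_creator(env_config=None):
    # Avoid a mutable default argument; treat "no config" as an empty dict.
    env_config = env_config or {}
    return ToyEnv(**env_config)


# With Ray installed, the creator is registered under a name, and that
# name (a string, not the function itself) goes into the config:
#   from ray.tune.registry import register_env
#   register_env("my_env", env_creator)
#   config = {"env": "my_env", "num_workers": 1}

default_env = env_creator()
custom_env = env_creator({"frame_skip": 10})
print(default_env.frame_skip, custom_env.frame_skip)
```

Keeping the config argument in the creator's signature means per-experiment settings can later be passed through "env_config" without changing the registration.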

Thank you for your reply.
I tried what you suggested:

def create_my_env(env_config={}):
    return DogEnv2()

ray.init()
env_creator = lambda config: create_my_env()
register_env("my_env", env_creator)

tune.run(
        "PPO",
        stop={"episode_reward_mean": 200},
        config={
            "env": "my_env",
            "num_workers": 1,
            },
        )

but I’m getting this error now:

2022-05-16 09:18:52,062	INFO trial_runner.py:803 -- starting PPO_my_env_0e406_00000
(PPOTrainer pid=13071) 2022-05-16 09:18:53,848	INFO trainer.py:2295 -- Your framework setting is 'tf', meaning you are using static-graph mode. Set framework='tf2' to enable eager execution with tf2.x. You may also then want to set eager_tracing=True in order to reach similar execution speed as with static-graph mode.
(PPOTrainer pid=13071) 2022-05-16 09:18:54,332	INFO ppo.py:268 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting simple_optimizer=True if this doesn't work for you.
(PPOTrainer pid=13071) 2022-05-16 09:18:54,332	INFO trainer.py:864 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
== Status ==
Current time: 2022-05-16 09:18:56 (running for 00:00:04.88)
Memory usage on this node: 6.5/62.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 2.0/16 CPUs, 0/1 GPUs, 0.0/34.94 GiB heap, 0.0/17.47 GiB objects
Result logdir: /home/mosaic-challenge/ray_results/PPO
Number of trials: 1/1 (1 RUNNING)
+------------------------+----------+-------+
| Trial name             | status   | loc   |
|------------------------+----------+-------|
| PPO_my_env_0e406_00000 | RUNNING  |       |
+------------------------+----------+-------+


2022-05-16 09:18:56,831	ERROR trial_runner.py:876 -- Trial PPO_my_env_0e406_00000: Error processing event.
NoneType: None
Result for PPO_my_env_0e406_00000:
  trial_id: 0e406_00000
  
== Status ==
Current time: 2022-05-16 09:18:56 (running for 00:00:04.88)
Memory usage on this node: 6.5/62.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/16 CPUs, 0/1 GPUs, 0.0/34.94 GiB heap, 0.0/17.47 GiB objects
Result logdir: /home/mosaic-challenge/ray_results/PPO
Number of trials: 1/1 (1 ERROR)
+------------------------+----------+-------+
| Trial name             | status   | loc   |
|------------------------+----------+-------|
| PPO_my_env_0e406_00000 | ERROR    |       |
+------------------------+----------+-------+
Number of errored trials: 1
+------------------------+--------------+-----------------------------------------------------------------------------------------------+
| Trial name             |   # failures | error file                                                                                    |
|------------------------+--------------+-----------------------------------------------------------------------------------------------|
| PPO_my_env_0e406_00000 |            1 | /home/mosaic-challenge/ray_results/PPO/PPO_my_env_0e406_00000_0_2022-05-16_09-18-52/error.txt |
+------------------------+--------------+-----------------------------------------------------------------------------------------------+

2022-05-16 09:18:56,833	ERROR ray_trial_executor.py:102 -- An exception occurred when trying to stop the Ray actor:Traceback (most recent call last):
  File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/tune/ray_trial_executor.py", line 93, in post_stop_cleanup
    ray.get(future, timeout=0)
  File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/worker.py", line 1811, in get
    raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::PPOTrainer.__init__() (pid=13071, ip=132.72.112.217, repr=PPOTrainer)
  File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 1035, in _init
    raise NotImplementedError
NotImplementedError

During handling of the above exception, another exception occurred:

ray::PPOTrainer.__init__() (pid=13071, ip=132.72.112.217, repr=PPOTrainer)
  File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 830, in __init__
    super().__init__(
  File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/tune/trainable.py", line 149, in __init__
    self.setup(copy.deepcopy(self.config))
  File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 911, in setup
    self.workers = WorkerSet(
  File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/rllib/evaluation/worker_set.py", line 134, in __init__
    remote_spaces = ray.get(
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=13141, ip=132.72.112.217, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7fcaf52ecc10>)
  File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 507, in __init__
    check_env(self.env)
  File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/rllib/utils/pre_checks/env.py", line 65, in check_env
    check_gym_environments(env)
  File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/rllib/utils/pre_checks/env.py", line 135, in check_gym_environments
    sampled_observation = env.observation_space.sample()
  File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/gym/spaces/box.py", line 42, in sample
    return self.np_random.uniform(low=self.low, high=high, size=self.shape).astype(self.dtype)
  File "mtrand.pyx", line 1133, in numpy.random.mtrand.RandomState.uniform
OverflowError: Range exceeds valid bounds

(PPOTrainer pid=13071) 2022-05-16 09:18:56,828	ERROR worker.py:449 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::PPOTrainer.__init__() (pid=13071, ip=132.72.112.217, repr=PPOTrainer)
(PPOTrainer pid=13071)   File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 1035, in _init
(PPOTrainer pid=13071)     raise NotImplementedError
(PPOTrainer pid=13071) NotImplementedError
(PPOTrainer pid=13071) 
(PPOTrainer pid=13071) During handling of the above exception, another exception occurred:
(PPOTrainer pid=13071) 
(PPOTrainer pid=13071) ray::PPOTrainer.__init__() (pid=13071, ip=132.72.112.217, repr=PPOTrainer)
(PPOTrainer pid=13071)   File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 830, in __init__
(PPOTrainer pid=13071)     super().__init__(
(PPOTrainer pid=13071)   File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/tune/trainable.py", line 149, in __init__
(PPOTrainer pid=13071)     self.setup(copy.deepcopy(self.config))
(PPOTrainer pid=13071)   File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 911, in setup
(PPOTrainer pid=13071)     self.workers = WorkerSet(
(PPOTrainer pid=13071)   File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/rllib/evaluation/worker_set.py", line 134, in __init__
(PPOTrainer pid=13071)     remote_spaces = ray.get(
(PPOTrainer pid=13071) ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=13141, ip=132.72.112.217, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7fcaf52ecc10>)
(PPOTrainer pid=13071)   File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 507, in __init__
(PPOTrainer pid=13071)     check_env(self.env)
(PPOTrainer pid=13071)   File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/rllib/utils/pre_checks/env.py", line 65, in check_env
(PPOTrainer pid=13071)     check_gym_environments(env)
(PPOTrainer pid=13071)   File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/rllib/utils/pre_checks/env.py", line 135, in check_gym_environments
(PPOTrainer pid=13071)     sampled_observation = env.observation_space.sample()
(PPOTrainer pid=13071)   File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/gym/spaces/box.py", line 42, in sample
(PPOTrainer pid=13071)     return self.np_random.uniform(low=self.low, high=high, size=self.shape).astype(self.dtype)
(PPOTrainer pid=13071)   File "mtrand.pyx", line 1133, in numpy.random.mtrand.RandomState.uniform
(PPOTrainer pid=13071) OverflowError: Range exceeds valid bounds
(RolloutWorker pid=13141) 2022-05-16 09:18:56,824	WARNING rollout_worker.py:498 -- We've added a module for checking environments that are used in experiments. It will cause your environment to fail if your environment is not set upcorrectly. You can disable check env by setting `disable_env_checking` to True in your experiment config dictionary. You can run the environment checking module standalone by calling ray.rllib.utils.check_env(env).
(RolloutWorker pid=13141) 2022-05-16 09:18:56,824	WARNING env.py:120 -- Your env doesn't have a .spec.max_episode_steps attribute. This is fine if you have set 'horizon' in your config dictionary, or `soft_horizon`. However, if you haven't, 'horizon' will default to infinity, and your environment will not be reset.
(RolloutWorker pid=13141) 2022-05-16 09:18:56,825	ERROR worker.py:449 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=13141, ip=132.72.112.217, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7fcaf52ecc10>)
(RolloutWorker pid=13141)   File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 507, in __init__
(RolloutWorker pid=13141)     check_env(self.env)
(RolloutWorker pid=13141)   File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/rllib/utils/pre_checks/env.py", line 65, in check_env
(RolloutWorker pid=13141)     check_gym_environments(env)
(RolloutWorker pid=13141)   File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/rllib/utils/pre_checks/env.py", line 135, in check_gym_environments
(RolloutWorker pid=13141)     sampled_observation = env.observation_space.sample()
(RolloutWorker pid=13141)   File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/gym/spaces/box.py", line 42, in sample
(RolloutWorker pid=13141)     return self.np_random.uniform(low=self.low, high=high, size=self.shape).astype(self.dtype)
(RolloutWorker pid=13141)   File "mtrand.pyx", line 1133, in numpy.random.mtrand.RandomState.uniform
(RolloutWorker pid=13141) OverflowError: Range exceeds valid bounds
Traceback (most recent call last):
  File "main.py", line 26, in <module>
    tune.run(
  File "/home/mosaic-challenge/anaconda3/envs/python3.8/lib/python3.8/site-packages/ray/tune/tune.py", line 695, in run
    raise TuneError("Trials did not complete", incomplete_trials)
ray.tune.error.TuneError: ('Trials did not complete', [PPO_my_env_0e406_00000])

Any suggestions?
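
The last frames of the traceback show env.observation_space.sample() failing inside NumPy's uniform with "OverflowError: Range exceeds valid bounds". One likely cause (an assumption, since the observation space is not shown in the thread) is a Box whose bounds are infinite, or so large that high - low is no longer finite. The sketch below is a pure-Python check for that condition; unsampleable_dims is a hypothetical helper, not part of gym or RLlib. The RLlib warning in the log also names disable_env_checking as a way to skip the pre-check.

```python
import math


def unsampleable_dims(low, high):
    """Return indices whose range width (high - low) is not finite."""
    return [
        i for i, (lo, hi) in enumerate(zip(low, high))
        if not math.isfinite(hi - lo)
    ]


# Example bounds: dimension 0 is bounded, dimension 1 is unbounded.
low = [-1.0, -math.inf]
high = [1.0, math.inf]
bad = unsampleable_dims(low, high)
print(bad)

# Config option named in the RLlib warning above; it skips the check
# rather than fixing the bounds.
config = {"env": "my_env", "num_workers": 1, "disable_env_checking": True}
```

Giving the observation space finite bounds addresses the cause, while disable_env_checking only hides the symptom.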

Did you find any solution? I have the same problem.
Registering custom environments that take env_config is a common problem!

@fardinabbasi : Keep in mind that your custom environment class, which inherits from gym.Env, must also be “ready” to take an env_config dictionary and parse it into attributes. See the code snippet below, which I developed.

import numpy as np


def assign_env_config(self, args, kwargs):
    """Configure instance based on args and keyword args."""
    # To stay compatible with multiple RL libraries, this method accepts
    # the configuration either as positional "args" or as "kwargs".

    # First path: parameters arrive as a dictionary under the keyword "env_config".
    # This is the case when the env is created via gym.make().
    if kwargs is not None:
        for key, value in kwargs.items():
            setattr(self, key, value)
        if hasattr(self, "env_config"):
            for key, value in self.env_config.items():
                # Check types based on default settings
                if hasattr(self, key):
                    if type(getattr(self, key)) == np.ndarray:
                        setattr(self, key, value)
                    else:
                        setattr(self, key, type(getattr(self, key))(value))
                else:
                    raise AttributeError(f"{self} has no attribute {key}")

    # Second path: "env_config" is passed as a flattened dictionary inside a tuple.
    # This is the case in e.g. Ray RLlib.
    # Ray provides EnvContext to capture this properly, but we want to avoid a dependency on Ray.
    if args is not None:
        for i in range(len(args)):
            args_item = args[i]
            for key, value in args_item.items():
                # Check types based on default settings
                if hasattr(self, key):
                    if type(getattr(self, key)) == np.ndarray:
                        setattr(self, key, value)
                    else:
                        setattr(self, key, type(getattr(self, key))(value))
                else:
                    raise AttributeError(f"{self} has no attribute {key}")
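
To make the idea concrete, here is a reduced sketch of how such a helper might be used. It is not the function above: apply_env_config and ToyEnv are hypothetical, simplified names, and the ndarray special case is left out so the snippet needs only the standard library. Each config value is cast to the type of the attribute's default, and unknown keys raise an AttributeError.

```python
def apply_env_config(env, env_config):
    """Copy config entries onto env, casting to the type of each default."""
    for key, value in env_config.items():
        if not hasattr(env, key):
            raise AttributeError(f"{env} has no attribute {key}")
        default = getattr(env, key)
        setattr(env, key, type(default)(value))


class ToyEnv:
    def __init__(self, env_config=None):
        # Defaults define both the allowed keys and their types.
        self.max_steps = 100
        self.step_size = 0.5
        apply_env_config(self, env_config or {})


env = ToyEnv({"max_steps": "50", "step_size": 1})
print(env.max_steps, env.step_size)
```

Because the constructor takes a single optional dictionary, the same class can be built directly or through a creator function that forwards the config it receives.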

@PhilippWillms :
Thank you, Philipp, for your response, but I’m not entirely sure I understand what you mean by being “ready” to take an env_config.
This is my custom environment, which only takes an env_config, but when I try to run it with Ray Tune I get: Cannot create PPOConfig from given config_dict! Property __stdout_file__ not supported.
Here is the environment:

Hi @fardinabbasi,

This is where your error is coming from:
return self._get_obs() if not terminated else None

You cannot return None. You will need to come up with some dummy observation to return.
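
As a rough illustration of that advice, the toy class below returns a placeholder observation with the same shape as a real one instead of None once the episode has terminated. ToyEnv, its two-element observation, and the five-element step return are assumptions made for the sketch, not the poster's RankingEnv.

```python
class ToyEnv:
    """Hypothetical env whose episode ends after max_steps steps."""

    def __init__(self, max_steps=2):
        self.max_steps = max_steps
        self.t = 0

    def _get_obs(self):
        return [float(self.t), float(self.max_steps - self.t)]

    def _dummy_obs(self):
        # Same shape and element type as a real observation.
        return [0.0, 0.0]

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.max_steps
        obs = self._dummy_obs() if terminated else self._get_obs()
        return obs, 0.0, terminated, False, {}


env = ToyEnv()
first = env.step(0)
last = env.step(0)
print(first[0], last[0])
```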

Hi @mannyv,

You’re correct; I’ve changed it to self._get_obs() if not terminated else self.observation_space.sample(). However, I’m still encountering the error message: Cannot create PPOConfig from given config_dict! Property __stdout_file__ not supported.

I’m new to RLlib, and I’m unsure whether this issue comes from not registering the env and instead passing RankingEnv directly as train_config["env"] = RankingEnv, or from how I retrieve my configuration within RankingEnv.
