[RLlib] Visualise custom environment

I’m trying to record the observations from a custom env.
I implemented the render method for my environment that just returns an RGB array.

If I set monitor: True then Gym complains that: WARN: Trying to monitor an environment which has no 'spec' set. This usually means you did not create it via 'gym.make', and is recommended only for advanced users.

So using the workflow to first register my env as a Gym env and then as an RLlib env works:

import gym
from gym.envs.registration import register as gym_register
from ray.tune import register_env as rllib_register

# entry_point must be a "module:ClassName" path to the env class
# ("my_package.envs" here is just a placeholder)
gym_register(id="env-v0", entry_point="my_package.envs:EnvClass")

def env_creator(env_name, config):
    env = gym.make(env_name)
    return env

rllib_register("env-v0", lambda config: env_creator("env-v0", config))

However, there’s still nothing recorded. Also, I don’t know how to pass a config to gym.make, but that’s a separate issue.
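(For what it’s worth, gym.make forwards extra keyword arguments to the env constructor, so something like the following sketch should work, assuming EnvClass.__init__ accepts a config argument:)

# Sketch: gym.make passes extra kwargs through to the env constructor,
# so a config dict can be forwarded (assuming EnvClass takes `config`):
def env_creator(env_name, config):
    return gym.make(env_name, config=config)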

There’s also the render_env and record_env parameters in the config, but if I set them to True, I get:

 File "/home/user/miniconda3/envs/sym-rl/lib/python3.8/site-packages/ray/rllib/agents/trainer_template.py", line 106, in __init__
    Trainer.__init__(self, config, env, logger_creator)
  File "/home/user/miniconda3/envs/sym-rl/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 465, in __init__
    super().__init__(config, logger_creator)
  File "/home/user/miniconda3/envs/sym-rl/lib/python3.8/site-packages/ray/tune/trainable.py", line 96, in __init__
    self.setup(copy.deepcopy(self.config))
  File "/home/user/miniconda3/envs/sym-rl/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 577, in setup
    self.config = Trainer.merge_trainer_configs(self._default_config,
  File "/home/user/miniconda3/envs/sym-rl/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 1055, in merge_trainer_configs
    return deep_update(config1, config2, _allow_unknown_configs,
  File "/home/user/miniconda3/envs/sym-rl/lib/python3.8/site-packages/ray/tune/utils/util.py", line 220, in deep_update
    raise Exception("Unknown config parameter `{}` ".format(k))
Exception: Unknown config parameter `render_env`

So what’s the right workflow to have the rendered environment recorded?
Thanks


I think the Unknown config parameter `render_env` error comes from a version mismatch: I just merged support for this config key into the master branch, so maybe you are still pointing to an older Ray install?
Can you try installing the latest master?

I do see there is a problem with record_env. When setting this key to anything other than False, you need to provide the directory (str) in which you want to store the videos, e.g. record_env: "/Users/xyz/my_videos/".
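A minimal config sketch (the path is just an example):

config = {
    "env": "env-v0",
    "render_env": True,                     # render the env while stepping
    "record_env": "/Users/xyz/my_videos/",  # directory to write videos to
}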

That’s great, thanks!
Indeed, with the latest commit the render_env option does work.

So just to clarify some details:

  1. regarding record_env: if I want to record to the trial directory (along with the checkpoints), how do I set it?
  2. how does this interact with the output option, which accepts "logdir"? I would assume they are related.
  3. what’s the purpose of the monitor option then?
  4. does RLlib rely on Gym’s Monitor mechanics, or can I avoid them? I.e. do I need to go through the Gym registration workflow, or can I use the environment directly without gym.make?
  5. in this issue you mentioned that there’s a video-dir option that’s required. Is that still the case?

Thanks!

Ah, yes, this is confusing. :confused: You can use either one.
Try setting "monitor": True. This should put the video files into the experiment’s (~/ray_results) folder.

Yes, this only works with gym envs. They are then auto-wrapped via:

from gym import wrappers

# monitor_path points into the experiment dir, e.g. ~/ray_results/...
env = wrappers.Monitor(env, monitor_path)

Hi! I have a similar problem. I am using ray==1.2.0, created a custom environment whose render() returns RGB image arrays, and set monitor: True; however, no videos are saved in the ray_results folder… :thinking:

I actually don’t get any errors in the console… Here is my code:

import gym
import gym_jsbsim
from gym_jsbsim.environment import GuidanceEnv
import ray
from ray.rllib.agents.ddpg import td3, TD3Trainer
from ray.tune import register_env
from ray.tune.logger import pretty_print
from gym_jsbsim.normalise_env import NormalizeStateEnv
from ray import tune


def env_creator(env_config):
    return GuidanceEnv(jsbsim_path="/Users/walter/thesis_project/jsbsim", max_episode_time_s=60 * 5)

register_env("guidance-v0", env_creator)


def my_train_fn(config, reporter):
    # agent.restore('/content/drive/MyDrive/checkpoints/checkpoint_701/checkpoint-701')
    agent = TD3Trainer(config=config, env="guidance-v0")
    for i in range(10):
        result = agent.train()

        if i % 1 == 0:
            checkpoint = agent.save(checkpoint_dir="./data/checkpoints")
            print(pretty_print(result))
            print("checkpoint saved at", checkpoint)
    agent.stop()


if __name__ == "__main__":
    ray.init()

    config = {
        "lr": 0.01, # tune.grid_search([0.01, 0.001, 0.0001]),
        # Use GPUs iff `RLLIB_NUM_GPUS` env var set to > 0.
        "num_gpus": 0,
        "num_workers": 1,
        "framework": "tf",
        "eager_tracing": False,
        "monitor": True
    }
    resources = TD3Trainer.default_resource_request(config).to_json()
    tune.run(my_train_fn, resources_per_trial=resources, config=config)

Thanks for your help!

I think this can happen if you don’t have the proper mp4 recorder installed on your machine. I remember having this problem once on my Mac laptop.
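(For reference, gym’s video recorder shells out to an external encoder binary; a quick check, as a sketch:)

# Gym's MP4 recorder shells out to ffmpeg (or avconv); check that one of
# them is available on the PATH before expecting video files.
import shutil

encoder = shutil.which("ffmpeg") or shutil.which("avconv")
print("video encoder found:", encoder)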

Thanks for the response.
So I was just trying to make it work with the SimpleCorridor example:

import argparse
import os
from datetime import datetime

import gym
import numpy as np
import ray
from gym.spaces import Box, Discrete
from ray import tune
from ray.rllib.models import ModelCatalog
from ray.rllib.models.torch.fcnet import FullyConnectedNetwork as TorchFC
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2
from ray.rllib.utils.framework import try_import_torch
from ray.rllib.utils.test_utils import check_learning_achieved
from ray.tune import grid_search

torch, nn = try_import_torch()

parser = argparse.ArgumentParser()
parser.add_argument("--run", type=str, default="PPO")
parser.add_argument("--as-test", action="store_true")
parser.add_argument("--stop-iters", type=int, default=50)
parser.add_argument("--stop-timesteps", type=int, default=100000)
parser.add_argument("--stop-reward", type=float, default=0.1)


class SimpleCorridor(gym.Env):
    """Example of a custom env in which you have to walk down a corridor.

    You can configure the length of the corridor via the env config."""
    def __init__(self, config):
        self.end_pos = config["corridor_length"]
        self.cur_pos = 0
        self.action_space = Discrete(2)
        self.observation_space = Box(0.0, self.end_pos, shape=(1, ), dtype=np.float32)

    def reset(self):
        self.cur_pos = 0
        return [self.cur_pos]

    def step(self, action):
        assert action in [0, 1], action
        if action == 0 and self.cur_pos > 0:
            self.cur_pos -= 1
        elif action == 1:
            self.cur_pos += 1
        done = self.cur_pos >= self.end_pos
        return [self.cur_pos], 1.0 if done else -0.1, done, {}

    def render(self, mode="human"):
        img = np.random.rand(64, 64)
        return img


class CustomModel(TorchModelV2, nn.Module):
    """Example of a PyTorch custom model that just delegates to a fc-net."""
    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs, model_config,
                              name)
        nn.Module.__init__(self)

        self.torch_sub_model = TorchFC(obs_space, action_space, num_outputs, model_config,
                                       name)

    def forward(self, input_dict, state, seq_lens):
        input_dict["obs"] = input_dict["obs"].float()
        fc_out, _ = self.torch_sub_model(input_dict, state, seq_lens)
        return fc_out, []

    def value_function(self):
        return torch.reshape(self.torch_sub_model.value_function(), [-1])


def exp_name(prefix):
    return prefix + '.' + datetime.now().strftime("%Y-%m-%d.%H:%M:%S")


if __name__ == "__main__":
    args = parser.parse_args()
    ray.init()

    # Can also register the env creator function explicitly with:
    # register_env("corridor", lambda config: SimpleCorridor(config))
    ModelCatalog.register_custom_model("my_model", CustomModel)

    config = {
        "env": SimpleCorridor,  # or "corridor" if registered above
        "env_config": {
            "corridor_length": 5,
        },
        # Use GPUs iff `RLLIB_NUM_GPUS` env var set to > 0.
        "num_gpus": int(os.environ.get("RLLIB_NUM_GPUS", "0")),
        "model": {
            "custom_model": "my_model",
            "vf_share_layers": True,
        },
        "lr": grid_search([1e-2, 1e-4, 1e-6]),  # try different lrs
        "num_workers": 1,  # parallelism
        "framework": "torch",
        "monitor": True,
        "render_env": True,
    }

    stop = {
        "training_iteration": args.stop_iters,
        "timesteps_total": args.stop_timesteps,
        "episode_reward_mean": args.stop_reward,
    }

    name = exp_name(args.run)
    results = tune.run(args.run, config=config, stop=stop, name=name)

    if args.as_test:
        check_learning_achieved(results, args.stop_reward)
    ray.shutdown()

This creates the following in the trial directory:

$ ls ~/ray_results/PPO.2021-02-25.12:10:37/PPO_SimpleCorridor_78297_00000_0_lr=0.01_2021-02-25_12-10-37
events.out.tfevents.1614255037.wombat  openaigym.episode_batch.0.367825.stats.json  openaigym.manifest.0.367825.manifest.json  params.json  params.pkl  progress.csv  result.json

So where can I find the rendered video or am I doing this wrong?
Thanks!

I think this will only create video files if you are using an image-based gym environment (like Atari).

Of course, but I added a dummy render method to the class:

    def render(self, mode="human"):
        img = np.random.rand(64, 64)
        return img

Otherwise there should be no difference between an ‘image-based’ environment and any arbitrary environment, right?
The method is called during training, but nothing gets saved from it.

Got it, I missed that. Could you try setting the video_dir config key (instead of setting monitor=True) to some directory (str) on your machine?

Thanks for the suggestion.
So now the config is:

    config = {
        "env": SimpleCorridor,  # or "corridor" if registered above
        "evaluation_interval": 2,
        "env_config": {
            "corridor_length": 5,
        },
        # Use GPUs iff `RLLIB_NUM_GPUS` env var set to > 0.
        "num_gpus": int(os.environ.get("RLLIB_NUM_GPUS", "0")),
        "model": {
            "custom_model": "my_model",
            "vf_share_layers": True,
        },
        "lr": grid_search([1e-2, 1e-4, 1e-6]),  # try different lrs
        "num_workers": 1,  # parallelism
        "framework": "torch",
        # "monitor": True,
        "video_dir": '~/video',
        "render_env": True,
    }

And it gives me:

  File "/home/user/miniconda3/envs/sym-rl/lib/python3.8/site-packages/ray/tune/utils/util.py", line 239, in deep_update
    raise Exception("Unknown config parameter `{}` ".format(k))
Exception: Unknown config parameter `video_dir`

Sorry, I meant "record_env": [path to the directory where the videos should be stored].
I got confused with the command-line arg of the `rllib rollout ...` script.

Ah, I see. Ok, so now with 'record_env': 'videos' it creates a videos folder in the project dir, and it has the following:

$ ls videos 
openaigym.episode_batch.0.2928030.stats.json  openaigym.episode_batch.0.2928055.stats.json  openaigym.manifest.0.2928052.manifest.json
openaigym.episode_batch.0.2928052.stats.json  openaigym.manifest.0.2928030.manifest.json    openaigym.manifest.0.2928055.manifest.json

So how do I get the videos from those json files?

In my case I saw the same files, but, like you, no video files. :thinking:

If your model input is pixel data, this may be of use: ray/logger.py at 0452a3a435e023eada85f670e70ffef02ceb5943 · ray-project/ray · GitHub

It looks like there is tensorboard support for videos in tune. Then you could visualize the environment along with the custom data. Using callbacks one could do something like:

from ray.rllib.agents.callbacks import DefaultCallbacks

class TBVideo(DefaultCallbacks):
    def on_train_result(self, *, trainer, result, **kwargs) -> None:
        # `my_model.my_input_images` is a placeholder for however you
        # collect the frames you want to log.
        result["custom_metrics"].update(
            {"my_video": my_model.my_input_images.detach().numpy()})

I have not tested this. Perhaps @sven1977 could say if this is a good idea or not.

@smorad thanks for the suggestion, TB logging might be a good alternative, but I would also like to have the videos as separate files so they are easy to share with other people.

Just to add to the previous comment: if I load openaigym.manifest.0.2928030.manifest.json, its videos field is empty. I.e.:

In [3]: mf = json.load(open('openaigym.manifest.0.2928030.manifest.json'))

In [4]: mf.keys()
Out[4]: dict_keys(['stats', 'videos', 'env_info'])

In [5]: mf['videos']
Out[5]: []

So could we have some clarification on how to generate videos from a (visual) environment?

Okay, so I tracked down the issue. To make the monitor work, add metadata = {'render.modes': ['rgb_array']} to the environment. I.e.:

class SimpleCorridor(gym.Env):
    """Example of a custom env in which you have to walk down a corridor.

    You can configure the length of the corridor via the env config."""
    metadata = {'render.modes': ['human', 'rgb_array']}

    def __init__(self, config):
        self.end_pos = config["corridor_length"]
        self.cur_pos = 0
        self.action_space = Discrete(2)
        self.observation_space = Box(0.0, self.end_pos, shape=(1, ), dtype=np.float32)

    def reset(self):
        self.cur_pos = 0
        return [self.cur_pos]

    def step(self, action):
        assert action in [0, 1], action
        if action == 0 and self.cur_pos > 0:
            self.cur_pos -= 1
        elif action == 1:
            self.cur_pos += 1
        done = self.cur_pos >= self.end_pos
        return [self.cur_pos], 1.0 if done else -0.1, done, {}

    def render(self, mode="human"):
        img = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
        return img

Note that the rendered numpy array also needs to be HxWx3 uint8 for the recording to work. The key is to include rgb_array; other render modes can be added as well.
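A quick way to sanity-check this outside RLlib is to wrap the fixed env in gym’s Monitor directly (a sketch; the output directory is just an example):

from gym import wrappers

# Wrap the fixed env and roll out one episode; an mp4 should appear in
# the output directory (episode 0 is recorded by default).
env = wrappers.Monitor(
    SimpleCorridor({"corridor_length": 5}), "/tmp/corridor_videos", force=True)
obs = env.reset()
done = False
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
env.close()  # finalizes the video files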

In addition: there’s a bug in gym, so the produced mp4 files will be corrupted. See this issue (which has a workaround), and the corresponding PR.

On top of that: there’s another bug in gym, where the monitor wrapper uses two separate directories. See this issue, and the corresponding PR.


Hey @vakker00 @Lauritowal, there is now also a fix in master that should solve these problems:

PR (already merged some time ago): [RLlib] Fix env rendering and recording options (for non-local mode; >0 workers; +evaluation-workers). by sven1977 · Pull Request #14796 · ray-project/ray · GitHub

This PR addresses the following problem: RLlib's env rendering and video recording options are currently buggy.

- Allows custom gym.Envs to be rendered in an automatic window by simply returning an np.array RGB image from the render() method.
- Alternatively, custom Envs can take care of their own rendering mechanism via their own window handling.
- Fixes video recording for non-local mode and num_workers > 0.
- Adds an example script that shows how to use both options in a simple corridor env.
- Soft-obsoletes the "monitor" config option for clarity.
- Also works for evaluation-only (via the evaluation config, as shown in the new example script).

IMPORTANT NOTE: A recent bug in openAI gym prevents RLlib's "record_env" option from recording videos properly. Instead, the produced mp4 files have a size of 1kb and are corrupted. A simple fix for this is described here:
openai/gym#1925
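For completeness, a minimal sketch of how the new options can be combined with evaluation (this assumes a Ray master that includes the PR above; the path is just an example):

# Sketch: render/record only during evaluation, via the evaluation config
# (assumes a Ray master that includes PR #14796).
config = {
    "env": SimpleCorridor,
    "env_config": {"corridor_length": 5},
    "evaluation_interval": 1,
    "evaluation_num_workers": 1,
    "evaluation_config": {
        "render_env": True,                  # pop up a live render window
        "record_env": "/tmp/rllib_videos",   # write mp4 files to this dir
    },
}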