Logging stuff in a custom gym environment using RLlib and Tune

How severe does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I want to log information that is collected by the environment. Say the environment consists of object 1, object 2 and object 3, and then the episode is over. I want a function that takes all three objects as input, combines them in some way, and stores the result in the correct folder. (There are multiple runs, so storing in the correct folder is not trivial: each environment/run has its own folder and needs its own logging.)

I have considered a number of approaches, but I am unsure how to implement any of them:

  • Get access to trial.logdir while executing the env.step() or env.render() function. Then I can store the correct objects in the correct folder. But I am not sure how to get access to it.
  • Use class CustomLoggerCallback(LoggerCallback): and override log_trial_result. In this function I have access to trial.logdir, but I only have access to the result dict, which doesn’t include the observation or info from the environment.
  • I saw this post: How to log Render to tensorboard?, which does have access to the environment (and, I guess, to trial.logdir), but inheriting from DefaultCallbacks seems to give the error that setup is not implemented.

Let me know if I can make any of these approaches work, or suggest a different one. Of the approaches I mentioned, the second seems the easiest, if I can get access to the environment's observation or info dict in some way.


Hey @RaymondK, could you implement a custom callback class, subclassed from ray.rllib.agents.callbacks.DefaultCallbacks, and then add it to your config via config["callbacks"] = YourNewClass?

In there, you override the on_episode_start(), on_episode_step(), etc. methods, where you have access to the worker (the worker arg) and its config (worker.policy_config), which gives you the worker’s index (worker.policy_config["worker_index"]) and whether you are on an evaluation worker (worker.policy_config["in_evaluation"]). Then you can write to disk however you like, e.g. start a new file at the beginning of the episode, write to it on each step, and close the file in on_episode_end().
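
As a minimal sketch of that pattern (the class name, file naming, and use of the env's info dict are illustrative assumptions; the file path here is relative to the current working directory, and the next reply shows how to get the worker's proper log dir):

from ray.rllib.agents.callbacks import DefaultCallbacks


class FilePerEpisodeCallbacks(DefaultCallbacks):
    """Open one file per episode, append one line per step, close it at the end."""

    def on_episode_start(self, *, worker, base_env, policies, episode, **kwargs):
        # Use the worker index and evaluation flag to keep runs apart.
        idx = worker.policy_config["worker_index"]
        in_eval = worker.policy_config["in_evaluation"]
        episode.user_data["log_file"] = open(
            f"episode_{episode.episode_id}_worker_{idx}_eval_{in_eval}.txt", "w"
        )

    def on_episode_step(self, *, worker, base_env, episode, **kwargs):
        # Write whatever the env returned in its info dict on this step.
        episode.user_data["log_file"].write(f"{episode.last_info_for()}\n")

    def on_episode_end(self, *, worker, base_env, policies, episode, **kwargs):
        episode.user_data["log_file"].close()


# Hook it into the config:
# config["callbacks"] = FilePerEpisodeCallbacks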


To get the directory in the worker, you can use in your callbacks:

worker.io_context.log_dir

@RaymondK You can do something like this:

from typing import Dict, Optional
from ray.rllib.agents.callbacks import DefaultCallbacks
from ray.rllib.env import BaseEnv
from ray.rllib.evaluation import Episode, RolloutWorker
from ray.rllib.policy import Policy
from ray.rllib.utils.typing import PolicyID

from ray.rllib.agents.pg import PGConfig

from pathlib import Path
import imageio
import numpy as np


class MyCallback(DefaultCallbacks):

    def on_episode_start(
        self,
        *,
        worker: "RolloutWorker",
        base_env: BaseEnv,
        policies: Dict[PolicyID, Policy],
        episode: Episode,
        **kwargs,
    ) -> None:
        # create an empty list for keeping the frames
        episode.hist_data["frames"] = []

    def on_episode_step(
        self,
        *,
        worker: "RolloutWorker",
        base_env: BaseEnv,
        policies: Optional[Dict[PolicyID, Policy]] = None,
        episode: Episode,
        **kwargs,
    ) -> None:

        # Only record frames on evaluation workers.
        if worker.policy_config["in_evaluation"]:
            # Call your custom env logging procedure here; a random image
            # stands in for an actual rendered frame.
            img = np.random.randint(0, 256, size=(64, 64, 3)).astype('uint8')
            episode.hist_data["frames"].append(img)

    def on_episode_end(
        self,
        *,
        worker: "RolloutWorker",
        base_env: BaseEnv,
        policies: Dict[PolicyID, Policy],
        episode: Episode,
        **kwargs,
    ) -> None:

        if worker.policy_config["in_evaluation"]:
            # Write the frames collected during this episode as a GIF into
            # the worker's log directory.
            log_dir = Path(worker.io_context.log_dir)
            frames = episode.hist_data["frames"]
            imageio.mimsave(
                log_dir / f'{episode.env_id}_{episode.episode_id}.gif',
                frames
            )


if __name__ == '__main__':
    config = (
        PGConfig()
        .framework('torch')
        .callbacks(callbacks_class=MyCallback)
        .environment(env='CartPole-v0')
        .evaluation(evaluation_interval=1)
    )

    algo = config.build()
    algo.train()
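
If you launch the same config through Tune instead of calling build()/train() directly, every trial gets its own log directory and worker.io_context.log_dir points inside it, so each run's GIFs land in the right folder. A minimal sketch (assuming the standard tune.run API; the stop criterion is just illustrative):

from ray import tune

# Run the same config as a Tune experiment. Each trial gets its own
# directory, and worker.io_context.log_dir resolves inside that trial's
# folder, so per-run logging stays separated.
tune.run(
    "PG",
    config=config.to_dict(),
    stop={"training_iteration": 1},  # illustrative stop criterion
)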

Thanks for the quick answers, really appreciate it. The method of @sven1977 works perfectly!
