Help debugging a memory leak in RLlib

I’m trying to debug a very slow memory leak in RLlib that occurs when I am using IMPALA + multi-agent.

I cannot find any leak using tools like tracemalloc, so I don’t think the memory issue is in Python.
ray memory also does not show any obvious leakage at all.

All of the workers very slowly accumulate memory (reaching about 90GB over roughly 8-12 hours) to the point where processes fail.

The interesting thing about this problem is that I can only reproduce it on a server that is set up with Linux cgroups limiting memory. If I run it on my machine at home (with only 16GB of RAM), the leak disappears and it happily runs at around 7-10GB.

I’m not looking for a “solution” to this problem, just help with how I can find this leak. Are there any tools or methods that can show me what Ray workers are putting on the heap (or even off-heap)? As I said, tracemalloc does not show any leak (other than in function_manager.py… but that is only ~100K after several hours, rather than 80GB).
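
For reference, a minimal way to diff two tracemalloc snapshots over time (standard library only, episode loop is just a placeholder) looks something like this:

import tracemalloc

tracemalloc.start(10)  # keep up to 10 frames of traceback per allocation

baseline = tracemalloc.take_snapshot()
# ... run a few episodes here ...
current = tracemalloc.take_snapshot()

# Show the lines whose allocations grew the most since the baseline.
for stat in current.compare_to(baseline, 'lineno')[:10]:
    print(stat)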

Does Ray manage GC itself for worker processes? I’m thinking it’s not taking cgroup limits into account when it’s managing its own internal cleanups (so it thinks there is more memory available than there is).

Any help would be appreciated!

I’ve taken the example from another ticket with the same problem, [rllib] Memory leak in environment worker in multi-agent setup · Issue #9964 · ray-project/ray · GitHub, and expanded on it to produce more logs using tracemalloc etc. These metrics are logged to TensorBoard (or wandb if you uncomment the wandb lines).

Here is a minimal reproduction:

import argparse
import os
import tracemalloc
from ray.rllib.policy.policy import Policy
from typing import Optional, Dict

import numpy as np
import psutil
import ray
from gym.spaces import Box
from ray import tune
from ray.rllib import BaseEnv
from ray.rllib.agents.callbacks import DefaultCallbacks
from ray.rllib.agents.impala import ImpalaTrainer
from ray.rllib.env.multi_agent_env import MultiAgentEnv
from ray.rllib.evaluation import MultiAgentEpisode, RolloutWorker
from ray.rllib.utils.typing import PolicyID
from ray.tune.integration.wandb import WandbLoggerCallback
from ray.tune.registry import register_env


class TraceMallocCallback(DefaultCallbacks):

    def __init__(self):
        super().__init__()

        tracemalloc.start(10)

    def on_episode_end(self, *, worker: "RolloutWorker", base_env: BaseEnv, policies: Dict[PolicyID, Policy],
                       episode: MultiAgentEpisode, env_index: Optional[int] = None, **kwargs) -> None:
        snapshot = tracemalloc.take_snapshot()
        top_stats = snapshot.statistics('lineno')

        for stat in top_stats[:5]:
            count = stat.count
            size = stat.size

            trace = str(stat.traceback)

            episode.custom_metrics[f'tracemalloc/{trace}/size'] = size
            episode.custom_metrics[f'tracemalloc/{trace}/count'] = count

        # Record raw process memory stats alongside the tracemalloc top stats.
        mem_info = psutil.Process(os.getpid()).memory_info()
        episode.custom_metrics['tracemalloc/worker/rss'] = mem_info.rss
        episode.custom_metrics['tracemalloc/worker/data'] = mem_info.data
        episode.custom_metrics['tracemalloc/worker/vms'] = mem_info.vms


def dim_to_gym_box(dim, val=np.inf):
    """Create gym.Box with specified dimension."""
    high = np.full((dim,), fill_value=val)
    return Box(low=-high, high=high)


class DummyMultiAgentEnv(MultiAgentEnv):
    """Return zero observations."""

    def __init__(self, config):
        del config  # Unused
        super(DummyMultiAgentEnv, self).__init__()
        self.config = dict(act_dim=17, obs_dim=380, n_players=2, n_steps=1000)
        self.players = ["player_%d" % p for p in range(self.config['n_players'])]
        self.current_step = 0

    def _obs(self):
        return np.zeros((self.config['obs_dim'],))

    def reset(self):
        self.current_step = 0
        return {p: self._obs() for p in self.players}

    def step(self, action_dict):
        done = self.current_step >= self.config['n_steps']
        self.current_step += 1

        obs = {p: self._obs() for p in self.players}
        rew = {p: np.random.random() for p in self.players}
        dones = {p: done for p in self.players + ["__all__"]}
        infos = {p: {'test_thing': 'wahoo'} for p in self.players}

        return obs, rew, dones, infos

    @property
    def observation_space(self):
        return dim_to_gym_box(self.config['obs_dim'])

    @property
    def action_space(self):
        return dim_to_gym_box(self.config['act_dim'])


def create_env(config):
    """Create the dummy environment."""
    return DummyMultiAgentEnv(config)


env_name = "DummyMultiAgentEnv"
register_env(env_name, create_env)


def get_trainer_config(env_config, train_policies, num_workers=5, framework="torch"):
    """Build configuration for 1 run."""

    # trainer config
    config = {
        "env": env_name, "env_config": env_config, "num_workers": num_workers,
        # "multiagent": {"policy_mapping_fn": lambda x: x, "policies": policies,
        #               "policies_to_train": train_policies},
        "framework": framework,
        "train_batch_size": 8192,

        'batch_mode': 'truncate_episodes',

        "callbacks": TraceMallocCallback,
        "lr": 0.0,
    }
    return config


def tune_run():
    parser = argparse.ArgumentParser(description='Run experiments')

    parser.add_argument('--debug', action='store_true', help='Debug mode')
    parser.add_argument('--yaml-file', help='YAML file containing GDY for the game')
    parser.add_argument('--root-directory', default=os.path.expanduser("~/ray_results"))

    args = parser.parse_args()

    #wandbLoggerCallback = WandbLoggerCallback(
    #    project='ma_mem_leak_exp',
    #    api_key_file='~/.wandb_rc',
    #    dir=args.root_directory
    #)

    ray.init(ignore_reinit_error=True, num_gpus=1, include_dashboard=False)
    config = get_trainer_config(train_policies=['player_1', 'player_2'], env_config={})
    return tune.run(ImpalaTrainer,
                    config=config,
                    name="dummy_run",
                    local_dir=args.root_directory)
                    #callbacks=[wandbLoggerCallback])


if __name__ == '__main__':
    tune_run()

You will see the memory climb in a ladder pattern like this indefinitely (until it crashes):

There are no obviously leaking Python objects in the tracemalloc logging, but the memory usage keeps increasing.

I’ve tried doing gc.collect() every few episodes per worker, which doesn’t help.
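
For reference, I did that with a callback roughly like this (ForceGCCallback is just an illustrative name, not something in RLlib):

import gc

from ray.rllib.agents.callbacks import DefaultCallbacks


class ForceGCCallback(DefaultCallbacks):
    """Force a full garbage collection every few episodes on each worker."""

    def __init__(self):
        super().__init__()
        self.episodes_seen = 0

    def on_episode_end(self, *, worker, base_env, policies, episode,
                       env_index=None, **kwargs):
        self.episodes_seen += 1
        if self.episodes_seen % 10 == 0:
            gc.collect()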


Hey @Bam4d, thanks for the reproduction script! I’ll take a look.

I can’t reproduce this on my Mac. Memory consumption seems very stable. The only change I made was to take out the GPU (num_gpus=0).
I can try again on a GPU machine.

Like I said, it’s very hard to reproduce. It only seems to happen on Linux when cgroups are enabled. You might have to spin up a Docker image (which I think uses cgroups) to reproduce it. The issue is that most HPC services use cgroups for resource allocation, so running this on servers/Docker will be a problem for many people.

I’ve done some more testing, and it looks like the “simple list collector” might be the problem.

These lines in simple_list_collector.py:488:

    # Make sure our mappings are up to date.
    agent_key = (episode.episode_id, agent_id)
    self.agent_key_to_policy_id[agent_key] = policy_id

Is the episode ID unique across all episodes? That would mean this “agent_key_to_policy_id” map grows forever, right?
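
To illustrate what I mean (a simplified sketch, not the actual RLlib code): if episode IDs are unique, the dict only ever gains keys unless something pops them when the episode finishes.

class CollectorSketch:
    """Simplified stand-in for the simple list collector's mapping."""

    def __init__(self):
        self.agent_key_to_policy_id = {}

    def add_agent(self, episode_id, agent_id, policy_id):
        # A new (episode_id, agent_id) key is added for every episode/agent pair...
        self.agent_key_to_policy_id[(episode_id, agent_id)] = policy_id

    def episode_done(self, episode_id, agent_ids):
        # ...so without a cleanup like this, the dict grows forever.
        for agent_id in agent_ids:
            self.agent_key_to_policy_id.pop((episode_id, agent_id), None)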

Created a branch to test this theory here: GitHub - Bam4d/ray at ma_memory_leak


Awesome, could you PR this when confirmed?

Yeah, I’m just in the process of trying to confirm whether this is in fact the problem.

I’ve also learned that the Python garbage collector does not actually “see” the limits set by cgroups, so it might just be that Python thinks there is a lot more memory available than there really is and keeps growing its heap. This might be why it’s not reproducible on Linux without cgroups, or on Mac… Either way, I’ll run some more tests and see if Mr. Leak is gone.
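
A quick way to see the mismatch is to compare what psutil reports with the cgroup limit, roughly like this (assuming the default cgroup v1/v2 mount points under /sys/fs/cgroup):

import os

import psutil


def cgroup_memory_limit():
    """Return the cgroup memory limit in bytes, or None if unlimited/not found."""
    for path in ("/sys/fs/cgroup/memory.max",                    # cgroup v2
                 "/sys/fs/cgroup/memory/memory.limit_in_bytes"):  # cgroup v1
        if os.path.exists(path):
            with open(path) as f:
                value = f.read().strip()
            return None if value == "max" else int(value)
    return None


print("psutil total memory:", psutil.virtual_memory().total)
print("cgroup memory limit:", cgroup_memory_limit())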

Hi @Bam4d, it seems like things are mostly wrapped up here, but in the future, you can try running ray memory --help to unlock some more helpful features. It seems like only object spilling is turned on by default. You can check out the corresponding docs here.

Either way, we are never cleaning up that dict, so that’s definitely a great catch, even if it’s just a small leak (leaking strings)!

@Bam4d, let me know what else you find. I prepped this PR here; it’s good to go, but feel free to open your own PR and ping me here for merging it.

Small leaks over 1B timesteps end up being big leaks :wink:

Haha, yeah, absolutely!

@Bam4d,

Thank you for taking the time to get your memory tracking callback into the RLlib repo. I had a huge memory leak in my environment that I would have spent forever tracking down without your callback.

Here is a before and after:

More than a year has passed; has this problem been solved? My colleague has also run into this issue recently. Is there a feasible solution?

Hi @earneet,

Several memory leaks have been found and fixed since that post. Do you have a sense of whether the memory leak is coming from RLlib or from the environment? We have seen both kinds of memory leaks.

There is an RLlib callback you can enable to help find memory leaks. More information about that is here: How To Contribute to RLlib — Ray 3.0.0.dev0
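
If I remember correctly, you enable it by passing the memory-tracking callback class via the trainer config, something like this (the exact import path depends on your Ray version):

# On recent Ray versions; older versions expose it under ray.rllib.agents.callbacks.
from ray.rllib.algorithms.callbacks import MemoryTrackingCallbacks

config = {
    "env": "DummyMultiAgentEnv",
    "callbacks": MemoryTrackingCallbacks,
}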

Thank you for your reply. I found the problem after tracing: it is not a memory leak in Ray. The learner is just consuming data too slowly, which leads to the sampled data accumulating.


@earneet What tracing did you do to figure out that the learner speed is the problem? I’m having a similar issue right now and am trying to check whether this is the case as well.

Hello, how did you fix this? Please advise.

I am not very sure. I observed the logs and the memory curve: whenever samples were reported, memory would skyrocket. Then I increased the number of GPUs and num_workers, and now it works.