Memory Leak in wrapper or callback?

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

When I run the following code for self-play training, everything works and the experiment runs to completion.

For the next step in my research, I implemented a wrapper that changes the game dynamics by adding dominant strategies.

The new code is just this:

# Imports assumed (not shown in the post); Ray 2.x / gymnasium API.
import gymnasium

from ray.rllib.env.wrappers.open_spiel import OpenSpielEnv


class DSOpenSpielEnv(OpenSpielEnv):
    def __init__(self, spiel_env, env_config):
        super().__init__(spiel_env)
        self._skip_env_checking = True
        self.env_config = env_config
        self.dominant_strategy = env_config.get('dominant_strategy', [1, 2])
        self.dominant_active_for_agent = 0
        self.action_memory = []

        # should already be set by the superclass, but define them here just in case
        self.observation_space = gymnasium.spaces.Box(
            float("-inf"), float("inf"), (self.env.observation_tensor_size(),)
        )
        self.action_space = gymnasium.spaces.Discrete(self.env.num_distinct_actions())

    def set_dominant_active_for_agent(self, agent_id):
        self.dominant_active_for_agent = agent_id

    def reset(self, *, seed=None, options=None):
        obs, infos = super().reset(seed=seed, options=options)
        self.dominant_strategy_index = 0
        self.action_memory = []
        return obs, infos

    def step(self, action):
        # Before calling the superclass's step, record the acting agent's
        # action if the dominant strategy is currently active for that agent.
        check_dom = False
        curr_player = self.state.current_player()
        if curr_player == self.dominant_active_for_agent:
            self.action_memory.append(action[curr_player])
            if len(self.action_memory) > len(self.dominant_strategy):
                self.action_memory.pop(0)
            check_dom = True

        obs, rewards, term, trunc, info = super().step(action)

        # Check whether the dominant strategy sequence has just been completed
        # by the agent it is active for; if so, end the episode with a win for
        # that agent.
        if check_dom:
            if (len(self.action_memory) == len(self.dominant_strategy)
                    and all(m == d for m, d in
                            zip(self.action_memory, self.dominant_strategy))):
                rewards = {ag: r for ag, r in enumerate(self.state.returns())}
                if self.dominant_active_for_agent == 0:
                    rewards[0] = 1
                    rewards[1] = -1
                else:
                    rewards[0] = -1
                    rewards[1] = 1
                term = {a: True for a in [0, 1, "__all__"]}
                trunc = {a: True for a in [0, 1, "__all__"]}
                obs = {}

        return obs, rewards, term, trunc, info

from ray.rllib.algorithms.callbacks import DefaultCallbacks
from ray.rllib.env.vector_env import VectorEnvWrapper


class SetDSCallback(DefaultCallbacks):
    def __init__(self):
        super().__init__()
    def on_episode_start(self, *, worker, base_env, policies, episode, env_index, **kwargs):
        envs = self._get_envs(base_env)

        # Run the policy mapping fn to see which policy controls agent 0, then
        # activate the dominant strategy for the corresponding agent.
        policy_id = episode.policy_mapping_fn(0, episode, episode.worker)
        if policy_id == "main":
            dominant_agent = 0
        else:
            dominant_agent = 1

        envs[env_index].set_dominant_active_for_agent(dominant_agent)

    def _get_envs(self, base_env):
        if isinstance(base_env, VectorEnvWrapper):
            return base_env.vector_env.get_sub_environments()
        else:
            return base_env.envs

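Roughly, the wrapper and callback plug into the run like this (a simplified sketch only; the game name, policy IDs, and config values are placeholders rather than my actual runner script):

# Simplified sketch of wiring DSOpenSpielEnv and SetDSCallback into an RLlib
# PPO run. Game name, policy IDs, and config values are placeholders.
import pyspiel
from ray.rllib.algorithms.ppo import PPOConfig
from ray.tune.registry import register_env

register_env(
    "ds_open_spiel",
    lambda env_config: DSOpenSpielEnv(pyspiel.load_game("connect_four"), env_config),
)

config = (
    PPOConfig()
    .environment("ds_open_spiel", env_config={"dominant_strategy": [1, 2]})
    .multi_agent(
        policies={"main", "random"},
        policy_mapping_fn=lambda agent_id, episode, worker, **kwargs: (
            "main" if agent_id == 0 else "random"
        ),
    )
    .callbacks(SetDSCallback)
)

algo = config.build()
algo.train()
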
And now there seems to be a memory leak.

So either it's my wrapper/callback, or RLlib's MultiCallbacks is somehow leaking, and I cannot figure out which.

Any thoughts on what's happening, or on how to test where this might be coming from, would be greatly appreciated.


Here's a graph of RAM usage over a failed training run with the additional code, which crashed after running out of RAM:

Here's a graph of RAM usage for a run without the additional code:

I did some googling, and one potential answer I found was that the environment is adding training data faster than the training process can consume it, which is why memory just keeps going up.

But that doesn’t explain why the OpenSpielEnv doesn’t do this when not wrapped.
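
One way to check whether the wrapper itself is responsible would be to step it on its own, outside of RLlib, and watch Python-level allocations with tracemalloc; a rough sketch (the game name, episode count, and random action choice are arbitrary):

# Rough sketch: step DSOpenSpielEnv outside of RLlib and watch allocations
# with tracemalloc. Game name, episode count, and action choice are arbitrary.
import random
import tracemalloc

import pyspiel

tracemalloc.start()
env = DSOpenSpielEnv(
    pyspiel.load_game("connect_four"), {"dominant_strategy": [1, 2]}
)

for episode in range(1000):
    obs, infos = env.reset()
    done = False
    while not done:
        player = env.state.current_player()
        action = {player: random.choice(env.state.legal_actions(player))}
        obs, rewards, term, trunc, info = env.step(action)
        done = term.get("__all__", False) or trunc.get("__all__", False)
    if episode % 100 == 0:
        current, peak = tracemalloc.get_traced_memory()
        print(f"episode {episode}: current={current / 1e6:.1f}MB peak={peak / 1e6:.1f}MB")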

So I just don't know. In the meantime, I'm running a test where I only have one rollout worker with a single environment in that worker, and we'll see what happens.
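
Concretely, that test just dials the rollout settings down to one worker with one environment, roughly like this (Ray 2.x PPOConfig API assumed; everything else unchanged from the normal run):

# Sketch of the single-rollout-worker / single-env test configuration
# (Ray 2.x PPOConfig API assumed); the rest of the config is unchanged.
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("ds_open_spiel", env_config={"dominant_strategy": [1, 2]})
    .rollouts(num_rollout_workers=1, num_envs_per_worker=1)
    .callbacks(SetDSCallback)
)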

Hi @aadharna,

I would start commenting out code to see when the leak goes away. On a single-worker machine you can also use tools like memray to help you identify the location of the memory leak.
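
For example, something along these lines; the filenames and the build_algo helper are placeholders for whatever your runner file does, and note that this only traces the driver process, so it is most informative when sampling also happens there (e.g. num_rollout_workers=0):

# Sketch: profile the driver process with memray. The output filename and the
# build_algo() helper are placeholders. Command-line equivalent:
#   memray run -o rllib_run.bin runner.py
#   memray flamegraph rllib_run.bin
import memray

from runner import build_algo  # placeholder: returns your configured Algorithm

with memray.Tracker("rllib_run.bin"):
    algo = build_algo()
    for _ in range(100):
        algo.train()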

I currently have a run going where the MemoryTrackingCallbacks callback is active. Although this will take a long time, I hope it will show us what is taking all the memory. So far, it isn't surfacing either of my two main files (the callback/runner file or my modified PPO file); the memory-tracking callback is only showing me internal ray and numpy objects.
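
For reference, it runs alongside my own callback via MultiCallbacks, roughly like this (a sketch; config stands in for my PPO config):

# Sketch: run RLlib's MemoryTrackingCallbacks next to the custom callback via
# MultiCallbacks; "config" stands in for the PPOConfig used for training.
from ray.rllib.algorithms.callbacks import MemoryTrackingCallbacks, MultiCallbacks

config = config.callbacks(MultiCallbacks([SetDSCallback, MemoryTrackingCallbacks]))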

If that’s still inconclusive, then I’ll run it again using the memray tool you linked me to.


The single-rollout-worker / single-environment run just crashed a few minutes ago, but it got about three quarters of the way through a full experiment (albeit very slowly). Interestingly, there's a difference between the System Memory Utilization graphs and the perf/ram_util_percent graph: the ram_util_percent graph does not show that final spike.