Memory Leak in wrapper or callback?

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

When I run the following code for self-play training, everything works and the experiment runs to completion.

For the next step in my research, I implemented a wrapper that changes the game dynamics by adding dominant strategies.

The new code is just this:

# Imports assumed (not shown in the post); Ray 2.x / gymnasium API.
import gymnasium

from ray.rllib.env.wrappers.open_spiel import OpenSpielEnv


class DSOpenSpielEnv(OpenSpielEnv):
    def __init__(self, spiel_env, env_config):
        super().__init__(spiel_env)
        self._skip_env_checking = True
        self.env_config = env_config
        self.dominant_strategy = env_config.get('dominant_strategy', [1, 2])
        self.dominant_active_for_agent = 0
        self.action_memory = []

        # should already be set by the superclass, but define them here just in case
        self.observation_space = gymnasium.spaces.Box(
            float("-inf"), float("inf"), (self.env.observation_tensor_size(),)
        )
        self.action_space = gymnasium.spaces.Discrete(self.env.num_distinct_actions())

    def set_dominant_active_for_agent(self, agent_id):
        self.dominant_active_for_agent = agent_id

    def reset(self, *, seed=None, options=None):
        obs, infos = super().reset(seed=seed, options=options)
        self.dominant_strategy_index = 0
        self.action_memory = []
        return obs, infos

    def step(self, action):
        # Before calling the superclass's step, record the acting agent's
        # action if the dominant strategy is currently active for that agent.
        check_dom = False
        curr_player = self.state.current_player()
        if curr_player == self.dominant_active_for_agent:
            self.action_memory.append(action[curr_player])
            if len(self.action_memory) > len(self.dominant_strategy):
                self.action_memory.pop(0)
            check_dom = True

        obs, rewards, term, trunc, info = super().step(action)

        # Check whether the dominant strategy sequence has just been completed
        # by the agent it is active for; if so, end the episode with a win for
        # that agent.
        if check_dom:
            if (len(self.action_memory) == len(self.dominant_strategy)
                    and all(m == d for m, d in
                            zip(self.action_memory, self.dominant_strategy))):
                rewards = {ag: r for ag, r in enumerate(self.state.returns())}
                if self.dominant_active_for_agent == 0:
                    rewards[0] = 1
                    rewards[1] = -1
                else:
                    rewards[0] = -1
                    rewards[1] = 1
                term = {a: True for a in [0, 1, "__all__"]}
                trunc = {a: True for a in [0, 1, "__all__"]}
                obs = {}

        return obs, rewards, term, trunc, info

from ray.rllib.algorithms.callbacks import DefaultCallbacks
from ray.rllib.env.vector_env import VectorEnvWrapper


class SetDSCallback(DefaultCallbacks):
    def __init__(self):
        super().__init__()
    def on_episode_start(self, *, worker, base_env, policies, episode, env_index, **kwargs):
        envs = self._get_envs(base_env)

        # Run the policy mapping fn to see which policy controls agent 0, then
        # activate the dominant strategy for the corresponding agent.
        policy_id = episode.policy_mapping_fn(0, episode, episode.worker)
        if policy_id == "main":
            dominant_agent = 0
        else:
            dominant_agent = 1

        envs[env_index].set_dominant_active_for_agent(dominant_agent)

    def _get_envs(self, base_env):
        if isinstance(base_env, VectorEnvWrapper):
            return base_env.vector_env.get_sub_environments()
        else:
            return base_env.envs

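Roughly, the wrapper and callback plug into the run like this (a simplified sketch only; the game name, policy IDs, and config values are placeholders rather than my actual runner script):

# Simplified sketch of wiring DSOpenSpielEnv and SetDSCallback into an RLlib
# PPO run. Game name, policy IDs, and config values are placeholders.
import pyspiel
from ray.rllib.algorithms.ppo import PPOConfig
from ray.tune.registry import register_env

register_env(
    "ds_open_spiel",
    lambda env_config: DSOpenSpielEnv(pyspiel.load_game("connect_four"), env_config),
)

config = (
    PPOConfig()
    .environment("ds_open_spiel", env_config={"dominant_strategy": [1, 2]})
    .multi_agent(
        policies={"main", "random"},
        policy_mapping_fn=lambda agent_id, episode, worker, **kwargs: (
            "main" if agent_id == 0 else "random"
        ),
    )
    .callbacks(SetDSCallback)
)

algo = config.build()
algo.train()
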
And now there seems to be a memory leak.

So either it's my wrapper/callback, or RLlib's MultiCallbacks is somehow leaking, and I cannot figure out which.

Any thoughts on what's happening, or on how to test where this might be coming from, would be greatly appreciated.


Here's a graph of RAM usage over a failed training run with the additional code, which crashed after running out of RAM:

Here's a graph of RAM usage for a run without the additional code:

I did some googling, and one potential answer I found was that the environment is adding training data faster than the training process can consume it, which is why memory just keeps going up.

But that doesn’t explain why the OpenSpielEnv doesn’t do this when not wrapped.
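
One way to check whether the wrapper itself is responsible would be to step it on its own, outside of RLlib, and watch Python-level allocations with tracemalloc; a rough sketch (the game name, episode count, and random action choice are arbitrary):

# Rough sketch: step DSOpenSpielEnv outside of RLlib and watch allocations
# with tracemalloc. Game name, episode count, and action choice are arbitrary.
import random
import tracemalloc

import pyspiel

tracemalloc.start()
env = DSOpenSpielEnv(
    pyspiel.load_game("connect_four"), {"dominant_strategy": [1, 2]}
)

for episode in range(1000):
    obs, infos = env.reset()
    done = False
    while not done:
        player = env.state.current_player()
        action = {player: random.choice(env.state.legal_actions(player))}
        obs, rewards, term, trunc, info = env.step(action)
        done = term.get("__all__", False) or trunc.get("__all__", False)
    if episode % 100 == 0:
        current, peak = tracemalloc.get_traced_memory()
        print(f"episode {episode}: current={current / 1e6:.1f}MB peak={peak / 1e6:.1f}MB")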

So I just don't know. In the meantime, I'm running a test where I only have one rollout worker with a single environment in that worker, and we'll see what happens.
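
Concretely, that test just dials the rollout settings down to one worker with one environment, roughly like this (Ray 2.x PPOConfig API assumed; everything else unchanged from the normal run):

# Sketch of the single-rollout-worker / single-env test configuration
# (Ray 2.x PPOConfig API assumed); the rest of the config is unchanged.
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("ds_open_spiel", env_config={"dominant_strategy": [1, 2]})
    .rollouts(num_rollout_workers=1, num_envs_per_worker=1)
    .callbacks(SetDSCallback)
)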

Hi @aadharna,

I would start commenting out code to see when the leak goes away. On a single-worker machine you can also use tools like memray to help you identify the location of the memory leak.
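
For example, something along these lines; the filenames and the build_algo helper are placeholders for whatever your runner file does, and note that this only traces the driver process, so it is most informative when sampling also happens there (e.g. num_rollout_workers=0):

# Sketch: profile the driver process with memray. The output filename and the
# build_algo() helper are placeholders. Command-line equivalent:
#   memray run -o rllib_run.bin runner.py
#   memray flamegraph rllib_run.bin
import memray

from runner import build_algo  # placeholder: returns your configured Algorithm

with memray.Tracker("rllib_run.bin"):
    algo = build_algo()
    for _ in range(100):
        algo.train()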

I currently have a run going where the MemoryTrackingCallbacks callback is active. Although this will take a long time, I hope it will show us what is taking all the memory. So far, it isn't surfacing either of my two main files (the callback/runner file or my modified PPO file); the memory-tracking callback is only showing me internal ray and numpy objects.
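
For reference, it runs alongside my own callback via MultiCallbacks, roughly like this (a sketch; config stands in for my PPO config):

# Sketch: run RLlib's MemoryTrackingCallbacks next to the custom callback via
# MultiCallbacks; "config" stands in for the PPOConfig used for training.
from ray.rllib.algorithms.callbacks import MemoryTrackingCallbacks, MultiCallbacks

config = config.callbacks(MultiCallbacks([SetDSCallback, MemoryTrackingCallbacks]))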

If that’s still inconclusive, then I’ll run it again using the memray tool you linked me to.


The single-rollout-worker / single-environment run just crashed a few minutes ago, but it got about three quarters of the way through a full experiment (albeit very slowly). Interestingly, there's a difference between the System Memory Utilization graphs and the perf/ram_util_percent graph: the ram_util_percent graph does not show that final spike.