1. Severity of the issue: (select one)
None: I’m just curious or want clarification.
Low: Annoying but doesn’t hinder my work.
Medium: Significantly affects my productivity but can find a workaround.
High: Completely blocks me.
2. Environment:
- Ray version: 2.55.1
- Python version: 3.14.5
- OS: macOS and Linux
- Cloud/Infrastructure: -
- Other libs/tools (if relevant):
3. What happened vs. what you expected:
- Expected: Algorithm builds successfully
- Actual: OOM
Hello everybody. I am running league play on a territory-based game (a territorial.io clone). I am now scaling up from 8 to 32 players and from an 80x80 to a 256x256 map, but I cannot even build the algo for evaluation because of OOM: on my Mac it fills 40 GB of RAM right away.
- My obs space: Dict('action_mask': Box(0.0, 1.0, (44,), float32), 'observations': Box(-1, 1, (132, 256, 256), int8), 'stats': Box(0.0, 1.0, (64,), float32)) → (8650860,). There are 132 channels because I stack 4 frames for the model and each frame is one-hot encoded (OHE) for 32 players. The stats are fed to the FC part of my CNN.
- My model is a mid-sized CNN with about 5M params.
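For reference, here is a quick size check of a single observation, using only the shapes listed above. The float32 figure rests on an assumption that has not been verified here: that the flattened (8650860,) vector is stored as float32. The native figure keeps the int8 dtype for the image.

```python
import math

# Shapes copied from the obs space above; the second value is bytes per element.
OBS_SPACE = {
    "action_mask": ((44,), 4),              # float32
    "observations": ((132, 256, 256), 1),   # int8
    "stats": ((64,), 4),                    # float32
}

# Length of the flattened observation vector.
flat_len = sum(math.prod(shape) for shape, _ in OBS_SPACE.values())

# Bytes for one observation with the original dtypes.
native_bytes = sum(math.prod(shape) * size for shape, size in OBS_SPACE.values())

# Assumption: the flattened vector is stored as float32 (4 bytes per element).
flat_float32_bytes = flat_len * 4

print("flattened length:", flat_len)
print("native MiB:", native_bytes / 2**20)
print("flattened float32 MiB:", flat_float32_bytes / 2**20)
```

If the float32 assumption holds, one flattened observation is about four times larger than the int8 original, which matters once observations are batched.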
Memory starts to skyrocket instantly.
- I am trying to set up 1 trainable policy plus a pool of 32 algorithmic bot policies (no model at all) and 32 frozen copies of the trainable one, which are updated during league play. I sample from these policies to form each new league.
Here is my config:
-----common.py-----
from strategy_config import N_PLAYERS
from ray.rllib.algorithms.ppo import PPOConfig
from prepare_env import ENV_NAME
from players.model import MODEL_NAME
from callback import MergedCallback
from league_play_callback import policies

config = (
    PPOConfig()
    .environment(
        env=ENV_NAME,
        clip_actions=True,
        disable_env_checking=False,
    )
    .multi_agent(
        policies=policies,
        policy_mapping_fn=(
            lambda aid, *args, **kwargs: "p0"
        ),  # will be changed in league callback
        count_steps_by="agent_steps",
        policies_to_train=["p0"],
        policy_map_capacity=N_PLAYERS,
    )
    .debugging(log_level="DEBUG")
    .framework(framework="torch")
    .api_stack(
        enable_rl_module_and_learner=False,
        enable_env_runner_and_connector_v2=False,
    )
    .callbacks(MergedCallback)
    # .checkpointing(checkpoint_trainable_policies_only=True)
)

training_params = dict(
    train_batch_size=None,
    minibatch_size=None,
    lr=2e-5,
    gamma=0.999,
    lambda_=0.9,
    use_gae=True,
    clip_param=0.2,
    grad_clip=0.5,
    entropy_coeff=0.01,
    vf_loss_coeff=0.25,
    num_epochs=10,
    model={"custom_model": MODEL_NAME},
)

env_runners_params = dict(batch_mode="truncate_episodes")
-----mac_config.py-----
from configs.common import config, training_params, env_runners_params
CONFIG_NAME = "MAC"
training_params["train_batch_size"] = 64
training_params["minibatch_size"] = 16
config = config.training(**training_params).env_runners(
    num_env_runners=2, **env_runners_params, num_cpus_per_env_runner=2
)
NUM_CPUS = 4
NUM_GPUS = 0
I understand that build_algo() sets up workers that run a sample batch to initialize the losses and so on, but I don't understand why this adds up to an enormous 40 GB of memory when a pass over 1 obs sample plus the policy weights takes no more than 300 MB in total. Why does this happen?
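To put numbers on that question, here is a rough upper-bound sketch. Every parameter is an assumption, not something confirmed in RLlib: that each policy with a model is initialized with a dummy batch of `rows` float32 observations, that the batch holds `obs_copies` copies of each observation (for example current and next obs), and that nothing is freed between policies or shared between processes. `estimate_gib` is a hypothetical helper written for this post.

```python
FLAT_OBS_LEN = 8_650_860  # flattened obs length from the obs space in this post
BYTES_FLOAT32 = 4         # assumption: observations are held as float32


def estimate_gib(rows, obs_copies, policies_with_model, processes):
    """Upper bound in GiB for observation buffers alone (weights excluded)."""
    per_obs_bytes = FLAT_OBS_LEN * BYTES_FLOAT32
    total = per_obs_bytes * rows * obs_copies * policies_with_model * processes
    return total / 2**30


# A single observation in a single process.
one = estimate_gib(rows=1, obs_copies=1, policies_with_model=1, processes=1)

# Hypothetical worst case: 32-row dummy batch, obs + next obs,
# 33 policies with a model (p0 + 32 frozen copies),
# 3 processes (2 env runners + 1 local worker).
worst = estimate_gib(rows=32, obs_copies=2, policies_with_model=33, processes=3)

print(f"single obs: {one:.3f} GiB")
print(f"hypothetical worst case: {worst:.1f} GiB")
```

The worst case is deliberately pessimistic, but it suggests that the dummy batch size and the number of policies built per process are worth checking before the model itself.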
Any help is appreciated!