1. Severity of the issue: (select one)
None: I’m just curious or want clarification.
Low: Annoying but doesn’t hinder my work.
[O] Medium: Significantly affects my productivity but can find a workaround.
High: Completely blocks me.
2. Environment:
- Ray version: 2.42
- Python version: 3.12
- OS: Windows
3. What happened vs. what you expected:
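- Expected: training time decreases roughly linearly as num_env_runners increases
- Actual: training time decreases much less than linearly, and gets worse going from 3 to 4 env runners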
I am testing whether RLlib training time scales with the number of env runners (num_env_runners). Ultimately I need to load a heavier environment, but for now I am running a few tests on a lighter one, CartPole-v1. Here is my setup:
import ray.tune
from ray.rllib.algorithms.ppo import PPOConfig

num_env_runners = int(input("Enter number of env runners: "))
# step_size = int(input("Enter number of step_size: "))
train_batch_size = int(input("Enter number of train_batch_size: "))
num_iterations = int(input("Enter number of num_iterations: "))
storage_path = SOME_PATH

ppo_config = (
    PPOConfig()
    .environment(
        env="CartPole-v1",
    )
    .training(
        model={
            "fcnet_hiddens": [256, 256]
        },
        train_batch_size_per_learner=train_batch_size
    )
    .env_runners(
        num_env_runners=num_env_runners,
        batch_mode="truncate_episodes",
    )
    .learners(
    )
    .reporting(
        # min_sample_timesteps_per_iteration=step_size
    )
    .multi_agent(
        count_steps_by='agent_steps'
    )
    # TODO(@chungs4): Make experiment reproducible
    .debugging(
        seed=5
    )
)

config_to_dict = ppo_config.to_dict()
tuner = ray.tune.Tuner(
    "PPO",
    param_space=config_to_dict,
    run_config=ray.tune.RunConfig(
        storage_path=storage_path,
        # checkpoint_config=ray.tune.CheckpointConfig(checkpoint_frequency=3),
        stop={
            'training_iteration': num_iterations
        },
        verbose=1
    )
)
result = tuner.fit()
I experimented with various values of num_env_runners and train_batch_size, but training time never seems to scale linearly with num_env_runners. These are the results with train_batch_size=10000 (I also tried values as low as 128, which is the smallest that train_batch_size can be):
num_env_runners=1 => total_time=340s
num_env_runners=2 => total_time=290s
num_env_runners=3 => total_time=227s
num_env_runners=4 => total_time=253s
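To put numbers on "non-linear", here is a minimal sketch that turns the timings above into speedup and parallel efficiency relative to the 1-runner run. The helper name scaling_table is my own, not part of Ray, and the only inputs are the four measurements listed above.

```python
# Timings copied from the results above: num_env_runners -> total_time in seconds.
timings = {1: 340, 2: 290, 3: 227, 4: 253}


def scaling_table(timings):
    """Return (runners, speedup, efficiency) rows relative to the 1-runner baseline.

    speedup    = T(1) / T(n)
    efficiency = speedup / n   (1.0 would mean perfectly linear scaling)
    """
    baseline = timings[1]
    rows = []
    for n in sorted(timings):
        speedup = baseline / timings[n]
        efficiency = speedup / n
        rows.append((n, speedup, efficiency))
    return rows


for n, speedup, efficiency in scaling_table(timings):
    print(f"num_env_runners={n}: speedup={speedup:.2f}x, efficiency={efficiency:.0%}")
```

With these measurements, 4 env runners give a speedup of about 1.34x over 1 runner, where linear scaling would give 4x.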
Since I will be loading a heavier environment in the future, I am planning to use 1 env per runner for now. Any ideas on which parameters I should consider changing? Thank you in advance.