How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
Hello, I am using the following code, which works fine on my laptop under both Linux and Windows. But when I try to run it on a cluster of 64 CPUs, it is extremely slow compared to my laptop and it reports only NaN as the episode reward, with episodes_this_iter=0 for every iteration. On my laptop I have been using num_cpus_for_local_worker=10 and num_rollout_workers=2. I also get the warning:
The maximum number of pending trials has been automatically set to the number of available cluster CPUs, which is high (140 CPUs/pending trials). If you're running an experiment with a large number of trials, this could lead to scheduling overhead. In this case, consider setting the TUNE_MAX_PENDING_TRIALS_PG environment variable to the desired maximum number of concurrent trials.
even though I am only using 64 CPUs (so is Ray seeing 140?). I tried setting TUNE_MAX_PENDING_TRIALS_PG to 1 and to 30, but it made no difference.
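For reference, this is roughly what I tried (just a sketch: I set the variable in the driver script before building the Tuner and connect to the running cluster with address="auto"; I am not sure whether the variable also needs to be exported on each worker node):

```python
import os
import ray

# Assumption: setting this in the driver process before Tune starts
# scheduling trials is enough; I did not export it on the worker nodes.
os.environ["TUNE_MAX_PENDING_TRIALS_PG"] = "1"

ray.init(address="auto")        # connect to the existing 64-CPU cluster
print(ray.cluster_resources())  # check how many CPUs Ray actually sees
```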
I use Ray 3.0.0.dev0, because it is the only version that currently supports waterworld_v4.
from ray import air, tune
from ray.tune.registry import register_env
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.env.wrappers.pettingzoo_env import PettingZooEnv
from pettingzoo.sisl import waterworld_v4
if __name__ == "__main__":
    # RDQN - Rainbow DQN
    # ADQN - Apex DQN
    for i in range(9, 11):

        def env_creator(args):
            return PettingZooEnv(waterworld_v4.env(n_pursuers=5, n_evaders=5))

        env = env_creator({})
        register_env("waterworld", env_creator)

        obs_space = env.observation_space
        act_spc = env.action_space

        # All agents map to a single shared policy.
        policies = {
            "shared_1": (None, obs_space, act_spc, {}),
            # "shared_2": (None, obs_space, act_spc, {}),
            # "pursuer_5": (None, obs_space, act_spc, {}),
        }

        config = (
            PPOConfig()
            .environment("waterworld")
            .resources(num_gpus=0, num_cpus_for_local_worker=32)
            .rollouts(num_rollout_workers=30)  # default = 2 (I should try it)
            .framework("torch")
            .multi_agent(
                policies=policies,
                policy_mapping_fn=(lambda agent_id, *args, **kwargs: "shared_1"),
            )
        )

        tune.Tuner(
            "PPO",
            run_config=air.RunConfig(
                name="waterworld_v4 n5 shared trial {0}w".format(i),
                stop={"training_iteration": 1500},
                checkpoint_config=air.CheckpointConfig(
                    checkpoint_frequency=10,
                ),
            ),
            param_space=config.to_dict(),
        ).fit()
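For comparison, on my laptop (where training works and episode rewards are reported) the only difference is in the resource settings I mentioned above; everything else in the config is identical:

```python
# Laptop settings that work fine; the rest of the config is the same as in the script.
laptop_config = (
    PPOConfig()
    .environment("waterworld")
    .resources(num_gpus=0, num_cpus_for_local_worker=10)
    .rollouts(num_rollout_workers=2)
    .framework("torch")
    .multi_agent(
        policies=policies,
        policy_mapping_fn=(lambda agent_id, *args, **kwargs: "shared_1"),
    )
)
```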
Please let me know if I should use different resource settings, or if there are other solutions I could try.
Thanks, George