How severe does this issue affect your experience of using Ray?
- Low: It annoys or frustrates me for a moment.
Hello!
I have the following very simple script that runs PPO with independent policies on a PettingZoo environment:
import ray
import supersuit as ss
from pettingzoo.butterfly import cooperative_pong_v5
from ray import tune
from ray.rllib.env import PettingZooEnv
from ray.tune.registry import register_env


def env_creator_coop_pong(args):
    # Build the PettingZoo environment and apply the SuperSuit preprocessing wrappers.
    env = cooperative_pong_v5.env()
    env = ss.max_observation_v0(env, 2)
    env = ss.sticky_actions_v0(env, repeat_action_probability=0.25)
    env = ss.frame_skip_v0(env, 4)
    env = ss.resize_v0(env, 84, 84)
    env = ss.frame_stack_v1(env, 4)
    return PettingZooEnv(env)


ray.shutdown()
ray.init(num_cpus=10)

# Instantiate the environment once, only to read the agent IDs for the policy set.
env = env_creator_coop_pong({})
register_env("env_creator_coop_pong", env_creator_coop_pong)

analysis = tune.run(
    "PPO",
    stop={"episodes_total": 1},
    # stop={"timesteps_total": 5000},
    checkpoint_freq=10,
    verbose=3,
    config={
        # Environment specific
        "env": "env_creator_coop_pong",
        # General
        "num_gpus": 0,
        "num_workers": 4,
        "num_envs_per_worker": 8,
        # "learning_starts": 1000,
        # "buffer_size": int(1e5),
        "compress_observations": True,
        "rollout_fragment_length": 20,
        "train_batch_size": 512,
        "gamma": 0.99,
        # "n_step": 3,
        "lr": 0.0001,
        # "prioritized_replay_alpha": 0.5,
        # "final_prioritized_replay_beta": 1.0,
        # "target_network_update_freq": 50000,
        "timesteps_per_iteration": 25000,
        # Method specific: one independent policy per agent
        "multiagent": {
            "policies": set(env.agents),
            "policy_mapping_fn": (lambda agent_id, episode, **kwargs: agent_id),
        },
    },
)
I am using it for simple benchmarking purposes, yet it runs for more than 15 minutes! I do not understand: shouldn't stop={"episodes_total": 1} force the algorithm to quit after running a single episode? A single episode cannot possibly take 15 minutes, so I am afraid I am misunderstanding how the stop condition works.
I also ran the same experiment on a different environment (from MPE) with stop={"episodes_total": 10}, but when I inspect hist_stats and episodes_this_iter in the trial output, they show 500 episodes. What am I missing here?
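To make the question concrete, here is a toy model of one possible explanation: Tune compares the stop dict against the result reported at the end of each training iteration, so a run cannot stop in the middle of an iteration. This is not Ray code. The function run_until and the numbers passed to it (steps per iteration, episode length) are hypothetical and chosen only for illustration.

```python
def run_until(stop_episodes, steps_per_iteration, episode_length):
    """Toy model of a training loop whose stop check runs once per iteration.

    All names and numbers here are illustrative assumptions, not Ray APIs.
    """
    episodes_total = 0
    iterations = 0
    while True:
        # One "training iteration": sample a fixed budget of timesteps,
        # which may contain many complete episodes.
        episodes_this_iter = steps_per_iteration // episode_length
        episodes_total += episodes_this_iter
        iterations += 1
        # The stop criterion is only compared with the reported result here,
        # after the whole iteration has finished.
        if episodes_total >= stop_episodes:
            return iterations, episodes_total


# Hypothetical numbers: 12500 timesteps per iteration, 25 steps per episode.
iterations, episodes = run_until(
    stop_episodes=10, steps_per_iteration=12500, episode_length=25
)
print(iterations, episodes)  # prints "1 500"
```

If this model is right, the number of episodes at stop time is set by how many timesteps one iteration collects rather than by the value in the stop dict, which would also mean a large "timesteps_per_iteration" makes even the first iteration slow.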