My tuner cannot stop as expected

jiangzhangze · March 23, 2023, 12:04pm

I’m trying to train PPO with a custom env, following this tutorial. But my tuner didn’t stop until I stopped it by hand. Also, some information, such as reward and episode, doesn’t show up in the log.
Here is train.py:

import ray
from ray.rllib.algorithms import ppo
from envs.my_env import PathPlanning
from ray.tune import register_env
def env_creator(env_config):
    return PathPlanning(env_config)
register_env("PathPlanning", env_creator)
ray.init()
ppo_config = ppo.PPOConfig()
ppo_config.environment(env="CartPole-v0")
ppo_config.framework(framework="tf2")
ppo_config.debugging(seed=415, log_level="ERROR")
ppo_config.evaluation(
    evaluation_interval=15,
    evaluation_duration=5,
    evaluation_num_workers=4,
    evaluation_parallel_to_training=True,
    evaluation_config=dict(
        explore=False,
        num_workers=2,
    ),
)
ppo_config.rollouts(num_rollout_workers=4,
                    num_envs_per_worker=1)
ppo_algo = ppo_config.build()

ppo_config.training(lr=ray.tune.grid_search([5e-5, 2e-5]),
                    train_batch_size=ray.tune.grid_search([128, 256]))

stop = dict(
    timesteps_total=100,
    trainning_iteration=5
)
tuner = ray.tune.Tuner(
    ppo_config.algo_class,

    param_space=ppo_config.to_dict(),

    run_config=ray.air.RunConfig(
        local_dir="my_Tune_logs",
        stop=stop,
        verbose=3,
    )
)

experiment_results = tuner.fit()

And here is the log:

== Status ==
Current time: 2023-03-23 20:02:11 (running for 00:00:05.18)
Memory usage on this node: 13.9/15.8 GiB 
Using FIFO scheduling algorithm.
Resources requested: 0/12 CPUs, 0/1 GPUs, 0.0/3.06 GiB heap, 0.0/1.53 GiB objects
Result logdir: E:\AI\rllib\my_env\PathPlanning\my_Tune_logs\PPO_2023-03-23_20-02-06
Number of trials: 4/4 (4 PENDING)
+-----------------------------+----------+-------+-------+--------------------+
| Trial name                  | status   | loc   |    lr |   train_batch_size |
|-----------------------------+----------+-------+-------+--------------------|
| PPO_CartPole-v0_8843a_00000 | PENDING  |       | 5e-05 |                128 |
| PPO_CartPole-v0_8843a_00001 | PENDING  |       | 2e-05 |                128 |
| PPO_CartPole-v0_8843a_00002 | PENDING  |       | 5e-05 |                256 |
| PPO_CartPole-v0_8843a_00003 | PENDING  |       | 2e-05 |                256 |
+-----------------------------+----------+-------+-------+--------------------+

In addition, there are JSON files and PKL files in the “logs” folder, but TensorBoard shows nothing.

mannyv · March 23, 2023, 1:44pm

Hi @jiangzhangze,

The status=PENDING means that none of the trials started training. Usually this means you requested more resources than you have available.

jiangzhangze · March 24, 2023, 5:03am

Hi @mannyv, thanks for your reply. According to the tutorial, the total number of Ray actors = num_workers + num_rollout_workers + evaluation_num_workers + 1. But the code above requires 4 + 2 + 4 + 1 = 11 < 12 (my CPUs), so why did this happen?
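
One way to sanity-check the arithmetic above is sketched below. This is only an estimate, not RLlib's actual resource accounting. It assumes one CPU for the trial driver and one CPU per rollout or evaluation worker, it does not count the `num_workers=2` override inside `evaluation_config`, and it assumes that the algorithm created by `ppo_config.build()` before the tuner starts keeps its 4 rollout and 4 evaluation workers alive. The helper names `cpus_per_trial` and `trials_that_fit` are hypothetical.

```python
def cpus_per_trial(num_rollout_workers, evaluation_num_workers,
                   cpus_per_worker=1, cpus_for_driver=1):
    """Rough CPU estimate for ONE trial (assumed formula, see lead-in)."""
    workers = num_rollout_workers + evaluation_num_workers
    return cpus_for_driver + cpus_per_worker * workers


def trials_that_fit(free_cpus, per_trial):
    """How many trials could run at the same time on the free CPUs."""
    return max(free_cpus, 0) // per_trial


total_cpus = 12      # from the status output: 0/12 CPUs
num_trials = 2 * 2   # grid search: 2 learning rates x 2 batch sizes

per_trial = cpus_per_trial(num_rollout_workers=4, evaluation_num_workers=4)

# Assumption: the algorithm built before the tuner still holds
# one CPU for each of its 4 rollout and 4 evaluation workers.
held_by_prebuilt_algo = 4 + 4
free_cpus = total_cpus - held_by_prebuilt_algo

print("CPUs needed per trial:", per_trial)
print("CPUs needed for all trials at once:", per_trial * num_trials)
print("Concurrent trials, nothing else running:",
      trials_that_fit(total_cpus, per_trial))
print("Concurrent trials, pre-built algorithm alive:",
      trials_that_fit(free_cpus, per_trial))
```

Under these assumptions, each of the four grid-search trials needs about 9 CPUs on its own, so at most one could run at a time on 12 CPUs, and none could start while a previously built algorithm still holds 8 of them. That would be consistent with all four trials staying PENDING. Separately, `trainning_iteration` in the stop dict looks like a misspelling of Tune's `training_iteration` result key.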
