Limit number of steps?

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Ray’s tune.run() is running endlessly. Can I force it to save a checkpoint and begin a new iteration after a number of steps?

analysis = tune.run(
    "PPO",
    stop={
        "episode_reward_mean": 2,
        "training_iteration": 35,
    },
    config={
        "env": "TradingEnv",
        "env_config": env_config_training,
        "log_level": "ERROR",
        #"log_level": "INFO",
        #"log_level": "DEBUG",
        "framework": "torch",
        "ignore_worker_failures": False,
        "clip_rewards": True,
        "lr": LR,
        "lr_schedule": [
            [0, 1e-1],
            [int(1e2), 1e-2],
            [int(1e3), 1e-3],
            [int(1e4), 1e-4],
            [int(1e5), 1e-5],
            [int(1e6), 1e-6],
            [int(1e7), 1e-7]
        ],
        "model": {
            "use_lstm": True,
            "lstm_cell_size": 512
        },
        "gamma": GAMMA,
        "observation_filter": "MeanStdFilter",
        "lambda": LAMBDA,
        "vf_share_layers": True,
        "vf_loss_coeff": VF_LOSS_COEFF,
        "entropy_coeff": ENTROPY_COEFF,
        "evaluation_interval": 1,  # Run evaluation on every iteration
        "evaluation_config": {
            "env_config": env_config_evaluation,  # The dictionary we built before (only the overriding keys to use in evaluation)
            "explore": False,  # We don't want to explore during evaluation. All actions have to be repeatable.
        },
    },
    metric=checkpoint_metric,
    mode="max",
    search_alg=search_alg,
    scheduler=scheduler,
    num_samples=10,  # Samples per hyperparameter combination. More averages out randomness; fewer run faster
    keep_checkpoints_num=10,  # Keep the last 10 checkpoints
    checkpoint_freq=1,  # Checkpoint on every iteration (slower, but lets you pick exactly which checkpoint to use later)
#    resume="AUTO",
    local_dir="./results",
    name=f"testing_{int(time.time()-1651400000)}",
    trial_name_creator=Methods.trial_name_string
)

Does timesteps_total under stop work for you?

I will try it. I’m surprised the option is not listed in the documentation.
https://docs.ray.io/en/latest/tune/api_docs/stoppers.html
https://docs.ray.io/en/latest/tune/tutorials/tune-stopping.html

It does not; for some reason tune.run() ignores it and keeps training for many timesteps even when "timesteps_total": 1 is under stop.

analysis = tune.run("PPO",
    stop={
        "timesteps_total": 1,
    }, ...) # does not work

@arturn When you get a chance, could you help @Christian_Coletti with this question?


Hey @Christian_Coletti, hey @xwjiang2010 ,

The option is not listed under the Tune API docs because it is specific to RLlib, so timesteps_total is not a generic metric for everything tunable with Tune. Every iteration, you can have a look at the output of the analysis and you will find something like this …

...
timestamp: 1651834198
timesteps_since_restore: 0
timesteps_total: 12000
training_iteration: 3
trial_id: e577a_00000
...

… amongst many other metrics and pieces of info. These are the ones you can choose from for the given algorithm. They vary per algorithm, but training_iteration and timesteps_total are ubiquitous.
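For completeness, a minimal sketch (CartPole-v0 standing in for your env, thresholds made up): stop also accepts a callable that receives each reported result dict, so any of the metrics above can be combined freely:

from ray import tune

def stopper(trial_id, result):
    # Stop a trial once it has sampled 10k env steps or finished 5 iterations.
    return result["timesteps_total"] >= 10_000 or result["training_iteration"] >= 5

analysis = tune.run(
    "PPO",
    stop=stopper,
    config={"env": "CartPole-v0", "framework": "torch"},
)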

Regarding your experiment:
To reproduce I had to cut out a couple of config parameters and have used the following:

from ray import tune

analysis = tune.run(
    "PPO",
    stop={
        "timesteps_total": 10,
        "episode_reward_mean": 45,
        "training_iteration": 20,
    },
    config={
        "env": "CartPole-v0",
        "log_level": "ERROR",
        "framework": "torch",
        "ignore_worker_failures": False,
        "clip_rewards": True,
        "lr_schedule": [
            [0, 1e-1],
            [int(1e2), 1e-2],
            [int(1e3), 1e-3],
            [int(1e4), 1e-4],
            [int(1e5), 1e-5],
            [int(1e6), 1e-6],
            [int(1e7), 1e-7]
        ],
        "model": {
            "use_lstm": True,
            "lstm_cell_size": 512
        },
        "observation_filter": "MeanStdFilter",
        "evaluation_interval": 1,
    },
    mode="max",
    keep_checkpoints_num=10,
    checkpoint_freq=1,
    local_dir="~/ray_results/",
    name="test",
    resume=False,
)

This works on my side. The experiment stops after the first iteration, because the timesteps are exceeded. Similar logic works for episode_reward_mean or training_iteration.
Can you confirm this, or provide a script that does not show the desired behaviour out of the box? Ideally with the version of Ray you are using :slightly_smiling_face:
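If it helps verify which criterion triggered, the last reported result of each trial can be read off the analysis object (a small sketch; the columns carry the same metric names as shown above):

# analysis is the return value of tune.run() from the script above.
df = analysis.results_df
print(df[["timesteps_total", "training_iteration", "episode_reward_mean"]])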

Cheers


Hello,

My tune.run() is running endlessly and not beginning new iterations. I’m looking for a way to shorten an iteration and force it to save a checkpoint.

Since you said

I believe this does not answer my question

Thanks

No problem! Is my script running endlessly on your side? Can you confirm this, or provide a script that does not show the desired behaviour out of the box? Ideally with the version of Ray you are using :slightly_smiling_face:
Your script is not executable as is, and removing the unknowns to run something similar does not reproduce your undesired outcome.

Is there maybe a misunderstanding? My understanding is that Tune never finishes the first training iteration and is therefore never able to checkpoint, since there is no result yet. Correct?
To shorten iterations you will have to go through RLlib, which has traditionally used "timesteps_per_iteration" and has since switched to "min_sample_timesteps_per_reporting" and "min_train_timesteps_per_reporting" in case you are working with a nightly build.
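Roughly like this (a sketch only; which key applies depends on the Ray version you have installed, so please check it against your RLlib):

from ray import tune

config = {
    "env": "CartPole-v0",  # stand-in; put your TradingEnv and env_config here
    "framework": "torch",
    # Older releases: roughly how many env steps go into one training iteration.
    # A smaller value gives shorter iterations and therefore more frequent
    # results and checkpoints (with checkpoint_freq=1).
    "timesteps_per_iteration": 1000,
    # On a recent nightly, the equivalent keys would be:
    # "min_sample_timesteps_per_reporting": 1000,
    # "min_train_timesteps_per_reporting": 1000,
}

analysis = tune.run(
    "PPO",
    config=config,
    stop={"training_iteration": 20},
    checkpoint_freq=1,
)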

Best