How to train until convergence

Here is an example of the configuration I am using to run some models:

    experiment_params = {
        "training": {
            "env": "example",
            "run": args.algorithm,
            "stop": {
                "training_iteration": 2000,
                # "timesteps_total": 8000000,
            },
            "local_dir": "/opt/ml/output/intermediate",
            "checkpoint_at_end": True,
            "checkpoint_freq": 10,
            "config": {
                "num_workers": int(os.cpu_count()) - 1,
                # "sigma0": 0.01,
                "lr": 0.001,
                # "sample_batch_size": 1,
                "num_gpus": num_gpus,
                "gamma": float(args.gamma),
                # RAY 0.5 syntax: gpu_fraction
                # "prioritized_replay": False,
            },
        },
    }


I am using RLlib 0.8.5. Is there any way to change the stopping criterion so that training stops once the episode reward mean has converged? I wouldn't mind upgrading RLlib.

Hi! The definition of convergence depends on the problem and the algorithm. Tune can only stop on metrics that are reported to it, so you'll have to implement a metric that digests a record of past mean rewards into a measurement of convergence for Tune to observe.
Here’s what to do:

  1. Write a custom Callback that tracks your metric of interest over time
    1.1 Write the metric (maybe a simple average of the absolute change in episode reward mean) into the result dict
  2. Tell Tune to monitor that metric as a stopping criterion

See here for an example.
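A minimal sketch of the idea above. The class name, window size, and threshold are my own assumptions, and it folds steps 1 and 2 into one object by using Tune's support for a callable `stop` criterion, which receives `(trial_id, result)` and returns `True` to stop the trial. The convergence logic itself is plain Python, so you could equally move it into a Callback that writes the metric into the result dict:

```python
from collections import defaultdict, deque


class ConvergenceStopper:
    """Stop a trial once episode_reward_mean has plateaued.

    Keeps the last `window` values of episode_reward_mean per trial and
    signals convergence when the average absolute change between
    consecutive values drops below `threshold`.
    """

    def __init__(self, window=10, threshold=0.01):
        self.window = window
        self.threshold = threshold
        # One bounded history per trial id.
        self._history = defaultdict(lambda: deque(maxlen=window))

    def __call__(self, trial_id, result):
        history = self._history[trial_id]
        history.append(result["episode_reward_mean"])
        if len(history) < self.window:
            return False  # not enough data to judge convergence yet
        values = list(history)
        deltas = [abs(b - a) for a, b in zip(values, values[1:])]
        return sum(deltas) / len(deltas) < self.threshold
```

It would then be passed in place of the `stop` dict, e.g. `tune.run(args.algorithm, stop=ConvergenceStopper(window=20, threshold=0.05), config=...)`. Tune the window and threshold to your reward scale; a noisy environment needs a larger window.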