My training is endless wth


I have a question. As a sanity check, I run training with one epoch, 2 training samples, and 2 validation samples before scaling to the whole dataset. However, the training seems to run endlessly. Any clue?
Here is my code:

    analysis = tune.run(
        ...,
        resources_per_trial=resources_per_trial,  # 16 CPUs and 1 GPU
    )

On my screen I see the following:
== Status ==

Current time: 2022-04-06 08:34:53 (running for 00:02:30.33)
Memory usage on this node: 16.8/58.9 GiB
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 1.000: None
Resources requested: 14.0/16 CPUs, 1.0/1 GPUs, 0.0/26.36 GiB heap, 0.0/13.18 GiB objects
Result logdir: /home/tune_model
Number of trials: 1/1 (1 RUNNING)

| Trial name | status | loc | kernel| lr |
| run_training_f38ce_00000 | RUNNING | | 16 | 0.32865 |

The trial has been in RUNNING status for several hours, whereas I expected training to finish in less than one minute.

    train_fn_with_parameters = ray.tune.function_runner.with_parameters(
    tune_scheduler = tune.schedulers.ASHAScheduler(

Thanks for your help.

Hey @AhmedM do you mind also sharing the train_fn_with_parameters function?

As soon as that function finishes, the trial should terminate, so the endless running behavior is a bit odd.
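One way to check this outside of Tune is to call the training function directly under a timeout and see whether it returns at all. Below is a minimal sketch using only the standard library. `finishes_within`, `run_training`, and `stuck_training` are hypothetical names for illustration; `run_training` is a stand-in for the real training function, not the code from this thread.

```python
import threading
import time


def finishes_within(fn, timeout_s, *args, **kwargs):
    """Return True if fn(*args, **kwargs) returns within timeout_s seconds.

    If fn raises, the event is never set and this returns False after the timeout.
    """
    done = threading.Event()

    def target():
        fn(*args, **kwargs)
        done.set()

    # Daemon thread, so a hung call does not block interpreter exit.
    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    return done.wait(timeout_s)


def run_training(config, epochs=1):
    """Hypothetical stand-in for the real training function."""
    for _ in range(epochs):
        time.sleep(0.01)  # pretend to train for one epoch


def stuck_training(config):
    """Hypothetical stand-in for a training loop that never returns."""
    while True:
        time.sleep(0.01)
```

If the real function already fails to return when called like this with `epochs=1`, the hang is probably in the training logic (for example a data loader or an unbounded loop) rather than in Tune itself.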

Hi @amogkam, thank you for your answer. I added the train_fn_with_parameters function to the description.

In fixed_params=params, I set params["epochs"]=1.

Oh @AhmedM can you also share your build_model function, and ideally as much of your code as possible? I mainly want to see how your training logic is defined.

@amogkam this looks similar to a problem I have. I’ve tried python==3.8.12 with ray==1.12 and also ray==1.11 (I had heard ray 1.12 has problems), running in Jupyter Lab 3.3.3. I’ve tried both the Breakout and Pong Atari environments. I’m running on a system that uses Ubuntu 20.04 LTS and runs in Docker (I have very little understanding of how Docker works and whether it could cause problems). Any help or ideas would be really helpful!

This is my code:

import ray
from ray import tune
import ray.rllib.agents.dqn as dqn
from ray.tune.logger import pretty_print
import gc

config = dqn.DEFAULT_CONFIG.copy()
config['env'] = 'PongDeterministic-v4'
config['framework'] = 'torch'
config["dueling"] = False
config["double_q"] = tune.grid_search([True, False])
config['num_atoms'] = 1
config['noisy'] = False
config['prioritized_replay'] = False
config['n_step'] = 1
config['target_network_update_freq'] = 8000
config['lr'] = 0.000625
config['adam_epsilon'] = 0.00015
config['hiddens'] = [512]
config['learning_starts'] = 20000
config['replay_buffer_config']['capacity'] = 1000000 # config['buffer_size'] has been deprecated
config['rollout_fragment_length'] = 4
config['train_batch_size'] = 32
config['exploration_config'] = {'type': 'EpsilonGreedy',
                                'initial_epsilon': 1.0,
                                'final_epsilon': 0.01,
                                'epsilon_timesteps': 200000}
config['prioritized_replay_alpha'] = 0.5
config['num_gpus'] = 0.2
config['num_workers'] = 6 # this depends on number of CPUs available
config['timesteps_per_iteration'] = 10000

def evaluation_fn(result):
    # for tuning
    return result['episode_reward_mean']

def objective_fn(config):
    trainer = dqn.DQNTrainer(config=config)

    for i in range(1):
        # Perform one iteration of training the policy with DQN
        result = trainer.train()
        intermediate_score = evaluation_fn(result)
        # Feed the score back to Tune.
        tune.report(mean_reward=intermediate_score)
        if i % 10 == 0:
            checkpoint = trainer.save()
            print("checkpoint saved at", checkpoint)
            print("cpu utilisation: {:.1%}".format(result['perf']['cpu_util_percent']/100))
            print("ram utilisation: {:.1%}".format(result['perf']['ram_util_percent']/100))

analysis = tune.run(objective_fn, ...)

When I tune with a minimal search size and a small number of iterations, it looks like it is working fine, with regular Status updates printed. However, after a while it just hangs and doesn’t seem to do anything. This is an example of the last message I see:

I’m only allowed to embed one image per post: sorry!

I do have a few warning messages when first running the cell, as follows:

(I tried importing gputil, but that actually crashed the notebook for some reason)

Final warning I get after the very first status update is below. Then everything appears to run smoothly except it just stops updating.

Hi @alexxcollins
Do you mind opening a new thread for this issue? Hanging can have different causes, so it’s better that we keep the discussions separate.

I am looking at your script. Is there a specific reason you wrote a custom function rather than directly using tune.run("DQN", ...)? Any supported RLlib algorithm is pre-registered and ready to be used in Tune runs like this. It may be more straightforward and less error-prone.
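For reference, a rough sketch of what passing the pre-registered "DQN" name to Tune could look like, assuming a Ray 1.x install with RLlib and the Atari environments available. The config keys are copied from the script above; `build_config` and `run_dqn` are hypothetical helper names, and the single-iteration stop criterion is an assumption chosen to mirror the `range(1)` loop.

```python
def build_config(num_workers=6):
    """Plain-dict overrides for the DQN defaults; keys copied from the script above."""
    return {
        "env": "PongDeterministic-v4",
        "framework": "torch",
        "num_gpus": 0.2,
        "num_workers": num_workers,  # depends on the number of CPUs available
        "timesteps_per_iteration": 10000,
    }


def run_dqn(num_iterations=1):
    """Run the pre-registered "DQN" trainable and return Tune's analysis object."""
    # Imported here so build_config() can be inspected without Ray installed.
    from ray import tune

    config = build_config()
    config["double_q"] = tune.grid_search([True, False])
    # Tune stops each trial once it has reported num_iterations training iterations.
    return tune.run("DQN", config=config, stop={"training_iteration": num_iterations})
```

With this form, Tune drives the training loop and the stop condition itself, so there is no custom loop that could fail to return.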

I’m also having this problem with tune.run("PPO", ...).