Ray RLlib tune.run() stuck in RUNNING

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

Hello,

I am using a custom environment and agent. When I call tune.run, it keeps printing the following status forever and never seems to start training:

== Status ==
Current time: 2023-05-23 18:59:41 (running for 00:06:08.00)
Using FIFO scheduling algorithm.
Logical resource usage: 3.0/8 CPUs, 0/0 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: /cluster/home/scheschb/ray_results/PPO
Number of trials: 1/1 (1 RUNNING)

I tried changing the number of CPUs and GPUs, but it never starts. My config is the following:

{
    'extra_python_environs_for_driver': {},
    'extra_python_environs_for_worker': {},
    'num_gpus': 0,
    'num_cpus_per_worker': 1,
    'num_gpus_per_worker': 0,
    '_fake_gpus': False,
    'num_learner_workers': 0,
    'num_gpus_per_learner_worker': 0,
    'num_cpus_per_learner_worker': 1,
    'local_gpu_idx': 0,
    'custom_resources_per_worker': {},
    'placement_strategy': 'PACK',
    'eager_tracing': False,
    'eager_max_retraces': 20,
    'tf_session_args': {
        'intra_op_parallelism_threads': 2,
        'inter_op_parallelism_threads': 2,
        'gpu_options': {
            'allow_growth': True
        },
        'log_device_placement': False,
        'device_count': {
            'CPU': 1
        },
        'allow_soft_placement': True
    },
    'local_tf_session_args': {
        'intra_op_parallelism_threads': 8,
        'inter_op_parallelism_threads': 8
    },
    'env': 'LabellingEnv-v0',
    'env_config': {
        'cubeedge': 0.05,
        'results_path': None,
        'debug': True,
        'dataset': 'scannet',
        'max_num_clicks': 20,
        'label': None,
        'pretraining_weights': '/cluster/scratch/scheschb/3d_inter_obj_seg/dataset/sample/weights/weights_exp14_14.pth',
        'dataset_scenes': '/cluster/scratch/scheschb/3d_inter_obj_seg/Minkowski/training/sample_data/scannet/dataset_scannet_val.npy',
        'dataset_classes': '/cluster/scratch/scheschb/3d_inter_obj_seg/Minkowski/training/sample_data/scannet/dataset_scannet_val_classes2.txt',
        'dataset_folder_scene': '/cluster/scratch/scheschb/3d_inter_obj_seg/dataset/sample/crops5x5/',
        'dataset_folder_masks': '/cluster/scratch/scheschb/3d_inter_obj_seg/dataset/sample/masks5x5/',
        'dummy_data': True,
        'model': 'PointwiseLinearsModel'
    },
    'observation_space': None,
    'action_space': None,
    'env_task_fn': None,
    'render_env': False,
    'clip_rewards': None,
    'normalize_actions': True,
    'clip_actions': False,
    'disable_env_checking': False,
    'is_atari': None,
    'auto_wrap_old_gym_envs': True,
    'num_envs_per_worker': 1,
    'sample_collector': <class 'ray.rllib.evaluation.collectors.simple_list_collector.SimpleListCollector'>,
    'sample_async': False,
    'enable_connectors': True,
    'rollout_fragment_length': 'auto',
    'batch_mode': 'truncate_episodes',
    'remote_worker_envs': False,
    'remote_env_batch_wait_ms': 0,
    'validate_workers_after_construction': True,
    'preprocessor_pref': 'deepmind',
    'observation_filter': 'NoFilter',
    'synchronize_filters': True,
    'compress_observations': False,
    'enable_tf1_exec_eagerly': False,
    'sampler_perf_stats_ema_coef': None,
    'gamma': 0.99,
    'lr': 5e-05,
    'train_batch_size': 2,
    'model': {
        '_disable_preprocessor_api': False,
        '_disable_action_flattening': False,
        'fcnet_hiddens': [256, 256],
        'fcnet_activation': 'tanh',
        'conv_filters': None,
        'conv_activation': 'relu',
        'post_fcnet_hiddens': [],
        'post_fcnet_activation': 'relu',
        'free_log_std': False,
        'no_final_linear': False,
        'vf_share_layers': False,
        'use_lstm': False,
        'max_seq_len': 20,
        'lstm_cell_size': 256,
        'lstm_use_prev_action': False,
        'lstm_use_prev_reward': False,
        '_time_major': False,
        'use_attention': False,
        'attention_num_transformer_units': 1,
        'attention_dim': 64,
        'attention_num_heads': 1,
        'attention_head_dim': 32,
        'attention_memory_inference': 50,
        'attention_memory_training': 50,
        'attention_position_wise_mlp_dim': 32,
        'attention_init_gru_gate_bias': 2.0,
        'attention_use_n_prev_actions': 0,
        'attention_use_n_prev_rewards': 0,
        'framestack': True,
        'dim': 84,
        'grayscale': False,
        'zero_mean': True,
        'custom_model': 'PointwiseLinearsModel',
        'custom_model_config': {},
        'custom_action_dist': None,
        'custom_preprocessor': None,
        'encoder_latent_dim': None,
        'lstm_use_prev_action_reward': -1,
        '_use_default_native_models': -1
    },
    'optimizer': {},
    'max_requests_in_flight_per_sampler_worker': 2,
    'learner_class': None,
    '_enable_learner_api': False,
    '_learner_hps': PPOLearnerHPs(kl_coeff = 0.2, kl_target = 0.01, use_critic = True, clip_param = 0.3, vf_clip_param = 10.0, entropy_coeff = 0.0, vf_loss_coeff = 1.0, lr_schedule = None, entropy_coeff_schedule = None),
    'explore': True,
    'exploration_config': {
        'type': 'StochasticSampling'
    },
    'policy_states_are_swappable': False,
    'input_config': {},
    'actions_in_input_normalized': False,
    'postprocess_inputs': False,
    'shuffle_buffer_size': 0,
    'output': None,
    'output_config': {},
    'output_compress_columns': ['obs', 'new_obs'],
    'output_max_file_size': 67108864,
    'offline_sampling': False,
    'evaluation_interval': None,
    'evaluation_duration': 10,
    'evaluation_duration_unit': 'episodes',
    'evaluation_sample_timeout_s': 180.0,
    'evaluation_parallel_to_training': False,
    'evaluation_config': None,
    'off_policy_estimation_methods': {},
    'ope_split_batch_by_episode': True,
    'evaluation_num_workers': 0,
    'always_attach_evaluation_results': False,
    'enable_async_evaluation': False,
    'in_evaluation': False,
    'sync_filters_on_rollout_workers_timeout_s': 60.0,
    'keep_per_episode_custom_metrics': False,
    'metrics_episode_collection_timeout_s': 60.0,
    'metrics_num_episodes_for_smoothing': 100,
    'min_time_s_per_iteration': None,
    'min_train_timesteps_per_iteration': 0,
    'min_sample_timesteps_per_iteration': 0,
    'export_native_model_files': False,
    'checkpoint_trainable_policies_only': False,
    'logger_creator': None,
    'logger_config': None,
    'log_level': 'INFO',
    'log_sys_usage': True,
    'fake_sampler': False,
    'seed': None,
    'worker_cls': None,
    'ignore_worker_failures': False,
    'recreate_failed_workers': False,
    'max_num_worker_restarts': 1000,
    'delay_between_worker_restarts_s': 60.0,
    'restart_failed_sub_environments': False,
    'num_consecutive_worker_failures_tolerance': 100,
    'worker_health_probe_timeout_s': 60,
    'worker_restore_timeout_s': 1800,
    'rl_module_spec': None,
    '_enable_rl_module_api': False,
    '_validate_exploration_conf_and_rl_modules': True,
    '_tf_policy_handles_more_than_one_loss': False,
    '_disable_preprocessor_api': False,
    '_disable_action_flattening': False,
    '_disable_execution_plan_api': True,
    'simple_optimizer': -1,
    'replay_sequence_length': None,
    'horizon': -1,
    'soft_horizon': -1,
    'no_done_at_end': -1,
    'lr_schedule': None,
    'use_critic': True,
    'use_gae': True,
    'kl_coeff': 0.2,
    'sgd_minibatch_size': 2,
    'num_sgd_iter': 30,
    'shuffle_sequences': True,
    'vf_loss_coeff': 1.0,
    'entropy_coeff': 0.0,
    'entropy_coeff_schedule': None,
    'clip_param': 0.3,
    'vf_clip_param': 10.0,
    'grad_clip': None,
    'kl_target': 0.01,
    'vf_share_layers': -1,
    'lambda': 1.0,
    'input': 'sampler',
    'multiagent': {
        'policies': {
            'default_policy': (None, None, None, None)
        },
        'policy_mapping_fn': <function AlgorithmConfig.DEFAULT_POLICY_MAPPING_FN at 0x2b7737c131f0>,
        'policies_to_train': None,
        'policy_map_capacity': 100,
        'policy_map_cache': -1,
        'count_steps_by': 'env_steps',
        'observation_fn': None
    },
    'callbacks': <class 'ray.rllib.algorithms.callbacks.DefaultCallbacks'>,
    'create_env_on_driver': False,
    'custom_eval_function': None,
    'framework': 'torch',
    'num_cpus_for_driver': 1,
    'num_workers': 2
}

Any ideas why this could be happening or how I could debug this?

Thank you!

I am just stupid, it’s training. verbose was set to 1 in the code I copied, so tune.run only prints the status block and no training results. If you have the same problem, I recommend setting verbose to 3.
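
For anyone landing here with the same symptom, a minimal sketch of the fix might look like the following. It assumes the Ray 2.x `tune.run` API of that time, where `verbose=1` prints only status updates and `verbose=3` adds detailed trial results. The helper names `build_tune_kwargs` and `run_training` and the `stop` criterion are hypothetical and not from the original code, and the custom environment is assumed to be registered elsewhere. Newer Ray releases may interpret the verbosity levels differently, so check the documentation for your version.

```python
def build_tune_kwargs(verbose=3):
    """Build keyword arguments for tune.run().

    Only a few keys from the posted config are repeated here; the stop
    criterion is a hypothetical addition so the sketch terminates.
    """
    return {
        "config": {
            "env": "LabellingEnv-v0",  # custom env from the post, must be registered beforehand
            "framework": "torch",
            "num_workers": 2,
        },
        "stop": {"training_iteration": 5},  # assumption: stop after a few iterations
        # 1 = status updates only (the behaviour described in the question),
        # 3 = status plus detailed per-iteration trial results.
        "verbose": verbose,
    }


def run_training(verbose=3):
    """Launch PPO through Tune with the chosen verbosity."""
    # Imported lazily so the helper above can be inspected without Ray installed.
    from ray import tune

    return tune.run("PPO", **build_tune_kwargs(verbose))
```

Calling `run_training()` should then print per-iteration results such as episode rewards between the status blocks, which makes it clear that training is progressing.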

@Benedikt_Schesch Glad to hear that you solved your problem. Could you mark your answer as a solution and close out the thread? Thanks.