Tune log spam from RLlib trials

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity.
  • Low: It annoys or frustrates me for a moment.
  • Medium: It makes my task significantly harder to complete, but I can work around it.
  • High: It blocks me from completing my task.

I’m running some experiments with RLlib and Tune, using verbose=2 for tune.run.
Besides the status prints, the output contains loads of lines like:

Trial trial-dc2f3_00000-env:BreakoutNoFrameskip-v4 reported custom_metrics={},episode_media={},info={'learner': {'default_policy': {'custom_metrics': {}, 'learner_stats': {'cur_kl_coeff': 0.000244140625, 'cur_lr': 5.000000000000001e-05, 'total_loss': 0.06488624240216603, 'policy_loss': 0.015420469563258321, 'vf_loss': 0.06314889577854621, 'vf_explained_var': 0.006866280028694554, 'kl': 0.0020330968840546193, 'entropy': 1.3683619737625121, 'entropy_coeff': 0.009999999999999998}, 'model': {}}}, 'num_env_steps_sampled': 119040, 'num_env_steps_trained': 119040, 'num_agent_steps_sampled': 119040, 'num_agent_steps_trained': 119040},sampler_results={'episode_reward_max': 5.0, 'episode_reward_min': 0.0, 'episode_reward_mean': 1.22, 'episode_len_mean': 717.91, 'episode_media': {}, 'episodes_this_iter': 69, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [0.0, 0.0, 4.0, 3.0, 0.0, 0.0, 0.0, 3.0, 1.0, 2.0, 2.0, 0.0, 3.0, 2.0, 0.0, 3.0, 1.0, 0.0, 1.0, 3.0, 2.0, 4.0, 0.0, 0.0, 1.0, 3.0, 2.0, 1.0, 1.0, 2.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 3.0, 5.0, 1.0, 2.0, 2.0, 1.0, 2.0, 0.0, 3.0, 0.0, 2.0, 1.0, 0.0, 2.0, 2.0, 0.0, 3.0, 3.0, 1.0, 4.0, 1.0, 1.0, 2.0, 0.0, 0.0, 1.0, 0.0, 1.0, 4.0, 2.0, 2.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 0.0, 0.0, 2.0, 4.0, 0.0, 1.0, 2.0, 0.0, 0.0], 'episode_lengths': [524, 521, 1195, 1007, 516, 514, 524, 1021, 644, 807, 799, 526, 935, 819, 536, 938, 634, 524, 691, 972, 912, 1152, 542, 505, 710, 1012, 834, 632, 654, 819, 515, 517, 526, 693, 706, 515, 533, 632, 513, 702, 528, 514, 536, 967, 1315, 714, 913, 888, 713, 812, 517, 984, 517, 765, 693, 544, 830, 903, 510, 874, 1027, 650, 1301, 703, 709, 785, 530, 526, 698, 503, 697, 1281, 840, 831, 509, 628, 645, 534, 521, 510, 517, 528, 1185, 517, 525, 630, 707, 714, 1115, 645, 697, 528, 532, 837, 1205, 528, 708, 911, 508, 518]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 0.9009499434309569, 'mean_inference_ms': 4.367436366802341, 'mean_action_processing_ms': 0.10969728436022855, 'mean_env_wait_ms': 5.875115251151804, 'mean_env_render_ms': 0.0}, 'off_policy_estimator': {}},episode_reward_max=5.0,episode_reward_min=0.0,episode_reward_mean=1.22,episode_len_mean=717.91,episodes_this_iter=69,policy_reward_min={},policy_reward_max={},policy_reward_mean={},sampler_perf={'mean_raw_obs_processing_ms': 0.9009499434309569, 'mean_inference_ms': 4.367436366802341, 'mean_action_processing_ms': 0.10969728436022855, 'mean_env_wait_ms': 5.875115251151804, 'mean_env_render_ms': 0.0},off_policy_estimator={},num_healthy_workers=40,num_agent_steps_sampled=119040,num_agent_steps_trained=119040,num_env_steps_sampled=119040,num_env_steps_trained=119040,num_env_steps_sampled_this_iter=9920,num_env_steps_trained_this_iter=9920,agent_timesteps_total=119040,timers={'training_iteration_time_ms': 4838.25, 'load_time_ms': 98.121, 'load_throughput': 101100.032, 'learn_time_ms': 2499.231, 'learn_throughput': 3969.222, 'update_time_ms': 11.712},counters={'num_env_steps_sampled': 119040, 'num_env_steps_trained': 119040, 'num_agent_steps_sampled': 119040, 'num_agent_steps_trained': 119040},perf={'cpu_util_percent': 12.3, 'ram_util_percent': 12.65, 'gpu_util_percent0': 0.97, 'vram_util_percent0': 0.3922932701795934, 'gpu_util_percent1': 0.01, 'vram_util_percent1': 0.05012828103414249, 'gpu_util_percent2': 0.9450000000000001, 'vram_util_percent2': 0.9580619696072626, 'gpu_util_percent3': 0.315, 'vram_util_percent3': 0.3378478389579633, 'gpu_util_percent4': 0.0, 'vram_util_percent4': 0.0, 'gpu_util_percent5': 0.0, 'vram_util_percent5': 0.0, 'gpu_util_percent6': 0.0, 'vram_util_percent6': 0.0, 'gpu_util_percent7': 0.315, 'vram_util_percent7': 0.0} with parameters={'lambda': 0.95, 'kl_coeff': 0.5, 'clip_rewards': True, 'clip_param': 0.1, 'vf_clip_param': 10.0, 'entropy_coeff': 0.01, 'train_batch_size': 5000, 'rollout_fragment_length': 20, 'sgd_minibatch_size': 500, 'num_sgd_iter': 10, 'model': {'dim': 42, 'vf_share_layers': True}, 'env': 'BreakoutNoFrameskip-v4', 'env_config': {}, 'num_workers': 40, 'evaluation_config': {'env_config': {'is_eval': True}}, 'evaluation_interval': 0, 'evaluation_num_episodes': 10, 'num_cpus_per_worker': 0.5, 'evaluation_num_workers': 0, 'num_gpus': 0.1, 'framework': 'torch', 'num_envs_per_worker': 2, 'batch_mode': 'truncate_episodes', 'observation_filter': 'NoFilter', 'placement_strategy': 'SPREAD', 'callbacks': <class 'spg_experiments.callbacks.CustomCallbacks'>}.
Trial trial-dc2f3_00003-env:PongNoFrameskip-v4 reported custom_metrics={},episode_media={},info={'learner': {'default_policy': {'custom_metrics': {}, 'learner_stats': {'cur_kl_coeff': 0.0009765625, 'cur_lr': 5.000000000000001e-05, 'total_loss': 0.08863984401288785, 'policy_loss': 0.007324397110527283, 'vf_loss': 0.09908627310865804, 'vf_explained_var': 0.009671232888573094, 'kl': 0.0017151221690470844, 'entropy': 1.7772501738447892, 'entropy_coeff': 0.009999999999999998}, 'model': {}}}, 'num_env_steps_sampled': 99200, 'num_env_steps_trained': 99200, 'num_agent_steps_sampled': 99200, 'num_agent_steps_trained': 99200},sampler_results={'episode_reward_max': -19.0, 'episode_reward_min': -21.0, 'episode_reward_mean': -20.5, 'episode_len_mean': 3499.625, 'episode_media': {}, 'episodes_this_iter': 2, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [-20.0, -21.0, -20.0, -21.0, -20.0, -20.0, -20.0, -20.0, -21.0, -21.0, -19.0, -21.0, -21.0, -21.0, -21.0, -20.0, -21.0, -20.0, -21.0, -21.0, -20.0, -21.0, -21.0, -19.0, -21.0, -21.0, -21.0, -21.0, -21.0, -19.0, -21.0, -21.0, -20.0, -21.0, -21.0, -21.0, -19.0, -21.0, -19.0, -21.0], 'episode_lengths': [3368, 3169, 3353, 3449, 3608, 3618, 3684, 3730, 3297, 3649, 3679, 3301, 3378, 3537, 3297, 3368, 3321, 3608, 3301, 3289, 3720, 3374, 3417, 3747, 3057, 3297, 3305, 3657, 3486, 3926, 3521, 3376, 3964, 3305, 3409, 3173, 4043, 3618, 3865, 3721]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 0.4185098683360982, 'mean_inference_ms': 4.410973660846652, 'mean_action_processing_ms': 0.11090935846879701, 'mean_env_wait_ms': 5.709484367935604, 'mean_env_render_ms': 0.0}, 'off_policy_estimator': {}},episode_reward_max=-19.0,episode_reward_min=-21.0,episode_reward_mean=-20.5,episode_len_mean=3499.625,episodes_this_iter=2,policy_reward_min={},policy_reward_max={},policy_reward_mean={},sampler_perf={'mean_raw_obs_processing_ms': 0.4185098683360982, 'mean_inference_ms': 4.410973660846652, 'mean_action_processing_ms': 0.11090935846879701, 'mean_env_wait_ms': 5.709484367935604, 'mean_env_render_ms': 0.0},off_policy_estimator={},num_healthy_workers=40,num_agent_steps_sampled=99200,num_agent_steps_trained=99200,num_env_steps_sampled=99200,num_env_steps_trained=99200,num_env_steps_sampled_this_iter=9920,num_env_steps_trained_this_iter=9920,agent_timesteps_total=99200,timers={'training_iteration_time_ms': 5975.772, 'load_time_ms': 446.88, 'load_throughput': 22198.357, 'learn_time_ms': 3354.836, 'learn_throughput': 2956.925, 'update_time_ms': 12.873},counters={'num_env_steps_sampled': 99200, 'num_env_steps_trained': 99200, 'num_agent_steps_sampled': 99200, 'num_agent_steps_trained': 99200},perf={'cpu_util_percent': 11.6, 'ram_util_percent': 12.5, 'gpu_util_percent0': 0.92, 'vram_util_percent0': 0.3922932701795934, 'gpu_util_percent1': 0.01, 'vram_util_percent1': 0.05012828103414249, 'gpu_util_percent2': 0.9, 'vram_util_percent2': 0.9580619696072626, 'gpu_util_percent3': 0.97, 'vram_util_percent3': 0.3378478389579633, 'gpu_util_percent4': 0.0, 'vram_util_percent4': 0.0, 'gpu_util_percent5': 0.0, 'vram_util_percent5': 0.0, 'gpu_util_percent6': 0.0, 'vram_util_percent6': 0.0, 'gpu_util_percent7': 0.33, 'vram_util_percent7': 0.0} with parameters={'lambda': 0.95, 'kl_coeff': 0.5, 'clip_rewards': True, 'clip_param': 0.1, 'vf_clip_param': 10.0, 'entropy_coeff': 0.01, 'train_batch_size': 5000, 'rollout_fragment_length': 20, 'sgd_minibatch_size': 500, 'num_sgd_iter': 10, 'model': {'dim': 42, 'vf_share_layers': True}, 'env': 'PongNoFrameskip-v4', 'env_config': {}, 'num_workers': 40, 'evaluation_config': {'env_config': {'is_eval': True}}, 'evaluation_interval': 0, 'evaluation_num_episodes': 10, 'num_cpus_per_worker': 0.5, 'evaluation_num_workers': 0, 'num_gpus': 0.1, 'framework': 'torch', 'num_envs_per_worker': 2, 'batch_mode': 'truncate_episodes', 'observation_filter': 'NoFilter', 'placement_strategy': 'SPREAD', 'callbacks': <class 'spg_experiments.callbacks.CustomCallbacks'>}.
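
For reference, I’m launching the experiments roughly like this (a simplified sketch; the real config is the full PPO setup from the "with parameters=" dump above):

from ray import tune

# Simplified sketch; the real config contains all the PPO hyperparameters
# shown in the "with parameters=" dump above.
tune.run(
    "PPO",
    config={
        "env": "BreakoutNoFrameskip-v4",
        "framework": "torch",
        "num_workers": 40,
        # ... remaining hyperparameters omitted
    },
    verbose=2,  # prints the status table, but also the per-result lines above
)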

This output clutters the logs and provides nothing useful in this format.
Is this intended? If so, can it be turned off?

Thanks

Hey @vakker00, thanks for the questions. Would reducing the verbosity (verbose=1) solve this for you?
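
If I remember the levels correctly, 0 is silent, 1 prints experiment-level updates only (no status table), 2 adds the status table and per-trial results, and 3 prints detailed per-trial results. So the change would just be:

from ray import tune

# verbose levels (from memory): 0 = silent, 1 = experiment-level updates only,
# 2 = status table + per-trial results, 3 = detailed per-trial results
tune.run("PPO", config={"env": "CartPole-v0"}, verbose=1)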

Thanks for the suggestion. That would not solve the problem, because verbose=1 doesn’t include the status table, which is quite useful, e.g.:

== Status ==
Current time: 2022-05-09 09:47:05 (running for 00:00:31.55)
Memory usage on this node: 4.7/62.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/24 CPUs, 0/2 GPUs, 0.0/36.49 GiB heap, 0.0/18.25 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /home/user/ray_results/PPO
Number of trials: 1/1 (1 RUNNING)
+-----------------------------+----------+----------------------+--------+------------------+------+----------+----------------------+----------------------+--------------------+
| Trial name                  | status   | loc                  |   iter |   total time (s) |   ts |   reward |   episode_reward_max |   episode_reward_min |   episode_len_mean |
|-----------------------------+----------+----------------------+--------+------------------+------+----------+----------------------+----------------------+--------------------|
| PPO_CartPole-v0_877e0_00000 | RUNNING  | 192.168.1.74:3635534 |      2 |          19.7981 | 8000 |    44.66 |                  142 |                    9 |              44.66 |
+-----------------------------+----------+----------------------+--------+------------------+------+----------+----------------------+----------------------+--------------------+
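
Ideally I’d keep that table and just drop the per-result dumps. I can already trim the table itself with a custom reporter, something like the sketch below (column names are just examples), but as far as I can tell that only affects the table, not the "Trial ... reported ..." lines:

from ray import tune
from ray.tune import CLIReporter

# Limit the status table to a few chosen columns.
reporter = CLIReporter(metric_columns=["episode_reward_mean", "training_iteration"])
tune.run(
    "PPO",
    config={"env": "CartPole-v0"},
    progress_reporter=reporter,
    verbose=2,
)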

@vakker00 gotcha. Depending on how you’re running this script, would a temporary workaround like grepping out the spammy logs work? For example:

python tune_example.py | grep -v "with parameters="
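
If piping isn’t convenient, an in-process filter should work too, assuming these lines go through stdout (they appear to, since grep on the pipe catches them). A rough, untested sketch; _DropResultLines is just a made-up helper name:

import sys

class _DropResultLines:
    # Wraps a stream and swallows any write containing the spam marker.
    def __init__(self, stream):
        self._stream = stream

    def write(self, text):
        if "with parameters=" not in text:
            self._stream.write(text)

    def __getattr__(self, name):
        # Delegate everything else (flush, isatty, ...) to the wrapped stream.
        return getattr(self._stream, name)

sys.stdout = _DropResultLines(sys.stdout)  # install before calling tune.run()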

@kai do you have any thoughts on making this less spammy?

Yeah, this looks like a bug. I can take a look this week and resolve it!

Thanks for looking into it. Do you have any update on this?

Sorry for the delay. I have a branch open for this, but I haven’t been able to pursue it further. I’ve moved it up in my backlog again and will continue working on it.

Do you have any update on this? Thanks!

Yes, sorry for the delay - here is an RFC PR: [rfc] [tune/rllib] Fetch _progress_metrics from trainable for verbose=2 display by krfricke · Pull Request #26967 · ray-project/ray · GitHub