Tune log spam from RLlib trials

How severe does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity
  • Low: It annoys or frustrates me for a moment.
  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.
  • High: It blocks me to complete my task.

I’m running some experiments with RLlib and Tune. I’m using verbose=2 for tune.run.
Besides the status prints there are loads of:

Trial trial-dc2f3_00000-env:BreakoutNoFrameskip-v4 reported custom_metrics={},episode_media={},info={'learner': {'default_policy': {'custom_metrics': {}, 'learner_stats': {'cur_kl_coeff': 0.000244140625, 'cur_lr
': 5.000000000000001e-05, 'total_loss': 0.06488624240216603, 'policy_loss': 0.015420469563258321, 'vf_loss': 0.06314889577854621, 'vf_explained_var': 0.006866280028694554, 'kl': 0.0020330968840546193, 'entropy':
 1.3683619737625121, 'entropy_coeff': 0.009999999999999998}, 'model': {}}}, 'num_env_steps_sampled': 119040, 'num_env_steps_trained': 119040, 'num_agent_steps_sampled': 119040, 'num_agent_steps_trained': 119040}
,sampler_results={'episode_reward_max': 5.0, 'episode_reward_min': 0.0, 'episode_reward_mean': 1.22, 'episode_len_mean': 717.91, 'episode_media': {}, 'episodes_this_iter': 69, 'policy_reward_min': {}, 'policy_re
ward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [0.0, 0.0, 4.0, 3.0, 0.0, 0.0, 0.0, 3.0, 1.0, 2.0, 2.0, 0.0, 3.0, 2.0, 0.0, 3.0, 1.0, 0.0, 1.0, 3.0, 2.0, 4.0, 0.0,
 0.0, 1.0, 3.0, 2.0, 1.0, 1.0, 2.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 3.0, 5.0, 1.0, 2.0, 2.0, 1.0, 2.0, 0.0, 3.0, 0.0, 2.0, 1.0, 0.0, 2.0, 2.0, 0.0, 3.0, 3.0, 1.0, 4.0, 1.0, 1.0, 
2.0, 0.0, 0.0, 1.0, 0.0, 1.0, 4.0, 2.0, 2.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 0.0, 0.0, 2.0, 4.0, 0.0, 1.0, 2.0, 0.0, 0.0], 'episode_lengths': [524, 521, 1195,
 1007, 516, 514, 524, 1021, 644, 807, 799, 526, 935, 819, 536, 938, 634, 524, 691, 972, 912, 1152, 542, 505, 710, 1012, 834, 632, 654, 819, 515, 517, 526, 693, 706, 515, 533, 632, 513, 702, 528, 514, 536, 967, 1
315, 714, 913, 888, 713, 812, 517, 984, 517, 765, 693, 544, 830, 903, 510, 874, 1027, 650, 1301, 703, 709, 785, 530, 526, 698, 503, 697, 1281, 840, 831, 509, 628, 645, 534, 521, 510, 517, 528, 1185, 517, 525, 63
0, 707, 714, 1115, 645, 697, 528, 532, 837, 1205, 528, 708, 911, 508, 518]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 0.9009499434309569, 'mean_inference_ms': 4.367436366802341, 'mean_action_processing_ms'
: 0.10969728436022855, 'mean_env_wait_ms': 5.875115251151804, 'mean_env_render_ms': 0.0}, 'off_policy_estimator': {}},episode_reward_max=5.0,episode_reward_min=0.0,episode_reward_mean=1.22,episode_len_mean=717.9
1,episodes_this_iter=69,policy_reward_min={},policy_reward_max={},policy_reward_mean={},sampler_perf={'mean_raw_obs_processing_ms': 0.9009499434309569, 'mean_inference_ms': 4.367436366802341, 'mean_action_proces
sing_ms': 0.10969728436022855, 'mean_env_wait_ms': 5.875115251151804, 'mean_env_render_ms': 0.0},off_policy_estimator={},num_healthy_workers=40,num_agent_steps_sampled=119040,num_agent_steps_trained=119040,num_e
nv_steps_sampled=119040,num_env_steps_trained=119040,num_env_steps_sampled_this_iter=9920,num_env_steps_trained_this_iter=9920,agent_timesteps_total=119040,timers={'training_iteration_time_ms': 4838.25, 'load_ti
me_ms': 98.121, 'load_throughput': 101100.032, 'learn_time_ms': 2499.231, 'learn_throughput': 3969.222, 'update_time_ms': 11.712},counters={'num_env_steps_sampled': 119040, 'num_env_steps_trained': 119040, 'num_
agent_steps_sampled': 119040, 'num_agent_steps_trained': 119040},perf={'cpu_util_percent': 12.3, 'ram_util_percent': 12.65, 'gpu_util_percent0': 0.97, 'vram_util_percent0': 0.3922932701795934, 'gpu_util_percent1
': 0.01, 'vram_util_percent1': 0.05012828103414249, 'gpu_util_percent2': 0.9450000000000001, 'vram_util_percent2': 0.9580619696072626, 'gpu_util_percent3': 0.315, 'vram_util_percent3': 0.3378478389579633, 'gpu_u
til_percent4': 0.0, 'vram_util_percent4': 0.0, 'gpu_util_percent5': 0.0, 'vram_util_percent5': 0.0, 'gpu_util_percent6': 0.0, 'vram_util_percent6': 0.0, 'gpu_util_percent7': 0.315, 'vram_util_percent7': 0.0} wit
h parameters={'lambda': 0.95, 'kl_coeff': 0.5, 'clip_rewards': True, 'clip_param': 0.1, 'vf_clip_param': 10.0, 'entropy_coeff': 0.01, 'train_batch_size': 5000, 'rollout_fragment_length': 20, 'sgd_minibatch_size'
: 500, 'num_sgd_iter': 10, 'model': {'dim': 42, 'vf_share_layers': True}, 'env': 'BreakoutNoFrameskip-v4', 'env_config': {}, 'num_workers': 40, 'evaluation_config': {'env_config': {'is_eval': True}}, 'evaluation
_interval': 0, 'evaluation_num_episodes': 10, 'num_cpus_per_worker': 0.5, 'evaluation_num_workers': 0, 'num_gpus': 0.1, 'framework': 'torch', 'num_envs_per_worker': 2, 'batch_mode': 'truncate_episodes', 'observa
tion_filter': 'NoFilter', 'placement_strategy': 'SPREAD', 'callbacks': <class 'spg_experiments.callbacks.CustomCallbacks'>}.
Trial trial-dc2f3_00003-env:PongNoFrameskip-v4 reported custom_metrics={},episode_media={},info={'learner': {'default_policy': {'custom_metrics': {}, 'learner_stats': {'cur_kl_coeff': 0.0009765625, 'cur_lr': 5.0
00000000000001e-05, 'total_loss': 0.08863984401288785, 'policy_loss': 0.007324397110527283, 'vf_loss': 0.09908627310865804, 'vf_explained_var': 0.009671232888573094, 'kl': 0.0017151221690470844, 'entropy': 1.777
2501738447892, 'entropy_coeff': 0.009999999999999998}, 'model': {}}}, 'num_env_steps_sampled': 99200, 'num_env_steps_trained': 99200, 'num_agent_steps_sampled': 99200, 'num_agent_steps_trained': 99200},sampler_r
esults={'episode_reward_max': -19.0, 'episode_reward_min': -21.0, 'episode_reward_mean': -20.5, 'episode_len_mean': 3499.625, 'episode_media': {}, 'episodes_this_iter': 2, 'policy_reward_min': {}, 'policy_reward
_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [-20.0, -21.0, -20.0, -21.0, -20.0, -20.0, -20.0, -20.0, -21.0, -21.0, -19.0, -21.0, -21.0, -21.0, -21.0, -20.0, -21.0,
 -20.0, -21.0, -21.0, -20.0, -21.0, -21.0, -19.0, -21.0, -21.0, -21.0, -21.0, -21.0, -19.0, -21.0, -21.0, -20.0, -21.0, -21.0, -21.0, -19.0, -21.0, -19.0, -21.0], 'episode_lengths': [3368, 3169, 3353, 3449, 3608
, 3618, 3684, 3730, 3297, 3649, 3679, 3301, 3378, 3537, 3297, 3368, 3321, 3608, 3301, 3289, 3720, 3374, 3417, 3747, 3057, 3297, 3305, 3657, 3486, 3926, 3521, 3376, 3964, 3305, 3409, 3173, 4043, 3618, 3865, 3721]
}, 'sampler_perf': {'mean_raw_obs_processing_ms': 0.4185098683360982, 'mean_inference_ms': 4.410973660846652, 'mean_action_processing_ms': 0.11090935846879701, 'mean_env_wait_ms': 5.709484367935604, 'mean_env_re
nder_ms': 0.0}, 'off_policy_estimator': {}},episode_reward_max=-19.0,episode_reward_min=-21.0,episode_reward_mean=-20.5,episode_len_mean=3499.625,episodes_this_iter=2,policy_reward_min={},policy_reward_max={},po
licy_reward_mean={},sampler_perf={'mean_raw_obs_processing_ms': 0.4185098683360982, 'mean_inference_ms': 4.410973660846652, 'mean_action_processing_ms': 0.11090935846879701, 'mean_env_wait_ms': 5.709484367935604
, 'mean_env_render_ms': 0.0},off_policy_estimator={},num_healthy_workers=40,num_agent_steps_sampled=99200,num_agent_steps_trained=99200,num_env_steps_sampled=99200,num_env_steps_trained=99200,num_env_steps_sampl
ed_this_iter=9920,num_env_steps_trained_this_iter=9920,agent_timesteps_total=99200,timers={'training_iteration_time_ms': 5975.772, 'load_time_ms': 446.88, 'load_throughput': 22198.357, 'learn_time_ms': 3354.836,
 'learn_throughput': 2956.925, 'update_time_ms': 12.873},counters={'num_env_steps_sampled': 99200, 'num_env_steps_trained': 99200, 'num_agent_steps_sampled': 99200, 'num_agent_steps_trained': 99200},perf={'cpu_u
til_percent': 11.6, 'ram_util_percent': 12.5, 'gpu_util_percent0': 0.92, 'vram_util_percent0': 0.3922932701795934, 'gpu_util_percent1': 0.01, 'vram_util_percent1': 0.05012828103414249, 'gpu_util_percent2': 0.9, 
'vram_util_percent2': 0.9580619696072626, 'gpu_util_percent3': 0.97, 'vram_util_percent3': 0.3378478389579633, 'gpu_util_percent4': 0.0, 'vram_util_percent4': 0.0, 'gpu_util_percent5': 0.0, 'vram_util_percent5':
 0.0, 'gpu_util_percent6': 0.0, 'vram_util_percent6': 0.0, 'gpu_util_percent7': 0.33, 'vram_util_percent7': 0.0} with parameters={'lambda': 0.95, 'kl_coeff': 0.5, 'clip_rewards': True, 'clip_param': 0.1, 'vf_cli
p_param': 10.0, 'entropy_coeff': 0.01, 'train_batch_size': 5000, 'rollout_fragment_length': 20, 'sgd_minibatch_size': 500, 'num_sgd_iter': 10, 'model': {'dim': 42, 'vf_share_layers': True}, 'env': 'PongNoFramesk
ip-v4', 'env_config': {}, 'num_workers': 40, 'evaluation_config': {'env_config': {'is_eval': True}}, 'evaluation_interval': 0, 'evaluation_num_episodes': 10, 'num_cpus_per_worker': 0.5, 'evaluation_num_workers':
 0, 'num_gpus': 0.1, 'framework': 'torch', 'num_envs_per_worker': 2, 'batch_mode': 'truncate_episodes', 'observation_filter': 'NoFilter', 'placement_strategy': 'SPREAD', 'callbacks': <class 'spg_experiments.call
backs.CustomCallbacks'>}.

It clutters the logs and it doesn’t provide anything useful in this format.
Is this intended? If so, can it be turned off?

Thanks

Hey @vakker00, thanks for the questions. Would reducing the verbosity (verbose=1) solve this for you?

Thanks for the suggestion. That would not solve the problem, because it doesn’t include the status table that is quite useful, e.g.:

== Status ==                                                                                                                                                                                                       
Current time: 2022-05-09 09:47:05 (running for 00:00:31.55)                                                                                                                                                        
Memory usage on this node: 4.7/62.7 GiB                                                                                                                                                                            
Using FIFO scheduling algorithm.                                                                                                                                                                                   
Resources requested: 3.0/24 CPUs, 0/2 GPUs, 0.0/36.49 GiB heap, 0.0/18.25 GiB objects (0.0/1.0 accelerator_type:G)                                                                                                 
Result logdir: /home/user/ray_results/PPO                                                                                                                                                                         
Number of trials: 1/1 (1 RUNNING)                                                                                                                                                                                  
+-----------------------------+----------+----------------------+--------+------------------+------+----------+----------------------+----------------------+--------------------+                                 
| Trial name                  | status   | loc                  |   iter |   total time (s) |   ts |   reward |   episode_reward_max |   episode_reward_min |   episode_len_mean |                                 
|-----------------------------+----------+----------------------+--------+------------------+------+----------+----------------------+----------------------+--------------------|
| PPO_CartPole-v0_877e0_00000 | RUNNING  | 192.168.1.74:3635534 |      2 |          19.7981 | 8000 |    44.66 |                  142 |                    9 |              44.66 |                                 
+-----------------------------+----------+----------------------+--------+------------------+------+----------+----------------------+----------------------+--------------------+     

@vakker00 gotcha, depending on how you’re running this script would a temporary workaround like grepping out the spammy logs work? For example:

python tune_example.py | grep -v "with parameters="

@kai do you have any thoughts on making this less spammy?

Yeah this looks like a bug. I can take a look this week to resolve this!