@arturn,
I am seeing the issue he mentioned with the cartpole server/client example — note the steadily growing "Size of samples queue" lines in the trace below.
Running with the following CLI args: Namespace(as_test=False, callbacks_verbose=False, framework='torch', local_mode=False, no_restore=False, no_tune=False, num_cpus=3, num_workers=0, port=9900, run='PPO', stop_iters=200, stop_reward=99999999.0, stop_timesteps=500000, use_lstm=False)
Usage stats collection is disabled.
Started a local Ray instance. View the dashboard at http://127.0.0.1:8265.
(PPO pid=1329306) Size of samples queue is: 0 taking up 624 bytes
(PPO pid=1329306) Size of samples queue is: 0 taking up 624 bytes
(PPO pid=1329306) 2022-09-22 09:40:46,938 INFO rollout_worker.py:838 -- Completed sample batch:
(PPO pid=1329306) 2022-09-22 09:40:49,371 INFO policy_server_input.py:284 -- Sending worker creation args to client.
(PPO pid=1329306) 2022-09-22 09:40:49,408 INFO policy_server_input.py:287 -- Sending worker weights to client.
(PPO pid=1329306) Size of samples queue is: 0 taking up 624 bytes
(PPO pid=1329306) Size of samples queue is: 0 taking up 624 bytes
(PPO pid=1329306) 2022-09-22 09:40:50,230 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 0 taking up 624 bytes
(PPO pid=1329306) 2022-09-22 09:40:51,006 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 0 taking up 624 bytes
(PPO pid=1329306) 2022-09-22 09:40:51,777 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:40:52,559 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:40:52,560 WARNING deprecation.py:47 -- DeprecationWarning: `concat_samples` has been deprecated. Use `concat_samples() from rllib.policy.sample_batch` instead. This will raise an error in the future!
(PPO pid=1329306) 2022-09-22 09:40:53,365 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
Number of trials: 1/1 (1 RUNNING)
(PPO pid=1329306) 2022-09-22 09:40:54,170 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:40:54,961 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:40:55,748 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:40:56,525 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
Trial PPO_None_25ffe_00000 reported evaluation={'episode_reward_max': 50.0, 'episode_reward_min': 9.0, 'episode_reward_mean': 29.1, 'episode_len_mean': 29.1, 'episode_media': {}, 'episodes_this_iter': 10, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {},
(PPO pid=1329306) Size of samples queue is: 7 taking up 1550090 bytes
(PPO pid=1329306) 2022-09-22 09:40:57,363 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 6 taking up 1520294 bytes
(PPO pid=1329306) Size of samples queue is: 6 taking up 1531410 bytes
(PPO pid=1329306) Size of samples queue is: 5 taking up 1240460 bytes
(PPO pid=1329306) Size of samples queue is: 4 taking up 931014 bytes
(PPO pid=1329306) 2022-09-22 09:40:58,196 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:40:58,984 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:40:59,412 INFO policy_server_input.py:287 -- Sending worker weights to client.
(PPO pid=1329306) 2022-09-22 09:40:59,796 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:00,596 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:01,356 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:01,759 INFO algorithm.py:798 -- Evaluating current policy for 10 episodes.
(PPO pid=1329306) 2022-09-22 09:41:02,113 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 11 taking up 2475412 bytes
(PPO pid=1329306) Size of samples queue is: 10 taking up 2447516 bytes
(PPO pid=1329306) Size of samples queue is: 9 taking up 2165654 bytes
(PPO pid=1329306) Size of samples queue is: 8 taking up 1855696 bytes
(PPO pid=1329306) Size of samples queue is: 7 taking up 1545866 bytes
(PPO pid=1329306) 2022-09-22 09:41:02,885 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:03,672 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:04,476 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:05,267 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:06,099 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:06,893 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:06,894 INFO algorithm.py:798 -- Evaluating current policy for 10 episodes.
Trial PPO_None_25ffe_00000 reported evaluation={'episode_reward_max': 141.0, 'episode_reward_min': 10.0, 'episode_reward_mean': 40.1, 'episode_len_mean': 40.1, 'episode_media': {}, 'episodes_this_iter': 10, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {},
(PPO pid=1329306) Size of samples queue is: 13 taking up 3055796 bytes
(PPO pid=1329306) Size of samples queue is: 12 taking up 3082620 bytes
(PPO pid=1329306) Size of samples queue is: 11 taking up 2777366 bytes
(PPO pid=1329306) Size of samples queue is: 10 taking up 2468688 bytes
(PPO pid=1329306) Size of samples queue is: 9 taking up 2161178 bytes
(PPO pid=1329306) 2022-09-22 09:41:07,704 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 9 taking up 2445436 bytes
(PPO pid=1329306) 2022-09-22 09:41:08,523 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:09,291 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:09,416 INFO policy_server_input.py:287 -- Sending worker weights to client.
(PPO pid=1329306) 2022-09-22 09:41:10,082 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:10,862 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:11,637 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:12,074 INFO algorithm.py:798 -- Evaluating current policy for 10 episodes.
Trial PPO_None_25ffe_00000 reported evaluation={'episode_reward_max': 200.0, 'episode_reward_min': 20.0, 'episode_reward_mean': 109.9, 'episode_len_mean': 109.9, 'episode_media': {}, 'episodes_this_iter': 10, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {},
(PPO pid=1329306) Size of samples queue is: 16 taking up 3983514 bytes
(PPO pid=1329306) Size of samples queue is: 15 taking up 3702000 bytes
(PPO pid=1329306) 2022-09-22 09:41:13,202 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 14 taking up 3376138 bytes
(PPO pid=1329306) Size of samples queue is: 14 taking up 3393322 bytes
(PPO pid=1329306) Size of samples queue is: 13 taking up 3392646 bytes
(PPO pid=1329306) 2022-09-22 09:41:14,051 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:14,848 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:15,618 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:16,380 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:17,224 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:17,906 INFO algorithm.py:798 -- Evaluating current policy for 10 episodes.
(PPO pid=1329306) 2022-09-22 09:41:18,001 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
Trial PPO_None_25ffe_00000 reported evaluation={'episode_reward_max': 200.0, 'episode_reward_min': 44.0, 'episode_reward_mean': 122.9, 'episode_len_mean': 122.9, 'episode_media': {}, 'episodes_this_iter': 10, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, '
(PPO pid=1329306) 2022-09-22 09:41:18,768 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 21 taking up 5242382 bytes
(PPO pid=1329306) Size of samples queue is: 20 taking up 4892136 bytes
(PPO pid=1329306) 2022-09-22 09:41:19,445 INFO policy_server_input.py:287 -- Sending worker weights to client.
(PPO pid=1329306) Size of samples queue is: 19 taking up 4588834 bytes
(PPO pid=1329306) Size of samples queue is: 18 taking up 4278040 bytes
(PPO pid=1329306) 2022-09-22 09:41:19,629 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 18 taking up 4583710 bytes
(PPO pid=1329306) 2022-09-22 09:41:20,470 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:21,264 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:22,033 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:22,850 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:23,646 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:24,054 INFO algorithm.py:798 -- Evaluating current policy for 10 episodes.
(PPO pid=1329306) 2022-09-22 09:41:24,416 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
Trial PPO_None_25ffe_00000 reported evaluation={'episode_reward_max': 200.0, 'episode_reward_min': 23.0, 'episode_reward_mean': 106.7, 'episode_len_mean': 106.7, 'episode_media': {}, 'episodes_this_iter': 10, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, '
(PPO pid=1329306) Size of samples queue is: 25 taking up 6164844 bytes
(PPO pid=1329306) 2022-09-22 09:41:25,228 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 24 taking up 5857462 bytes
(PPO pid=1329306) Size of samples queue is: 24 taking up 5799654 bytes
(PPO pid=1329306) Size of samples queue is: 23 taking up 5518944 bytes
(PPO pid=1329306) Size of samples queue is: 22 taking up 5548008 bytes
(PPO pid=1329306) 2022-09-22 09:41:26,138 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:26,942 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:27,728 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:28,516 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:29,335 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:29,479 INFO policy_server_input.py:287 -- Sending worker weights to client.
(PPO pid=1329306) 2022-09-22 09:41:30,097 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:30,274 INFO algorithm.py:798 -- Evaluating current policy for 10 episodes.
(PPO pid=1329306) 2022-09-22 09:41:30,848 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
Trial PPO_None_25ffe_00000 reported evaluation={'episode_reward_max': 200.0, 'episode_reward_min': 13.0, 'episode_reward_mean': 126.0, 'episode_len_mean': 126.0, 'episode_media': {}, 'episodes_this_iter': 10, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {},
(PPO pid=1329306) Size of samples queue is: 30 taking up 7393840 bytes
(PPO pid=1329306) Size of samples queue is: 29 taking up 7054478 bytes
(PPO pid=1329306) 2022-09-22 09:41:31,702 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 29 taking up 7071198 bytes
(PPO pid=1329306) Size of samples queue is: 28 taking up 6751612 bytes
(PPO pid=1329306) 2022-09-22 09:41:32,565 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 28 taking up 6780716 bytes
(PPO pid=1329306) 2022-09-22 09:41:33,430 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:34,199 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:34,959 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:35,715 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:36,549 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:36,869 INFO algorithm.py:798 -- Evaluating current policy for 10 episodes.
(PPO pid=1329306) 2022-09-22 09:41:37,319 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
Trial PPO_None_25ffe_00000 reported evaluation={'episode_reward_max': 200.0, 'episode_reward_min': 95.0, 'episode_reward_mean': 171.0, 'episode_len_mean': 171.0, 'episode_media': {}, 'episodes_this_iter': 10, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {},
(PPO pid=1329306) 2022-09-22 09:41:38,105 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 35 taking up 8615720 bytes
(PPO pid=1329306) Size of samples queue is: 35 taking up 8588076 bytes
(PPO pid=1329306) Size of samples queue is: 35 taking up 8320266 bytes
(PPO pid=1329306) 2022-09-22 09:41:39,153 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 35 taking up 8320350 bytes
(PPO pid=1329306) 2022-09-22 09:41:39,503 INFO policy_server_input.py:287 -- Sending worker weights to client.
(PPO pid=1329306) Size of samples queue is: 34 taking up 8210950 bytes
(PPO pid=1329306) 2022-09-22 09:41:40,397 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:41,172 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:41,951 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:42,712 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:43,490 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:43,950 INFO algorithm.py:798 -- Evaluating current policy for 10 episodes.
(PPO pid=1329306) 2022-09-22 09:41:44,254 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
Trial PPO_None_25ffe_00000 reported evaluation={'episode_reward_max': 200.0, 'episode_reward_min': 20.0, 'episode_reward_mean': 131.3, 'episode_len_mean': 131.3, 'episode_media': {}, 'episodes_this_iter': 10, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {},
(PPO pid=1329306) 2022-09-22 09:41:45,031 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 42 taking up 9837236 bytes
(PPO pid=1329306) Size of samples queue is: 41 taking up 9832900 bytes
(PPO pid=1329306) 2022-09-22 09:41:46,049 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 40 taking up 9552966 bytes
(PPO pid=1329306) Size of samples queue is: 39 taking up 9544050 bytes
(PPO pid=1329306) Size of samples queue is: 39 taking up 9510106 bytes
(PPO pid=1329306) 2022-09-22 09:41:47,195 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:48,012 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:48,785 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:49,568 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:49,661 INFO policy_server_input.py:287 -- Sending worker weights to client.
(PPO pid=1329306) 2022-09-22 09:41:50,349 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:51,010 INFO algorithm.py:798 -- Evaluating current policy for 10 episodes.
(PPO pid=1329306) 2022-09-22 09:41:51,127 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:51,889 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
Trial PPO_None_25ffe_00000 reported evaluation={'episode_reward_max': 200.0, 'episode_reward_min': 17.0, 'episode_reward_mean': 177.8, 'episode_len_mean': 177.8, 'episode_media': {}, 'episodes_this_iter': 10, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {},
(PPO pid=1329306) Size of samples queue is: 47 taking up 11367346 bytes
(PPO pid=1329306) 2022-09-22 09:41:52,697 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 47 taking up 11347314 bytes
(PPO pid=1329306) Size of samples queue is: 46 taking up 11015872 bytes
(PPO pid=1329306) 2022-09-22 09:41:53,520 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 46 taking up 11351658 bytes
(PPO pid=1329306) 2022-09-22 09:41:54,407 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 46 taking up 11068096 bytes
(PPO pid=1329306) 2022-09-22 09:41:55,263 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:56,064 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:56,842 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:57,657 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:58,415 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:58,801 INFO algorithm.py:798 -- Evaluating current policy for 10 episodes.
(PPO pid=1329306) 2022-09-22 09:41:59,196 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:59,664 INFO policy_server_input.py:287 -- Sending worker weights to client.
Trial PPO_None_25ffe_00000 reported evaluation={'episode_reward_max': 200.0, 'episode_reward_min': 15.0, 'episode_reward_mean': 163.0, 'episode_len_mean': 163.0, 'episode_media': {}, 'episodes_this_iter': 10, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {},
(PPO pid=1329306) 2022-09-22 09:41:59,975 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 53 taking up 12941512 bytes
(PPO pid=1329306) Size of samples queue is: 54 taking up 12868308 bytes
(PPO pid=1329306) 2022-09-22 09:42:01,233 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 53 taking up 12905820 bytes
(PPO pid=1329306) Size of samples queue is: 53 taking up 12868908 bytes
(PPO pid=1329306) 2022-09-22 09:42:02,313 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 52 taking up 12635894 bytes
(PPO pid=1329306) 2022-09-22 09:42:03,327 INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
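To make the growth explicit, here is a quick sanity check (not part of the repro) that parses a handful of the "Size of samples queue" lines from the trace above. The queue length climbs from 0 to 53 (and roughly 13 MB) over about 75 seconds, so batches appear to arrive from the client faster than the server's training loop drains them:

```python
import re

# A sample of the queue-size log lines from the trace above
log = """\
(PPO pid=1329306) Size of samples queue is: 0 taking up 624 bytes
(PPO pid=1329306) Size of samples queue is: 7 taking up 1550090 bytes
(PPO pid=1329306) Size of samples queue is: 16 taking up 3983514 bytes
(PPO pid=1329306) Size of samples queue is: 30 taking up 7393840 bytes
(PPO pid=1329306) Size of samples queue is: 47 taking up 11367346 bytes
(PPO pid=1329306) Size of samples queue is: 53 taking up 12941512 bytes
"""

# Extract (queue length, bytes) pairs from each line
pattern = re.compile(r"queue is: (\d+) taking up (\d+) bytes")
samples = [(int(n), int(b)) for n, b in pattern.findall(log)]

sizes = [n for n, _ in samples]
print(sizes)                   # [0, 7, 16, 30, 47, 53]
print(sizes == sorted(sizes))  # True: monotonically growing in this sample
```

In a healthy run I would expect the queue to oscillate around a small steady-state value rather than trend upward like this.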