Memory issue debugging

I am running a server/client setup (built following the CartPole server/client example) with a multi-agent environment. Three machines in our network each run one client script with local inference, and one additional machine runs the training server script.

During training with Tune (and also with a plain Trainer) the RAM on the training server fills up until it crashes. I tried different configurations, with these results:

Screenshot from 2022-09-15 09-59-30
Gray line: all 3 client machines; blue line: 1 client machine; pink line: server and client running on the same machine.

My next step was using MemoryTrackingCallbacks to check what could be causing the issue. Since in a server/client setup the config gets sent to the clients, the callback only seemed to report data from the clients (which showed no problems in their memory stats). To get MemoryTrackingCallbacks stats for the server, I ran client and server on the same machine, which surfaced a number of entries that seemed to grow linearly (a sketch of how I enabled the callback follows the list). Most notably:

  • rllib/policy/sample_batch.py (see screenshot 1 from tensorboard at bottom)
  • rllib/evaluation/episode.py (see screenshot 2 from tensorboard at bottom)
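
Enabling the callback looked roughly like this (just a sketch; the import path is what I believe Ray ~2.0 uses, older releases expose it under ray.rllib.agents.callbacks instead):

    # Sketch: switch on RLlib's built-in memory tracking callback.
    from ray.rllib.algorithms.callbacks import MemoryTrackingCallbacks

    config = {
        # ... rest of the PPO / policy-server config ...
        "callbacks": MemoryTrackingCallbacks,
    }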

When I ran my env with a dummy script that produces random actions for 1000 episodes I didn't run into memory issues, so the env itself doesn't seem to be the problem. My best guess so far is that the server isn't fast enough to learn from the incoming data.
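
The dummy script was essentially a loop like this (simplified sketch; MyMultiAgentEnv stands in for my env class, and I just watched the process RSS while it ran):

    # Sanity check: drive the env with random actions and watch memory usage.
    # MyMultiAgentEnv is a placeholder for my environment class.
    import psutil

    env = MyMultiAgentEnv(env_config={})

    for episode in range(1000):
        obs = env.reset()
        dones = {"__all__": False}
        while not dones["__all__"]:
            # One random action per agent that returned an observation
            # (assumes a shared action_space; otherwise sample each agent's own space).
            actions = {agent_id: env.action_space.sample() for agent_id in obs}
            obs, rewards, dones, infos = env.step(actions)
        if episode % 100 == 0:
            rss_mb = psutil.Process().memory_info().rss / 1e6
            print(f"episode {episode}: RSS {rss_mb:.1f} MB")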

If anyone has advice on how to check whether this is the case, or on what else I could check, I'd be very grateful. I would also be interested to know whether there is a best practice for balancing sample generation and learning speed in a server/client setup.

Screenshot from 2022-09-15 10-16-39

Screenshot from 2022-09-15 10-17-14

I ran a test with the CartPole client/server scripts in my setup and it shows the same tendency, although not as quickly (see screenshot). The reason is probably that my policies use vision networks and have to process image observations. So my question is whether there are ways to "load-balance" sample generation and learning in a server/client type setup.

Screenshot from 2022-09-15 11-00-52

Hi @Blubberblub ,

We don't have such load-balancing techniques. But independently of your observation space, memory should not simply grow without bound unless you are using a very large replay buffer and your observations are very large. Could you check that? Maybe also check whether it looks different with a minimal env?

Cheers

Hi @Blubberblub,

My guess is that your environment is generating samples much faster than training consumes them. This causes the samples queue in the policy server to fill up, which is what eats the memory.

Try adding the following print and see if the queue size keeps growing:

    @override(InputReader)
    def next(self):
        print(f"Size of samples queue is: {self.samples_queue.qsize()}", )
        return self.samples_queue.get()

My env is not much faster than the training, but if I artificially slow training down by putting a breakpoint in the policy's training call and waiting 20 seconds, I see something like this:

Size of samples queue is: 6
Size of samples queue is: 5
Size of samples queue is: 5
Size of samples queue is: 5
Size of samples queue is: 4
Size of samples queue is: 3
Size of samples queue is: 4
Size of samples queue is: 3
Size of samples queue is: 2
Size of samples queue is: 1
Size of samples queue is: 1
Size of samples queue is: 0
Size of samples queue is: 0
Size of samples queue is: 0
Size of samples queue is: 0
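
If you don't want to sit at a breakpoint, a callback that stalls every update should have roughly the same effect. This is only a sketch; the import path may differ between Ray versions and the 20 seconds is just for the experiment:

    import time

    from ray.rllib.algorithms.callbacks import DefaultCallbacks


    class SlowLearnerCallbacks(DefaultCallbacks):
        def on_learn_on_batch(self, *, policy, train_batch, result, **kwargs):
            # Pretend the learner is much slower than the samplers.
            time.sleep(20)


    # in the server config: config["callbacks"] = SlowLearnerCallbacks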

@Blubberblub,

If you want to get the size of the data in the queue as well, you could use this instead. It holds the queue's lock, so it will slow things down, but that is probably acceptable for debugging. I adapted a recursive version that gets the size of the queue and its contents from here: Get size of Python object recursively to handle size of containers within containers · GitHub

    def get_size(self, obj, seen=None):
        """Recursively finds the size of an object, including its contents."""
        import sys
        size = sys.getsizeof(obj)
        if seen is None:
            seen = set()

        obj_id = id(obj)
        if obj_id in seen:
            return 0

        # Important: mark as seen *before* recursing to gracefully handle
        # self-referential objects.
        seen.add(obj_id)

        if isinstance(obj, dict):
            size += sum(self.get_size(v, seen) for v in obj.values())
            size += sum(self.get_size(k, seen) for k in obj.keys())
        elif hasattr(obj, '__dict__'):
            size += self.get_size(obj.__dict__, seen)
        elif hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, bytearray)):
            size += sum(self.get_size(i, seen) for i in obj)

        return size

    @override(InputReader)
    def next(self):
        # Hold the queue's lock only while measuring its contents; qsize() and
        # get() acquire it themselves.
        with self.samples_queue.mutex:
            size = self.get_size(self.samples_queue.queue)
        print(f"Size of samples queue is: {self.samples_queue.qsize()} taking up {size} bytes")
        return self.samples_queue.get()
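
If you don't want to edit the installed RLlib sources, one way to wire these methods in is to subclass PolicyServerInput and point the server script's input at the subclass (a sketch; adapt the address/port handling to your server script, and add get_size to the subclass as well if you want the byte counts):

    from ray.rllib.env.policy_server_input import PolicyServerInput
    from ray.rllib.offline.input_reader import InputReader
    from ray.rllib.utils.annotations import override


    class DebugPolicyServerInput(PolicyServerInput):
        """PolicyServerInput that reports its samples queue size on every read."""

        @override(InputReader)
        def next(self):
            print(f"Size of samples queue is: {self.samples_queue.qsize()}")
            return self.samples_queue.get()


    # in the server script, e.g.:
    # config["input"] = lambda ioctx: DebugPolicyServerInput(ioctx, "localhost", 9900)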

@arturn,

I am seeing the issue they mentioned with the CartPole server/client example.

Running with following CLI args: Namespace(as_test=False, callbacks_verbose=False, framework='torch', local_mode=False, no_restore=False, no_tune=False, num_cpus=3, num_workers=0, port=9900, run='PPO', stop_iters=200, stop_reward=99999999.0, stop_timesteps=500000, use_lstm=False)
Usage stats collection is disabled.
Started a local Ray instance. View the dashboard at http://127.0.0.1:8265.
(PPO pid=1329306) Size of samples queue is: 0 taking up 624 bytes
(PPO pid=1329306) Size of samples queue is: 0 taking up 624 bytes
(PPO pid=1329306) 2022-09-22 09:40:46,938	INFO rollout_worker.py:838 -- Completed sample batch:
(PPO pid=1329306) 2022-09-22 09:40:49,371	INFO policy_server_input.py:284 -- Sending worker creation args to client.
(PPO pid=1329306) 2022-09-22 09:40:49,408	INFO policy_server_input.py:287 -- Sending worker weights to client.
(PPO pid=1329306) Size of samples queue is: 0 taking up 624 bytes
(PPO pid=1329306) Size of samples queue is: 0 taking up 624 bytes
(PPO pid=1329306) 2022-09-22 09:40:50,230	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 0 taking up 624 bytes
(PPO pid=1329306) 2022-09-22 09:40:51,006	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 0 taking up 624 bytes
(PPO pid=1329306) 2022-09-22 09:40:51,777	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:40:52,559	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:40:52,560	WARNING deprecation.py:47 -- DeprecationWarning: `concat_samples` has been deprecated. Use `concat_samples() from rllib.policy.sample_batch` instead. This will raise an error in the future!
(PPO pid=1329306) 2022-09-22 09:40:53,365	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
Number of trials: 1/1 (1 RUNNING)
(PPO pid=1329306) 2022-09-22 09:40:54,170	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:40:54,961	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:40:55,748	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:40:56,525	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
Trial PPO_None_25ffe_00000 reported evaluation={'episode_reward_max': 50.0, 'episode_reward_min': 9.0, 'episode_reward_mean': 29.1, 'episode_len_mean': 29.1, 'episode_media': {}, 'episodes_this_iter': 10, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {},
(PPO pid=1329306) Size of samples queue is: 7 taking up 1550090 bytes
(PPO pid=1329306) 2022-09-22 09:40:57,363	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 6 taking up 1520294 bytes
(PPO pid=1329306) Size of samples queue is: 6 taking up 1531410 bytes
(PPO pid=1329306) Size of samples queue is: 5 taking up 1240460 bytes
(PPO pid=1329306) Size of samples queue is: 4 taking up 931014 bytes
(PPO pid=1329306) 2022-09-22 09:40:58,196	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:40:58,984	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:40:59,412	INFO policy_server_input.py:287 -- Sending worker weights to client.
(PPO pid=1329306) 2022-09-22 09:40:59,796	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:00,596	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:01,356	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:01,759	INFO algorithm.py:798 -- Evaluating current policy for 10 episodes.
(PPO pid=1329306) 2022-09-22 09:41:02,113	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 11 taking up 2475412 bytes
(PPO pid=1329306) Size of samples queue is: 10 taking up 2447516 bytes
(PPO pid=1329306) Size of samples queue is: 9 taking up 2165654 bytes
(PPO pid=1329306) Size of samples queue is: 8 taking up 1855696 bytes
(PPO pid=1329306) Size of samples queue is: 7 taking up 1545866 bytes
(PPO pid=1329306) 2022-09-22 09:41:02,885	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:03,672	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:04,476	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:05,267	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:06,099	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:06,893	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:06,894	INFO algorithm.py:798 -- Evaluating current policy for 10 episodes.

Trial PPO_None_25ffe_00000 reported evaluation={'episode_reward_max': 141.0, 'episode_reward_min': 10.0, 'episode_reward_mean': 40.1, 'episode_len_mean': 40.1, 'episode_media': {}, 'episodes_this_iter': 10, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 
(PPO pid=1329306) Size of samples queue is: 13 taking up 3055796 bytes
(PPO pid=1329306) Size of samples queue is: 12 taking up 3082620 bytes
(PPO pid=1329306) Size of samples queue is: 11 taking up 2777366 bytes
(PPO pid=1329306) Size of samples queue is: 10 taking up 2468688 bytes
(PPO pid=1329306) Size of samples queue is: 9 taking up 2161178 bytes
(PPO pid=1329306) 2022-09-22 09:41:07,704	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 9 taking up 2445436 bytes
(PPO pid=1329306) 2022-09-22 09:41:08,523	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:09,291	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:09,416	INFO policy_server_input.py:287 -- Sending worker weights to client.
(PPO pid=1329306) 2022-09-22 09:41:10,082	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:10,862	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:11,637	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:12,074	INFO algorithm.py:798 -- Evaluating current policy for 10 episodes.

Trial PPO_None_25ffe_00000 reported evaluation={'episode_reward_max': 200.0, 'episode_reward_min': 20.0, 'episode_reward_mean': 109.9, 'episode_len_mean': 109.9, 'episode_media': {}, 'episodes_this_iter': 10, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 
(PPO pid=1329306) Size of samples queue is: 16 taking up 3983514 bytes
(PPO pid=1329306) Size of samples queue is: 15 taking up 3702000 bytes
(PPO pid=1329306) 2022-09-22 09:41:13,202	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 14 taking up 3376138 bytes
(PPO pid=1329306) Size of samples queue is: 14 taking up 3393322 bytes
(PPO pid=1329306) Size of samples queue is: 13 taking up 3392646 bytes
(PPO pid=1329306) 2022-09-22 09:41:14,051	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:14,848	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:15,618	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:16,380	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:17,224	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:17,906	INFO algorithm.py:798 -- Evaluating current policy for 10 episodes.
(PPO pid=1329306) 2022-09-22 09:41:18,001	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.

Trial PPO_None_25ffe_00000 reported evaluation={'episode_reward_max': 200.0, 'episode_reward_min': 44.0, 'episode_reward_mean': 122.9, 'episode_len_mean': 122.9, 'episode_media': {}, 'episodes_this_iter': 10, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, '
(PPO pid=1329306) 2022-09-22 09:41:18,768	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 21 taking up 5242382 bytes
(PPO pid=1329306) Size of samples queue is: 20 taking up 4892136 bytes
(PPO pid=1329306) 2022-09-22 09:41:19,445	INFO policy_server_input.py:287 -- Sending worker weights to client.
(PPO pid=1329306) Size of samples queue is: 19 taking up 4588834 bytes
(PPO pid=1329306) Size of samples queue is: 18 taking up 4278040 bytes
(PPO pid=1329306) 2022-09-22 09:41:19,629	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 18 taking up 4583710 bytes
(PPO pid=1329306) 2022-09-22 09:41:20,470	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:21,264	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:22,033	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:22,850	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:23,646	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:24,054	INFO algorithm.py:798 -- Evaluating current policy for 10 episodes.
(PPO pid=1329306) 2022-09-22 09:41:24,416	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.

Trial PPO_None_25ffe_00000 reported evaluation={'episode_reward_max': 200.0, 'episode_reward_min': 23.0, 'episode_reward_mean': 106.7, 'episode_len_mean': 106.7, 'episode_media': {}, 'episodes_this_iter': 10, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, '
(PPO pid=1329306) Size of samples queue is: 25 taking up 6164844 bytes
(PPO pid=1329306) 2022-09-22 09:41:25,228	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 24 taking up 5857462 bytes
(PPO pid=1329306) Size of samples queue is: 24 taking up 5799654 bytes
(PPO pid=1329306) Size of samples queue is: 23 taking up 5518944 bytes
(PPO pid=1329306) Size of samples queue is: 22 taking up 5548008 bytes
(PPO pid=1329306) 2022-09-22 09:41:26,138	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:26,942	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:27,728	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:28,516	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:29,335	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:29,479	INFO policy_server_input.py:287 -- Sending worker weights to client.
(PPO pid=1329306) 2022-09-22 09:41:30,097	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:30,274	INFO algorithm.py:798 -- Evaluating current policy for 10 episodes.
(PPO pid=1329306) 2022-09-22 09:41:30,848	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.

Trial PPO_None_25ffe_00000 reported evaluation={'episode_reward_max': 200.0, 'episode_reward_min': 13.0, 'episode_reward_mean': 126.0, 'episode_len_mean': 126.0, 'episode_media': {}, 'episodes_this_iter': 10, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {},
(PPO pid=1329306) Size of samples queue is: 30 taking up 7393840 bytes
(PPO pid=1329306) Size of samples queue is: 29 taking up 7054478 bytes
(PPO pid=1329306) 2022-09-22 09:41:31,702	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 29 taking up 7071198 bytes
(PPO pid=1329306) Size of samples queue is: 28 taking up 6751612 bytes
(PPO pid=1329306) 2022-09-22 09:41:32,565	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 28 taking up 6780716 bytes
(PPO pid=1329306) 2022-09-22 09:41:33,430	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:34,199	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:34,959	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:35,715	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:36,549	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:36,869	INFO algorithm.py:798 -- Evaluating current policy for 10 episodes.
(PPO pid=1329306) 2022-09-22 09:41:37,319	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.

Trial PPO_None_25ffe_00000 reported evaluation={'episode_reward_max': 200.0, 'episode_reward_min': 95.0, 'episode_reward_mean': 171.0, 'episode_len_mean': 171.0, 'episode_media': {}, 'episodes_this_iter': 10, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {},
(PPO pid=1329306) 2022-09-22 09:41:38,105	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 35 taking up 8615720 bytes
(PPO pid=1329306) Size of samples queue is: 35 taking up 8588076 bytes
(PPO pid=1329306) Size of samples queue is: 35 taking up 8320266 bytes
(PPO pid=1329306) 2022-09-22 09:41:39,153	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 35 taking up 8320350 bytes
(PPO pid=1329306) 2022-09-22 09:41:39,503	INFO policy_server_input.py:287 -- Sending worker weights to client.
(PPO pid=1329306) Size of samples queue is: 34 taking up 8210950 bytes
(PPO pid=1329306) 2022-09-22 09:41:40,397	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:41,172	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:41,951	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:42,712	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:43,490	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:43,950	INFO algorithm.py:798 -- Evaluating current policy for 10 episodes.
(PPO pid=1329306) 2022-09-22 09:41:44,254	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.

Trial PPO_None_25ffe_00000 reported evaluation={'episode_reward_max': 200.0, 'episode_reward_min': 20.0, 'episode_reward_mean': 131.3, 'episode_len_mean': 131.3, 'episode_media': {}, 'episodes_this_iter': 10, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {},
(PPO pid=1329306) 2022-09-22 09:41:45,031	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 42 taking up 9837236 bytes
(PPO pid=1329306) Size of samples queue is: 41 taking up 9832900 bytes
(PPO pid=1329306) 2022-09-22 09:41:46,049	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 40 taking up 9552966 bytes
(PPO pid=1329306) Size of samples queue is: 39 taking up 9544050 bytes
(PPO pid=1329306) Size of samples queue is: 39 taking up 9510106 bytes
(PPO pid=1329306) 2022-09-22 09:41:47,195	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:48,012	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:48,785	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:49,568	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:49,661	INFO policy_server_input.py:287 -- Sending worker weights to client.
(PPO pid=1329306) 2022-09-22 09:41:50,349	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:51,010	INFO algorithm.py:798 -- Evaluating current policy for 10 episodes.
(PPO pid=1329306) 2022-09-22 09:41:51,127	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:51,889	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.

Trial PPO_None_25ffe_00000 reported evaluation={'episode_reward_max': 200.0, 'episode_reward_min': 17.0, 'episode_reward_mean': 177.8, 'episode_len_mean': 177.8, 'episode_media': {}, 'episodes_this_iter': 10, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {},
(PPO pid=1329306) Size of samples queue is: 47 taking up 11367346 bytes
(PPO pid=1329306) 2022-09-22 09:41:52,697	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 47 taking up 11347314 bytes
(PPO pid=1329306) Size of samples queue is: 46 taking up 11015872 bytes
(PPO pid=1329306) 2022-09-22 09:41:53,520	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 46 taking up 11351658 bytes
(PPO pid=1329306) 2022-09-22 09:41:54,407	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 46 taking up 11068096 bytes
(PPO pid=1329306) 2022-09-22 09:41:55,263	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:56,064	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:56,842	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:57,657	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:58,415	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:58,801	INFO algorithm.py:798 -- Evaluating current policy for 10 episodes.
(PPO pid=1329306) 2022-09-22 09:41:59,196	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) 2022-09-22 09:41:59,664	INFO policy_server_input.py:287 -- Sending worker weights to client.

Trial PPO_None_25ffe_00000 reported evaluation={'episode_reward_max': 200.0, 'episode_reward_min': 15.0, 'episode_reward_mean': 163.0, 'episode_len_mean': 163.0, 'episode_media': {}, 'episodes_this_iter': 10, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {},
(PPO pid=1329306) 2022-09-22 09:41:59,975	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 53 taking up 12941512 bytes
(PPO pid=1329306) Size of samples queue is: 54 taking up 12868308 bytes
(PPO pid=1329306) 2022-09-22 09:42:01,233	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 53 taking up 12905820 bytes
(PPO pid=1329306) Size of samples queue is: 53 taking up 12868908 bytes
(PPO pid=1329306) 2022-09-22 09:42:02,313	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.
(PPO pid=1329306) Size of samples queue is: 52 taking up 12635894 bytes
(PPO pid=1329306) 2022-09-22 09:42:03,327	INFO policy_server_input.py:355 -- Got sample batch of size 1000 from client.

@arturn Thanks for the reply. My obs_dict currently contains 47 agents with a 41x41x1 (uint8) observation per agent. I created a hierarchical multi-agent env that has 47 top-level agents and 47 low-level agents that I "respawn" with a new agent_id after every top-level step. My pattern is: 1 top-level step, 10 low-level steps (or fewer when all are done early), 1 top-level step, 10 low-level steps, and so on, for a maximum of 10 top-level steps. I paid attention to setting done=True for all low-level agents after their steps, as well as for the top-level agents at the end of the episode, because my first guess was that this could be causing the issue. For the algorithm I used the standard PPO trainer from RLlib with a vision net. I did not change the replay buffer, so it is possible that the standard implementation is not suited for my environment.
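
To make the done handling concrete, the flags I return look roughly like this (simplified; the agent ids are made up, not my actual naming):

    # After the final step of a low-level phase: the current low-level agents
    # are done, but the episode itself keeps going.
    dones = {
        "low_level_agent_3": True,
        "low_level_agent_4": True,
        "__all__": False,
    }

    # After the last top-level step: everything is done, including the special
    # "__all__" key that tells RLlib the whole episode is over.
    dones = {
        "top_level_agent_0": True,
        "__all__": True,
    }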

@mannyv Thanks a lot for providing an approach to checking the sample queue and for trying it out with the CartPole server/client. I had no idea how to do that before! I will implement it and give feedback as soon as I'm back from vacation.


Got it. PPO does not have a replay buffer, so you can disregard my comment and go with @mannyv's approach. Thanks @mannyv!