Behavior Cloning meets OOM (Out of Memory)

Hi guys, I’m trying to start training with behavior cloning, but I hit OOM (out of memory) failures every time. Memory usage keeps growing during training until it OOMs, even in the ‘CartPole-v0’ demo env with 2 workers. The PPO Trainer has no such problem.
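For reference, this is roughly the kind of config I pass to `ray.tune.run("MARWIL", ...)`. The paths and most values below are placeholders, not my exact settings:

```python
# Rough sketch of my trainer config (placeholder values, not the exact ones).
config = {
    "env": "CartPole-v0",           # the same OOM shows up even in this demo env
    "framework": "torch",
    "num_workers": 2,
    "num_gpus": 1,
    "input": "/my_path/demo_data",  # placeholder path to the offline demonstrations
    "beta": 0.0,                    # beta = 0 makes MARWIL reduce to plain behavior cloning
}
```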

I would appreciate it if someone knew the reason behind this and could help me out. Here is my failure output during training.

Current time: 2022-02-24 15:50:39 (running for 00:31:58.62)
Memory usage on this node: 13.4/94.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/80 CPUs, 1.0/2 GPUs, 0.0/52.66 GiB heap, 0.0/26.33 GiB objects (0.0/1.0 accelerator_type:TITAN)
Result logdir: /my_path/result_logs/MARWIL
Number of trials: 1/1 (1 RUNNING)
+---------------------------+----------+---------------------+--------+------------------+---------+----------+----------------------+----------------------+--------------------+
| Trial name                | status   | loc                 |   iter |   total time (s) |      ts |   reward |   episode_reward_max |   episode_reward_min |   episode_len_mean |
|---------------------------+----------+---------------------+--------+------------------+---------+----------+----------------------+----------------------+--------------------|
| MARWIL_my_env_fe3f7_00000 | RUNNING  | XXXXXXXXX |   3053 |          1814.98 | 1605337 | -1.34783 |                    1 |                   -5 |               3001 |
+---------------------------+----------+---------------------+--------+------------------+---------+----------+----------------------+----------------------+--------------------+
ValueError: CUDA out of memory. Tried to allocate 74.00 MiB (GPU 0; 23.65 GiB total capacity; 21.74 GiB already allocated; 72.56 MiB free; 22.70 GiB reserved in total by PyTorch)