Help debugging a memory leak in RLlib

I added more GPUs, increased worker_num, and rewrote policy_client myself.
In fact, if you don't need to make full use of the machine's CPUs, you don't need to reimplement the policy_client class.

@hridayns, mannyv provided some advice and code on how to check the sample queue in this question, if you are interested: Memory issue debugging
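
In case it helps anyone else landing here: this is not mannyv's code and it doesn't touch RLlib at all, it's just a generic sketch using Python's built-in tracemalloc module to see which source lines allocated the most memory between two points in time. The `leaky_queue` list and the `top_growth` helper are made-up stand-ins for illustration; in a real run you would take the two snapshots around a few training iterations instead.

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Return the `limit` entries with the largest change in allocated size."""
    # compare_to() returns StatisticDiff objects, largest absolute change first.
    stats = after.compare_to(before, "lineno")
    return stats[:limit]


tracemalloc.start()
before = tracemalloc.take_snapshot()

# Hypothetical stand-in for a queue that keeps growing and is never drained.
leaky_queue = [bytes(1024) for _ in range(10_000)]

after = tracemalloc.take_snapshot()
tracemalloc.stop()

growth = top_growth(before, after)
for stat in growth:
    # Each entry shows file:line plus the size and count differences.
    print(stat)
```

If one line keeps showing up at the top with a growing size across repeated snapshots, that is usually a good place to start looking.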