RLlib server-client mode slows down when the number of connected clients is less than num_workers

I found that in server-client mode, the number of active client nodes must equal the num_workers value given when the server starts. Otherwise, synchronous_parallel_sample() in rollout_ops.py takes much longer.
I don't think this is reasonable, and it should be optimized.
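
To illustrate what I suspect is going on, here is a toy model of a synchronous sampling round. It is not RLlib's actual code: the function `round_time` and its parameters `sample_time_s` and `idle_timeout_s` are hypothetical names I made up, and the assumption that a worker slot without a client blocks until some timeout is my guess, not something I confirmed in rollout_ops.py.

```python
# Toy model only: NOT RLlib's real implementation.
# round_time, sample_time_s and idle_timeout_s are hypothetical names.


def round_time(num_workers, num_connected, sample_time_s=0.1, idle_timeout_s=1.0):
    """Return the simulated duration (seconds) of one synchronous sampling round.

    Assumption: a worker slot with a connected client returns a batch after
    sample_time_s, while a slot with no client blocks until idle_timeout_s.
    A synchronous round only finishes when the slowest slot has returned.
    """
    if not 0 <= num_connected <= num_workers:
        raise ValueError("num_connected must be between 0 and num_workers")
    if num_workers == 0:
        return 0.0
    slot_times = [
        sample_time_s if slot < num_connected else idle_timeout_s
        for slot in range(num_workers)
    ]
    # The round is as slow as its slowest slot.
    return max(slot_times)


if __name__ == "__main__":
    print(round_time(num_workers=4, num_connected=4))  # 0.1
    print(round_time(num_workers=4, num_connected=3))  # 1.0
```

In this model a single missing client is enough to make every round take the full timeout, which matches the slowdown I am seeing. If that is the real cause, sampling only from worker slots that currently have a connected client might avoid it.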