OK, thank you very much for your help. So far I have found a solution that, although not optimal, solves my problem for now. I split the original dataset into several smaller datasets, each of which has fewer blocks, and then process these datasets one after another, running each one in parallel across its blocks.
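A rough sketch of this workaround in Python, under stated assumptions: the dataset is represented as a plain list of blocks, and the names `split_into_chunks`, `process_block`, and `run_in_batches` are hypothetical, with `process_block` only summing numbers as a stand-in for the real per-block work. It uses a thread pool from the standard library for simplicity; the actual setup may use a different parallel backend.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(blocks, chunk_size):
    """Split the full list of blocks into smaller datasets of at most chunk_size blocks."""
    return [blocks[i:i + chunk_size] for i in range(0, len(blocks), chunk_size)]


def process_block(block):
    # Hypothetical per-block work (assumption): sum the values in the block.
    return sum(block)


def run_in_batches(blocks, chunk_size, max_workers=4):
    """Process the smaller datasets one after another, each in parallel by blocks."""
    results = []
    for chunk in split_into_chunks(blocks, chunk_size):
        # Only the blocks of the current smaller dataset are in flight at once.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # pool.map returns results in the same order as the input blocks.
            results.extend(pool.map(process_block, chunk))
    return results
```

Here `chunk_size` caps how many blocks are scheduled at a time, which is the point of splitting: the total work is unchanged, but each parallel run only has to handle a smaller number of blocks.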
If I find a better solution, I will share it here.
I am very happy to be here discussing technology with you.