OOM after decoupling Ray from the GPT-J finetune script

I replicated gptj_ray with batch size = 48 on 8xH100 and it ran without any errors. Later, I removed all Ray-related functionality and tried to run without Ray with batch size = 1 on the same server, but I keep getting OOM. I kept the same DeepSpeed config and all other parameters, except for the dataset and trainer. Does Ray do anything specific internally to get past OOM, and how can I get the decoupled script to run successfully? A sketch of my decoupled setup is below.
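
For reference, this is roughly what the decoupled version looks like; the model name, DeepSpeed config path, and the tiny dataset here are placeholders rather than my exact script:

```python
# Minimal sketch of the decoupled run: a plain Hugging Face Trainer
# that reuses the Ray run's DeepSpeed JSON, with no Ray wrappers.
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "EleutherAI/gpt-j-6b"  # placeholder: same base model as gptj_ray
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tiny placeholder dataset; the real script reuses the gptj_ray data
# pipeline minus the Ray pieces.
texts = ["hello world"] * 8
enc = tokenizer(texts, truncation=True, max_length=32)
train_dataset = Dataset.from_dict({**enc, "labels": enc["input_ids"]})

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,  # batch size = 1 still OOMs
    deepspeed="ds_config.json",     # unchanged DeepSpeed config from the Ray run
    bf16=True,
)

Trainer(model=model, args=args, train_dataset=train_dataset).train()
```

I launch it with `deepspeed --num_gpus 8 train.py` on the same 8xH100 server where the Ray version ran fine at batch size 48.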