Ray Tune + DeepSpeed integration

I am trying to tune my LLM with Hugging Face's Trainer.hyperparameter_search() together with the Ray backend. The model I am using has 13B parameters, so it is impossible to run trials without sharding the optimizer state across multiple GPUs (8 GPUs per trial in my case, with max concurrent trials = 1).
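For reference, a minimal sketch of the kind of setup I mean, assuming a causal LM; the checkpoint name, search space, and tiny dummy dataset below are placeholders, not my actual code:

```python
from datasets import Dataset
from ray import tune
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

def model_init():
    # Ray Tune re-creates the model for every trial through this callback.
    return AutoModelForCausalLM.from_pretrained("my-13b-checkpoint")  # placeholder name

def hp_space(trial):
    # Illustrative search space, not the one from my run.
    return {
        "learning_rate": tune.loguniform(1e-6, 1e-4),
        "per_device_train_batch_size": tune.choice([1, 2, 4]),
    }

# Tiny stand-in dataset so the sketch is self-contained.
train_dataset = Dataset.from_dict(
    {"input_ids": [[0, 1, 2, 3]] * 8, "labels": [[0, 1, 2, 3]] * 8}
)

trainer = Trainer(
    model_init=model_init,
    args=TrainingArguments(output_dir="tune_out"),
    train_dataset=train_dataset,
)

best_run = trainer.hyperparameter_search(
    hp_space=hp_space,
    backend="ray",
    n_trials=8,
    # Extra kwargs are forwarded to ray.tune.run: 8 GPUs per trial,
    # one trial at a time, as described above.
    resources_per_trial={"cpu": 8, "gpu": 8},
)
```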

I tried adding a DeepSpeed ZeRO-2 config to the Trainer, but without success: trials always crash with a "module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cpu" error. I guess this error originates from how Ray starts trials. I also tried launching my tuning run with deepspeed/accelerate [args] tune_script.py instead of python tune_script.py.
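The ZeRO-2 wiring I attempted looks roughly like this sketch; TrainingArguments accepts either a dict or a path to a JSON file for deepspeed, and the "auto" values are resolved by the Hugging Face integration at runtime:

```python
from transformers import TrainingArguments

# A minimal ZeRO-2 config of the kind described above; "auto" values are
# filled in by the Hugging Face/DeepSpeed integration from TrainingArguments.
ds_config = {
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "fp16": {"enabled": "auto"},
}

args = TrainingArguments(
    output_dir="tune_out",
    deepspeed=ds_config,  # a dict or a path to a .json config both work
)
```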

Is it possible to tune hyperparameters with HF + Ray + DeepSpeed? Or is there another way, e.g. using FSDP?

Hi there! Welcome to the Ray community. Can you tell me more about your CUDA setup, or share the lines of code where it's erroring out? It seems like the module expects its parameters on cuda:0 (since that's what device_ids[0] points to), but one of them is on the CPU instead. What GPU are you running this on?
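In the meantime, a quick check along these lines (just standard PyTorch and environment-variable reads, nothing Ray-specific) run on the node, or inside a trial, would show what each process actually sees:

```python
import os
import torch

# Which devices are visible, and whether a distributed launcher
# set the usual rank variables for this process.
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"  cuda:{i} ->", torch.cuda.get_device_name(i))
print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("LOCAL_RANK:", os.environ.get("LOCAL_RANK"),
      "| WORLD_SIZE:", os.environ.get("WORLD_SIZE"))
```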