I would like to know more about how I can distribute fine-tuning of a large model via AIR.
Assume I want to fine-tune Falcon-40B. What should my cluster look like?
Does the cluster require the model to fit on a single GPU? If not, what should my code look like on the head node and on the other nodes?
How would you compare Hivemind and Ray for distributing LLM fine-tuning?
@risedangel Does this article speak to the problem you're trying to address by serving an LLM using Ray Serve?
@risedangel One rule of thumb to ensure you have enough collective GPU VRAM across your cluster: for an xxB-parameter model, make sure you have at least xxB × 2 GB. That is, for a 40B Falcon, you need at least
40 × 2 GB = 80 GB of GPU VRAM available across all nodes.
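The rule of thumb above can be sketched as a quick back-of-the-envelope calculation. This is a rough estimate assuming ~2 bytes per parameter (fp16/bf16 weights only); full fine-tuning needs considerably more for optimizer states, gradients, and activations. The function name is just for illustration:

```python
def min_vram_gb(num_params_billions: float, bytes_per_param: int = 2) -> float:
    """Rough minimum aggregate GPU memory (GB) to hold the model weights.

    Assumes fp16/bf16 (2 bytes per parameter). Optimizer states,
    gradients, and activations for full fine-tuning are NOT included,
    so treat this as a lower bound, not a sizing guarantee.
    """
    return num_params_billions * bytes_per_param

# Falcon-40B: 40 * 2 GB = 80 GB of VRAM needed across the whole cluster
print(min_vram_gb(40))
```

For full fine-tuning with an Adam-style optimizer, a common rough multiplier is closer to 16-20 bytes per parameter rather than 2, which is one reason parameter-efficient methods (e.g. LoRA) are popular for models of this size.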