I would like to know more about how I can distribute fine-tuning of a large model via AIR.
Assume I want to fine-tune Falcon-40B. What should my cluster look like?
Does the cluster require the model to fit on a single GPU? If not, what should my code look like on the head node and on the other nodes?
How would you compare Hivemind and Ray for distributing LLM fine-tuning?
@risedangel Does this article speak to the problem you're trying to address by serving an LLM using Ray Serve?
@risedangel One rule of thumb to ensure you have enough collective GPU VRAM across your cluster: for an xxB-parameter model, make sure you have at least xxB × 2 GB. That is, for a 40B Falcon, you need at least
40 × 2 GB = 80 GB of GPU VRAM available across all nodes.
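The rule of thumb above can be sketched as a quick back-of-the-envelope calculation. This is a rough estimate assuming ~2 bytes per parameter (fp16/bf16 weights only); full fine-tuning needs considerably more for optimizer states, gradients, and activations. The function name is just for illustration:

```python
def min_vram_gb(num_params_billions: float, bytes_per_param: int = 2) -> float:
    """Rough minimum aggregate GPU memory (GB) to hold the model weights.

    Assumes fp16/bf16 (2 bytes per parameter). Optimizer states,
    gradients, and activations for full fine-tuning are NOT included,
    so treat this as a lower bound, not a sizing guarantee.
    """
    return num_params_billions * bytes_per_param

# Falcon-40B: 40 * 2 GB = 80 GB of VRAM needed across the whole cluster
print(min_vram_gb(40))
```

For full fine-tuning with an Adam-style optimizer, a common rough multiplier is closer to 16-20 bytes per parameter rather than 2, which is one reason parameter-efficient methods (e.g. LoRA) are popular for models of this size.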