Tensor parallelism with torchrun inside Ray

Hi,

I'm using the gpt-fast library, and I'm doing multi-GPU inference on a single node like this:

torchrun --standalone --nproc_per_node=8 generate.py --checkpoint_path llama-3-70b-instruct-hf-pt/model.pth

How can I run this across multiple nodes inside Ray?
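The rough approach I'm imagining is to drop the torchrun launcher and instead have Ray actors set up the torch.distributed environment themselves (one actor per GPU, spread over the nodes). Below is a minimal sketch of what I mean. The `run_generate` import is a placeholder for however gpt-fast's generate.py would be called as a function, and I'm assuming it reads the usual RANK / WORLD_SIZE / LOCAL_RANK variables the same way it does under torchrun:

```python
import os
import ray
import torch.distributed as dist


@ray.remote(num_gpus=1)
class TPWorker:
    """One tensor-parallel rank; Ray pins it to a single GPU."""

    def get_ip(self):
        # IP of the node this actor landed on (rank 0's node becomes the master).
        return ray.util.get_node_ip_address()

    def setup(self, rank, world_size, master_addr, master_port):
        # Reproduce the environment torchrun would normally provide.
        os.environ["RANK"] = str(rank)
        os.environ["WORLD_SIZE"] = str(world_size)
        os.environ["LOCAL_RANK"] = "0"  # each actor only sees its own GPU
        os.environ["MASTER_ADDR"] = master_addr
        os.environ["MASTER_PORT"] = str(master_port)
        dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)

    def generate(self, checkpoint_path, prompt):
        # Placeholder: assumes generate.py's logic can be imported and called as
        # a function; in practice its main() would likely need a small refactor.
        from generate import main as run_generate
        return run_generate(checkpoint_path=checkpoint_path, prompt=prompt)


ray.init(address="auto")

world_size = 16  # e.g. 2 nodes x 8 GPUs
workers = [TPWorker.remote() for _ in range(world_size)]

# Use the first worker's node as the rendezvous point for the process group.
master_addr = ray.get(workers[0].get_ip.remote())
master_port = 29500

ray.get([
    w.setup.remote(rank, world_size, master_addr, master_port)
    for rank, w in enumerate(workers)
])

outputs = ray.get([
    w.generate.remote("llama-3-70b-instruct-hf-pt/model.pth", "Hello")
    for w in workers
])
print(outputs[0])
```

Is something along these lines the right direction, or is there a more idiomatic way to do this in Ray?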