Running torch profiler

Previously there were ways to call the torch profiler as in:

However, the docs say that Ray 2.5.0 does not natively support any GPU profiling. Was there additional context around removing this capability? I’d be interested in contributing a feature if it enables tracing for code using Ray.

cc @matthewdeng Could you share some context here? Thank you!

Hey! You can still utilize Torch profiling directly in your Torch training loop, and access the created file(s) after training.
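To illustrate the suggestion above, here is a minimal sketch of wrapping a plain PyTorch training loop with `torch.profiler` and reading the trace files afterwards. The function name `train_with_profiler`, the toy model, and the `trace_dir` argument are made up for illustration; in a Ray Train job the body would presumably sit inside your per-worker training function, but that part is an assumption and is not shown here.

```python
import os
import tempfile

import torch
from torch import nn
from torch.profiler import (
    ProfilerActivity,
    profile,
    schedule,
    tensorboard_trace_handler,
)


def train_with_profiler(trace_dir: str, num_steps: int = 6) -> None:
    """Hypothetical training loop that writes profiler traces to trace_dir."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = nn.Linear(16, 4).to(device)  # toy model, stands in for the real one
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    # Always trace CPU ops; add CUDA kernels only when a GPU is present.
    activities = [ProfilerActivity.CPU]
    if device.type == "cuda":
        activities.append(ProfilerActivity.CUDA)

    with profile(
        activities=activities,
        # Skip 1 step, warm up for 1 step, record 3 steps, then stop.
        schedule=schedule(wait=1, warmup=1, active=3, repeat=1),
        # Writes one trace file into trace_dir when the active window ends.
        on_trace_ready=tensorboard_trace_handler(trace_dir),
    ) as prof:
        for _ in range(num_steps):
            x = torch.randn(32, 16, device=device)
            y = torch.randn(32, 4, device=device)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
            prof.step()  # tell the profiler that one training step finished


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as trace_dir:
        train_with_profiler(trace_dir)
        print(os.listdir(trace_dir))  # trace file(s) to open after training
```

With multiple workers, each process would write its own file, so a shared or per-worker output directory is worth deciding up front.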

Could you share more about the type of tracing you would be interested in adding for Ray?

cc @Huaiwei_Sun

I’m trying to use Ray for metrics alongside a vanilla PyTorch DDP training script launched with torchrun, but I’m running into NCCL errors when calling ray.init() and torch.distributed.init_process_group() in the same program:

For context, I’m trying to write an HTTP Server (actor) to be able to trigger PyTorch traces remotely.

However, the docs say that Ray 2.5.0 does not natively support any GPU profiling. Was there additional context around removing this capability?

PR to fix the doc

According to @matthewdeng, PyTorch Profiler should work out of the box when you use it with Ray Train.
Are you using Ray Train?