How to use py-spy on a Ray cluster?


I have a Ray cluster with two workers and I am trying to get Ray Train working, but training hangs and I would like to get to the bottom of it.

In the thread "Ray Train hangs for long time", @kai mentioned using py-spy. Where exactly should I run the py-spy command?

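First find the PID of the process you want to inspect. Run this on the node (head or worker) where that process lives, since py-spy can only attach to processes on the local machine: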

ps a | grep ray

The output could look something like this:

19790 s001  SN+    0:01.86 ray::IDLE              
19791 s001  SN+    0:01.85 ray::IDLE              
19857 s001  SN+    0:01.07 ray::Actor
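If you want to script this step, here is a minimal sketch that filters `ps a`-style output for Ray worker processes and skips the `ray::IDLE` ones, which are pooled workers not currently running a task. It assumes the column layout shown above (PID first, process title last); `find_ray_worker_pids` and `list_local_ray_workers` are hypothetical helper names, not part of Ray or py-spy.

```python
from __future__ import annotations

import re
import subprocess

# Matches a Ray process title such as "ray::Actor" or "ray::IDLE".
_RAY_TITLE = re.compile(r"ray::(\S+)")


def find_ray_worker_pids(ps_output: str, include_idle: bool = False) -> dict[int, str]:
    """Return {pid: process title} for Ray workers in `ps a`-style output.

    Assumes the PID is the first whitespace-separated column, as in the
    sample output above.
    """
    workers: dict[int, str] = {}
    for line in ps_output.splitlines():
        fields = line.split()
        match = _RAY_TITLE.search(line)
        if not match or not fields or not fields[0].isdigit():
            continue  # header row, the grep process itself, or a non-Ray process
        name = match.group(1)
        if name == "IDLE" and not include_idle:
            continue  # pooled worker that is not currently running a task
        workers[int(fields[0])] = f"ray::{name}"
    return workers


def list_local_ray_workers(include_idle: bool = False) -> dict[int, str]:
    """Run `ps a` on this node and return its Ray worker processes."""
    out = subprocess.run(["ps", "a"], capture_output=True, text=True, check=True).stdout
    return find_ray_worker_pids(out, include_idle)
```

On the sample output above, `find_ray_worker_pids` would return only PID 19857 (`ray::Actor`), since both other processes are idle workers.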

You can then run py-spy on the PID of your Ray worker, e.g. the ray::Actor process that hosts the “Actor” class. This usually needs sudo:

> sudo py-spy dump --pid 19857
Process 19857: ray::Actor              
Python v3.7.7 (/Users/kai/.pyenv/versions/3.7.7/bin/python3.7)

Thread 0x1134DB600 (idle): "MainThread"
    main_loop (ray/_private/
    <module> (ray/_private/workers/
Thread 0x70000EFBC000 (idle): "ray_import_thread"
    wait (
    _wait_once (grpc/
    wait (grpc/
    result (grpc/
    _poll_locked (ray/_private/
    poll (ray/_private/
    _run (ray/_private/
    run (
    _bootstrap_inner (
    _bootstrap (
Thread 0x70000F4BF000 (idle): "Thread-1"
    channel_spin (grpc/
    run (
    _bootstrap_inner (
    _bootstrap (
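To dump every worker on a node in one go, a sketch like the following loops over PIDs, calls `py-spy dump --pid` for each one, and pulls out the state py-spy prints next to each thread (such as `idle`). It assumes `py-spy` is on `PATH` and that you run it on the same node as the processes; on a multi-node cluster you would repeat it on each node. `build_dump_cmd`, `thread_states` and `dump_all` are hypothetical helper names.

```python
from __future__ import annotations

import re
import subprocess

# Thread header lines in `py-spy dump` output look like:
#   Thread 0x1134DB600 (idle): "MainThread"
_THREAD_HEADER = re.compile(r'^Thread (\S+) \(([^)]*)\): "(.*)"$')


def build_dump_cmd(pid: int, use_sudo: bool = True) -> list[str]:
    """Build the `py-spy dump --pid PID` command, optionally prefixed with sudo."""
    cmd = ["py-spy", "dump", "--pid", str(pid)]
    return ["sudo", *cmd] if use_sudo else cmd


def thread_states(dump_text: str) -> list[tuple[str, str]]:
    """Return (thread name, state) pairs, e.g. ("MainThread", "idle"), from a dump."""
    states = []
    for line in dump_text.splitlines():
        match = _THREAD_HEADER.match(line.strip())
        if match:
            _thread_id, state, name = match.groups()
            states.append((name, state))
    return states


def dump_all(pids, use_sudo: bool = True) -> dict[int, str]:
    """Run `py-spy dump` on each PID; only works for processes on *this* node."""
    results: dict[int, str] = {}
    for pid in pids:
        proc = subprocess.run(build_dump_cmd(pid, use_sudo), capture_output=True, text=True)
        # Fall back to stderr so permission errors are visible instead of an empty string.
        results[pid] = proc.stdout if proc.returncode == 0 else proc.stderr
    return results
```

For example, `dump_all([19857])` runs the same command as above. Keep in mind that a hung worker may show every thread as idle, as in the sample dump, because a thread blocked waiting on a lock or a gRPC call can be reported as idle; the stack frames under each thread, rather than the idle flag, show where it is waiting.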