How to use py-spy on a ray cluster?

Hey.

I have a Ray cluster with two workers and I am trying to get training with Ray Train working. It hangs, and I would like to get to the bottom of it.

In the thread “Ray Train hangs for long time”, @kai mentioned using py-spy. Where exactly should I run the py-spy command?

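Run py-spy on the node where the worker you want to inspect is running. py-spy attaches to a local process ID, so on a multi-node cluster you need to log into (e.g. ssh into) that node first. To find the PID, do something like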

ps a | grep ray

The output could be something like this:

...
19790 s001  SN+    0:01.86 ray::IDLE              
19791 s001  SN+    0:01.85 ray::IDLE              
19857 s001  SN+    0:01.07 ray::Actor
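If you want to script this lookup, here is a minimal sketch that lists the Ray processes on the current node. It assumes that ps -e -o pid= -o args= is available (it is on Linux and macOS) and that Ray workers show up with a ray:: title, as in the output above. list_ray_processes and local_ray_processes are hypothetical helper names, not part of Ray. It uses ps -e rather than ps a because ps a skips processes without a controlling terminal, which can be the case for workers started in the background on a cluster node.

```python
import subprocess
from typing import List, Tuple


def list_ray_processes(ps_output: str, prefix: str = "ray::") -> List[Tuple[int, str]]:
    """Return (pid, title) pairs for rows whose command starts with `prefix`.

    Expects `ps -e -o pid= -o args=` style output: a PID, whitespace, then
    the full command line (Ray workers retitle themselves, e.g. "ray::Actor").
    """
    matches = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip header lines, blank lines and malformed rows
        pid, args = int(parts[0]), parts[1].strip()
        if args.startswith(prefix):
            matches.append((pid, args))
    return matches


def local_ray_processes(prefix: str = "ray::") -> List[Tuple[int, str]]:
    """Run ps on *this* node only and return its Ray processes."""
    out = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "args="],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return list_ray_processes(out, prefix)
```

Calling local_ray_processes(prefix="ray::Actor") on the worker node would return just the actor processes. Their PIDs are what you pass to py-spy.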

You can then run py-spy on the PID of your Ray worker (e.g. the process hosting your “Actor” class). This usually needs sudo:

> sudo py-spy dump --pid 19857
Password:
Process 19857: ray::Actor              
Python v3.7.7 (/Users/kai/.pyenv/versions/3.7.7/bin/python3.7)

Thread 0x1134DB600 (idle): "MainThread"
    main_loop (ray/_private/worker.py:754)
    <module> (ray/_private/workers/default_worker.py:237)
Thread 0x70000EFBC000 (idle): "ray_import_thread"
    wait (threading.py:300)
    _wait_once (grpc/_common.py:106)
    wait (grpc/_common.py:148)
    result (grpc/_channel.py:735)
    _poll_locked (ray/_private/gcs_pubsub.py:249)
    poll (ray/_private/gcs_pubsub.py:385)
    _run (ray/_private/import_thread.py:70)
    run (threading.py:870)
    _bootstrap_inner (threading.py:926)
    _bootstrap (threading.py:890)
Thread 0x70000F4BF000 (idle): "Thread-1"
    channel_spin (grpc/_channel.py:1258)
    run (threading.py:870)
    _bootstrap_inner (threading.py:926)
    _bootstrap (threading.py:890)
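To dump several workers in one go (for example all the ray:: PIDs you found with ps on one node), a small loop can help. This is a minimal sketch that assumes py-spy is installed on that node and that sudo is needed, as above. py_spy_dump_cmd and dump_stacks are hypothetical helper names. Like py-spy itself, it has to run on the same node as the PIDs you pass in.

```python
import subprocess
import sys
from typing import List


def py_spy_dump_cmd(pid: int, use_sudo: bool = True) -> List[str]:
    """Build the `py-spy dump --pid <pid>` command line, optionally under sudo."""
    cmd = ["py-spy", "dump", "--pid", str(pid)]
    return ["sudo"] + cmd if use_sudo else cmd


def dump_stacks(pids: List[int], use_sudo: bool = True) -> None:
    """Print a py-spy stack dump for each local PID, one after another."""
    for pid in pids:
        print(f"===== py-spy dump --pid {pid} =====", flush=True)
        try:
            # check=False: a worker may exit between listing and dumping,
            # so a failed dump is reported instead of aborting the loop.
            result = subprocess.run(py_spy_dump_cmd(pid, use_sudo), check=False)
        except FileNotFoundError as exc:
            print(f"could not start {exc.filename!r}; is it on PATH?", file=sys.stderr)
            return
        if result.returncode != 0:
            print(f"py-spy failed for PID {pid} (exit code {result.returncode})", file=sys.stderr)


if __name__ == "__main__":
    # Usage (file name is hypothetical): python dump_ray_stacks.py 19857 19790
    dump_stacks([int(arg) for arg in sys.argv[1:]])
```

Recent Ray versions also ship a ray stack CLI command that runs a py-spy dump over all Ray workers on the local machine, which may save you the scripting. It is also node-local, so you still run it once on each node.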