Unable to connect a remote Ray worker to the Ray head node

Hi,

We run our Ray nodes as Docker containers. Locally, we can create the cluster by adding --address :6379, and it works.

We have a use case where the Ray head node and the Ray worker nodes run on different machines (VMs) at different locations.

We tried a reverse tunnel (frp) to reach the Ray head node, but it does not work.

Our flow is as follows:

client → frps at 5002 → frpc at 6379

When we try to connect to the Ray head node, we get the following stack trace:

ray start --address=127.0.0.1:9004
Local node IP: 192.168.65.3
[2024-04-12 02:53:02,778 C 86 86] gcs_client.cc:153: Check failed: (left != right) 0 vs 0
*** StackTrace Information ***
/usr/local/lib/python3.10/site-packages/ray/_raylet.so(+0xefd4f8) [0xffffbdef24f8] ray::operator<<()
/usr/local/lib/python3.10/site-packages/ray/_raylet.so(+0xefec48) [0xffffbdef3c48] ray::SpdLogMessage::Flush()
/usr/local/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray6RayLogD1Ev+0x38) [0xffffbdef4028] ray::RayLog::~RayLog()
/usr/local/lib/python3.10/site-packages/ray/_raylet.so(+0x828800) [0xffffbd81d800] ray::gcs::(anonymous namespace)::HandleGcsError()
/usr/local/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray3gcs15PythonGcsClient7ConnectERKNS_9ClusterIDElm+0x328) [0xffffbd822c88] ray::gcs::PythonGcsClient::Connect()
/usr/local/lib/python3.10/site-packages/ray/_raylet.so(+0x5b3f0c) [0xffffbd5a8f0c] __pyx_pw_3ray_7_raylet_9GcsClient_3_connect()
/usr/local/bin/…/lib/libpython3.10.so.1.0(+0x131360) [0xffffbf0f6360]
/usr/local/lib/python3.10/site-packages/ray/_raylet.so(+0x547574) [0xffffbd53c574] __Pyx__PyObject_CallOneArg()
/usr/local/lib/python3.10/site-packages/ray/_raylet.so(+0x5e6564) [0xffffbd5db564] __pyx_tp_new_3ray_7_raylet_GcsClient()
/usr/local/bin/…/lib/libpython3.10.so.1.0(+0x12bc88) [0xffffbf0f0c88]
/usr/local/bin/…/lib/libpython3.10.so.1.0(_PyObject_MakeTpCall+0x14c) [0xffffbf0f0b90] _PyObject_MakeTpCall
/usr/local/bin/…/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x4ba0) [0xffffbf0eb8d0] _PyEval_EvalFrameDefault
/usr/local/bin/…/lib/libpython3.10.so.1.0(+0x120e34) [0xffffbf0e5e34]
/usr/local/bin/…/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x68c) [0xffffbf0e73bc] _PyEval_EvalFrameDefault
/usr/local/bin/…/lib/libpython3.10.so.1.0(+0x120e34) [0xffffbf0e5e34]
/usr/local/bin/…/lib/libpython3.10.so.1.0(_PyObject_FastCallDictTstate+0x134) [0xffffbf0f0294] _PyObject_FastCallDictTstate
/usr/local/bin/…/lib/libpython3.10.so.1.0(+0x13954c) [0xffffbf0fe54c]
/usr/local/bin/…/lib/libpython3.10.so.1.0(+0x12bda8) [0xffffbf0f0da8]
/usr/local/bin/…/lib/libpython3.10.so.1.0(_PyObject_MakeTpCall+0x14c) [0xffffbf0f0b90] _PyObject_MakeTpCall
/usr/local/bin/…/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x4ba0) [0xffffbf0eb8d0] _PyEval_EvalFrameDefault
/usr/local/bin/…/lib/libpython3.10.so.1.0(+0x120e34) [0xffffbf0e5e34]
/usr/local/bin/…/lib/libpython3.10.so.1.0(PyVectorcall_Call+0xdc) [0xffffbf166810] PyVectorcall_Call
/usr/local/bin/…/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x207c) [0xffffbf0e8dac] _PyEval_EvalFrameDefault
/usr/local/bin/…/lib/libpython3.10.so.1.0(+0x120e34) [0xffffbf0e5e34]
/usr/local/bin/…/lib/libpython3.10.so.1.0(PyVectorcall_Call+0xdc) [0xffffbf166810] PyVectorcall_Call
/usr/local/bin/…/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x207c) [0xffffbf0e8dac] _PyEval_EvalFrameDefault
/usr/local/bin/…/lib/libpython3.10.so.1.0(+0x120e34) [0xffffbf0e5e34]
/usr/local/bin/…/lib/libpython3.10.so.1.0(+0x13ae34) [0xffffbf0ffe34]
/usr/local/bin/…/lib/libpython3.10.so.1.0(PyVectorcall_Call+0xdc) [0xffffbf166810] PyVectorcall_Call
/usr/local/bin/…/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x207c) [0xffffbf0e8dac] _PyEval_EvalFrameDefault
/usr/local/bin/…/lib/libpython3.10.so.1.0(+0x120e34) [0xffffbf0e5e34]
/usr/local/bin/…/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x68c) [0xffffbf0e73bc] _PyEval_EvalFrameDefault
/usr/local/bin/…/lib/libpython3.10.so.1.0(+0x120e34) [0xffffbf0e5e34]
/usr/local/bin/…/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x68c) [0xffffbf0e73bc] _PyEval_EvalFrameDefault
/usr/local/bin/…/lib/libpython3.10.so.1.0(+0x120e34) [0xffffbf0e5e34]
/usr/local/bin/…/lib/libpython3.10.so.1.0(+0x12f568) [0xffffbf0f4568]
/usr/local/bin/…/lib/libpython3.10.so.1.0(+0x13aed0) [0xffffbf0ffed0]
/usr/local/bin/…/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x207c) [0xffffbf0e8dac] _PyEval_EvalFrameDefault
/usr/local/bin/…/lib/libpython3.10.so.1.0(+0x120e34) [0xffffbf0e5e34]
/usr/local/bin/…/lib/libpython3.10.so.1.0(_PyObject_FastCallDictTstate+0x94) [0xffffbf0f01f4] _PyObject_FastCallDictTstate
/usr/local/bin/…/lib/libpython3.10.so.1.0(_PyObject_Call_Prepend+0x7c) [0xffffbf0fe7bc] _PyObject_Call_Prepend
/usr/local/bin/…/lib/libpython3.10.so.1.0(+0x1e8218) [0xffffbf1ad218]
/usr/local/bin/…/lib/libpython3.10.so.1.0(_PyObject_MakeTpCall+0x7c) [0xffffbf0f0ac0] _PyObject_MakeTpCall
/usr/local/bin/…/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x3eec) [0xffffbf0eac1c] _PyEval_EvalFrameDefault
/usr/local/bin/…/lib/libpython3.10.so.1.0(+0x120e34) [0xffffbf0e5e34]
/usr/local/bin/…/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x830) [0xffffbf0e7560] _PyEval_EvalFrameDefault
/usr/local/bin/…/lib/libpython3.10.so.1.0(+0x120e34) [0xffffbf0e5e34]
/usr/local/bin/…/lib/libpython3.10.so.1.0(PyEval_EvalCode+0x84) [0xffffbf1622a8] PyEval_EvalCode
/usr/local/bin/…/lib/libpython3.10.so.1.0(+0x1d1eec) [0xffffbf196eec]
/usr/local/bin/…/lib/libpython3.10.so.1.0(+0x1ce174) [0xffffbf193174]
/usr/local/bin/…/lib/libpython3.10.so.1.0(+0x1cd130) [0xffffbf192130]
/usr/local/bin/…/lib/libpython3.10.so.1.0(_PyRun_SimpleFileObject+0x188) [0xffffbf191bec] _PyRun_SimpleFileObject
/usr/local/bin/…/lib/libpython3.10.so.1.0(_PyRun_AnyFileObject+0x50) [0xffffbf191970] _PyRun_AnyFileObject
/usr/local/bin/…/lib/libpython3.10.so.1.0(Py_RunMain+0x1d0) [0xffffbf190ac0] Py_RunMain
/usr/local/bin/…/lib/libpython3.10.so.1.0(Py_BytesMain+0x38) [0xffffbf153108] Py_BytesMain
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0xe8) [0xffffbee71e18] __libc_start_main
/usr/local/bin/python(+0x8c4) [0xaaaacba608c4]
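
To help separate a tunnel problem from a Ray problem, a check like the one below could confirm whether the forwarded address accepts TCP connections from the worker side at all. This is only a minimal sketch: the helper name `port_is_open` is hypothetical, and the host and port are taken from the `ray start` command above. A successful connect would only show that something is listening on the tunnel endpoint, not that the GCS behind it is answering correctly.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host, connects, and raises OSError on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Address copied from `ray start --address=127.0.0.1:9004` above.
    print(port_is_open("127.0.0.1", 9004))
```

If this prints False, the problem is likely in the tunnel setup rather than in Ray itself; if it prints True, the tunnel endpoint is at least listening.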

Can anyone suggest a solution, or another way to connect to the Ray head node from a worker running on a remote VM or machine?

Any support or pointers would be highly appreciated, and we will gladly give credit for the help.

Thanks,

Arpit Nigam
arpitnigam2020@gmail.com