Cannot connect to head node on AWS

I am trying a small experiment: connecting a worker on my laptop to a head node running on AWS EC2. For some reason nothing happens after it prints the local node IP. The command I am using is: ray start --address=<Public_IP:6379> --redis-password=<pwd>

This is the error I am getting:

service_based_gcs_client.cc:248: Couldn't reconnect to GCS server. The last attempted GCS server address was 172.31.8.200:42563
*** StackTrace Information ***
    @     0x7ffa8fa267b5  google::GetStackTraceToString()
    @     0x7ffa8f9f545e  ray::GetCallTrace()
    @     0x7ffa8fa1a424  ray::SpdLogMessage::Flush()
    @     0x7ffa8fa1a59d  ray::RayLog::~RayLog()
    @     0x7ffa8f6af611  ray::gcs::ServiceBasedGcsClient::ReconnectGcsServer()
    @     0x7ffa8f6af74d  ray::gcs::ServiceBasedGcsClient::GcsServiceFailureDetected()
    @     0x7ffa8f6b803f  _ZNSt17_Function_handlerIFvRKN3ray6StatusERKNS0_3rpc19GetAllNodeInfoReplyEEZNS4_12GcsRpcClient14GetAllNodeInfoERKNS4_21GetAllNodeInfoRequestERKSt8functionIS8_EEUlS3_S7_E_E9_M_invokeERKSt9_Any_dataS3_S7_
    @     0x7ffa8f6be9d5  ray::rpc::ClientCallImpl<>::OnReplyReceived()
    @     0x7ffa8f57699b  _ZNSt17_Function_handlerIFvvEZN3ray3rpc17ClientCallManager29PollEventsFromCompletionQueueEiEUlvE_E9_M_invokeERKSt9_Any_data
    @     0x7ffa8f9c6238  boost::asio::detail::completion_handler<>::do_complete()
    @     0x7ffa8fac3bf1  boost::asio::detail::scheduler::do_run_one()
    @     0x7ffa8fac3d21  boost::asio::detail::scheduler::run()
    @     0x7ffa8fac5820  boost::asio::io_context::run()
    @     0x7ffa8f55d4ac  _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN3ray3gcs19GlobalStateAccessorC4ERKSsS7_EUlvE_EEEEE6_M_runEv
    @     0x7ffa8fd64a40  execute_native_thread_routine
    @     0x7ffa913156db  start_thread
    @     0x7ffa9103e71f  clone

I tried this from another EC2 instance and got the same result. I have already opened port 6379. Am I missing something? Please let me know, as I am new to this.
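The last log line is worth reading closely. The client failed to reach a GCS server at 172.31.8.200:42563, which is neither the public IP nor port 6379 from the command. The sketch below (plain Python standard library, not part of the original thread) pulls that host and port out of the copied line and checks both points. The regex assumes the message ends in an IPv4 host:port pair, as it does here.

```python
import ipaddress
import re

# The line copied from the error output in the question.
LOG_LINE = (
    "service_based_gcs_client.cc:248: Couldn't reconnect to GCS server. "
    "The last attempted GCS server address was 172.31.8.200:42563"
)


def parse_gcs_address(line: str) -> tuple[str, int]:
    """Extract (host, port) from the 'last attempted GCS server address' message."""
    match = re.search(r"address was (\d{1,3}(?:\.\d{1,3}){3}):(\d+)", line)
    if match is None:
        raise ValueError("no IPv4 host:port found in log line")
    return match.group(1), int(match.group(2))


host, port = parse_gcs_address(LOG_LINE)
# 172.31.0.0/16 lies inside the private 172.16.0.0/12 range.
print(f"GCS host: {host} (private address: {ipaddress.ip_address(host).is_private})")
# The GCS server listens on its own port, separate from the 6379 used in --address.
print(f"GCS port: {port} (same as 6379: {port == 6379})")
```

The port result lines up with the answer below: the GCS server listens on a port of its own besides 6379, so that port has to be reachable from the worker too.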

There is a good doc page about that.

The doc is for the latest master, which is fine in most cases, but here is the one for the latest release. On the head node you also need to open the ports used by all nodes (the node manager, the object manager, and the worker ports). That said, if you are running on AWS, you might want to use the cluster launcher instead: Launching Cloud Clusters — Ray v1.4.1. I think that would make deployment much easier than a manual startup.
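To check which of those ports are actually reachable from the machine running ray start (the laptop or the second EC2 instance), a plain TCP connect test is enough. This is a minimal Python sketch using only the standard library. The host address and every port number except 6379 are hypothetical placeholders; replace them with the head node's public IP and the ports your head node really uses (ray start --help lists the options for pinning them in your Ray version).

```python
import socket

# Hypothetical placeholder: the head node's public IP (TEST-NET-3 documentation range).
HEAD_HOST = "203.0.113.10"

# Only 6379 comes from the thread; the other values are hypothetical pinned ports.
# Add entries for the worker-port range in the same way.
PORTS_TO_CHECK = {
    "redis (--address)": 6379,
    "gcs server": 6380,
    "node manager": 6381,
    "object manager": 6382,
}


def is_tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, host unreachable, DNS failure, ...
        return False


def check_ports(host: str, ports: dict[str, int], timeout: float = 3.0) -> dict[str, bool]:
    """Try every named port on `host` and report which ones accept a TCP connection."""
    return {name: is_tcp_port_open(host, port, timeout) for name, port in ports.items()}
```

Run check_ports(HEAD_HOST, PORTS_TO_CHECK) from the worker machine and print the result. A port only reports open while some process is listening on it, and a failure can come from the security group, a local firewall, or nothing listening on the head node, so treat a closed result as a hint rather than a diagnosis.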

Thanks. This is what I was missing
