Ray head stuck on SSH when implementing CloudWatch

Hi doyen and welcome to the Ray community!

Since it connects fine once CloudWatch is removed, I’m guessing the issue is in the integration between CloudWatch <> Ray <> AWS.

  1. Does your Ray cluster have the proper IAM permissions to talk to the CloudWatch service? Mostly create/write permissions so that nodes can log to CloudWatch.
  2. Are there any network or firewall restrictions blocking access to CloudWatch?
  3. Does the security group associated with your Ray cluster allow SSH access, and is the key pair working?
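
If it helps, here is a rough Python sketch for checking the three points above. It is only an illustration, not the Ray or AWS way of doing it: `LOG_WRITE_POLICY`, `can_connect`, and `report` are names I made up, the policy lists only the basic CloudWatch Logs write actions (your setup may need more, such as metrics permissions), and the region and head node IP are placeholders you would replace.

```python
import json
import socket

# Assumption: a minimal set of CloudWatch Logs write actions. The actual
# integration may need additional permissions beyond these three.
LOG_WRITE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
            ],
            "Resource": "*",
        }
    ],
}


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def report(region: str, head_ip: str) -> None:
    """Print the policy to compare against the node role, then probe both ports."""
    print(json.dumps(LOG_WRITE_POLICY, indent=2))
    logs_host = f"logs.{region}.amazonaws.com"  # regional CloudWatch Logs endpoint
    print("CloudWatch Logs reachable on 443:", can_connect(logs_host, 443))
    print("Head node reachable on 22:", can_connect(head_ip, 22))
```

Calling something like `report("us-west-2", "<head-node-ip>")` prints the policy and the two reachability results. The CloudWatch check is most meaningful when run from a machine inside the cluster's VPC, and the SSH check from the machine where you run `ray up`. A successful TCP connection only shows the port is open; it does not prove the key pair or the IAM role is correct.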

Are there any error messages, or is it just stuck on waiting-for-ssh?
Also, is there any code we can use to reproduce this issue? Can you paste your updated YAML config (make sure you censor out any sensitive info, though)?

Here are a few other folks who have run into this issue (albeit not with CloudWatch specifically); maybe that can help with debugging. Sorry I wasn’t more help!