How to create an AWS cluster on a specific VPC?

Hi, I’m new to Ray and exploring whether it can be used with some existing AWS infrastructure. Broadly, I’m trying to set up a cluster that can connect to an existing EFS file system. The EFS is in one of the two VPCs in my account. That VPC has a public and a private subnet for each of the availability zones in us-west-2, but I can’t figure out how to make Ray create the cluster in that VPC.

My first naive attempt:

cluster_name: minimal
provider:
  type: aws
  region: us-west-2
  availability_zone: us-west-2c
  cache_stopped_nodes: False

This successfully creates the cluster, assigns a public IP address, and automatically creates a new security group, but places it on the wrong VPC.

So, I tried to define the specific subnet IDs, using the public and private subnets for us-west-2c:


provider:
  type: aws
  region: us-west-2
  availability_zone: us-west-2c
  cache_stopped_nodes: False

available_node_types:
  ray.head.default:
    node_config:
      InstanceType: m5a.large
      SubnetIds: [subnet-USWEST2CPUBLIC, subnet-USWEST2CPRIVATE]

head_node_type: ray.head.default

This does create an EC2 instance in the correct VPC, in the public subnet, with an automatically generated security group. However, no public IP address gets assigned and the setup times out:

Launched 1 nodes [subnet_id=subnet-USWEST2CPUBLIC]
    Launched instance i-XXX [state=pending, info=pending]
  Launched a new head node
  Fetching the new head node

<1/1> Setting up head node
  Prepared bootstrap config
  New status: waiting-for-ssh
  [1/7] Waiting for SSH to become available
    Running `uptime` as a test.
    Waiting for IP
      Not yet available, retrying in 5 seconds

From here, I tried giving the head node a network interface with a public-facing IP (following the AWS docs for instance creation):

cluster_name: minimal

provider:
  type: aws
  region: us-west-2
  availability_zone: us-west-2c
  cache_stopped_nodes: False

available_node_types:
  ray.head.default:
    node_config:
      InstanceType: m5a.large
      NetworkInterfaces:
        - AssociatePublicId: True
          SubnetId: subnet-USWEST2CPUBLIC
          Groups: [sg-EXISTINGSECURITYGROUP]

head_node_type: ray.head.default

However, this failed with the following error message:

> Checking AWS environment settings
> No usable subnets found, try manually creating an instance in your specified region to populate the list of subnets and trying this again.
> Note that the subnet must map public IPs on instance launch unless you set `use_internal_ips: true` in the `provider` config.

I can’t see why this subnet won’t work. I tried switching the security group to the one auto-generated by Ray and it still gave the same error. I’m not sure how else to set the security group; I didn’t find anything else in the Ray config schema or the AWS boto3 docs.
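
For reference, the `use_internal_ips` option named in the error message is a `provider`-level setting. A minimal sketch of where it would go, assuming the machine running `ray up` can reach the VPC's private addresses (for example over a VPN or from a host inside the VPC), which is not established in this thread:

```yaml
provider:
  type: aws
  region: us-west-2
  availability_zone: us-west-2c
  cache_stopped_nodes: False
  # Assumption: the launching machine has a route to private IPs in the VPC.
  # With this set, Ray connects to nodes over their private addresses
  # instead of waiting for a public IP.
  use_internal_ips: True
```

This only sidesteps the public IP requirement; it does not by itself choose the VPC, so the subnet still has to be specified in `node_config`.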

Am I misunderstanding AWS or Ray? Am I going about this the wrong way?

cc @Alex_Wu. Can you please answer ^?

I think I figured out a workaround: I had to manually pre-create a security group in the VPC, then specify a single SubnetId and that security group (in the Groups list) in the NetworkInterfaces entry under node_config. That was enough to get Ray to place the node properly. Perhaps the docs could be cleaned up to make this more explicit.
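
A sketch of what the workaround config might look like, reusing the placeholder IDs from the question. The security group ID `sg-PRECREATEDSECURITYGROUP` is a hypothetical placeholder for the manually created group. The `DeviceIndex` and `AssociatePublicIpAddress` entries are assumptions that the workaround description does not mention; they are the key names used for network interfaces in the boto3 `run_instances` request, and EC2 expects a device index when interfaces are listed explicitly.

```yaml
cluster_name: minimal

provider:
  type: aws
  region: us-west-2
  availability_zone: us-west-2c
  cache_stopped_nodes: False

available_node_types:
  ray.head.default:
    node_config:
      InstanceType: m5a.large
      NetworkInterfaces:
        - DeviceIndex: 0                   # assumption: primary network interface
          AssociatePublicIpAddress: True   # assumption: request a public IP for SSH
          SubnetId: subnet-USWEST2CPUBLIC  # one subnet in the target VPC
          Groups: [sg-PRECREATEDSECURITYGROUP]  # hypothetical ID of the pre-created group

head_node_type: ray.head.default
```

The pre-created security group presumably needs to allow inbound SSH from the machine running `ray up`, since setup waits for SSH to become available before continuing.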