How to create AWS Cluster on Specific VPC?

mattalive · February 17, 2022, 9:44pm

Hi, I’m new to Ray and exploring whether it can be used with some existing AWS infrastructure. The broad view of what I’m trying to do is set up a cluster that can connect to an existing EFS drive. This EFS is located on one of two VPCs in my account. This particular VPC has several public and private subnets for each of the availability zones within us-west-2, but I can’t seem to figure out how to set up the cluster on that VPC.

My first naive attempt:

cluster_name: minimal
provider:
  type: aws
  region: us-west-2
  availability_zone: us-west-2c
  cache_stopped_nodes: False

This successfully creates the cluster, assigns a public IP address, and automatically creates a new security group, but places it on the wrong VPC.

So, I tried to define the specific subnet IDs, using the public and private subnets for us-west-2c:


provider:
  type: aws
  region: us-west-2
  availability_zone: us-west-2c
  cache_stopped_nodes: False

available_node_types:
  ray.head.default:
    node_config:
      InstanceType: m5a.large
      SubnetIds: [subnet-USWEST2CPUBLIC, subnet-USWEST2CPRIVATE]

head_node_type: ray.head.default

This does create an EC2 instance on the correct VPC, in the public subnet, with an automatically generated security group. However, no public IP address gets assigned and the setup times out:

Launched 1 nodes [subnet_id=subnet-USWEST2CPUBLIC]
    Launched instance i-XXX [state=pending, info=pending]
  Launched a new head node
  Fetching the new head node

<1/1> Setting up head node
  Prepared bootstrap config
  New status: waiting-for-ssh
  [1/7] Waiting for SSH to become available
    Running `uptime` as a test.
    Waiting for IP
      Not yet available, retrying in 5 seconds

From here, I tried setting up the head node with a network interface that has a public-facing IP (following the AWS docs for instance creation):

cluster_name: minimal

provider:
  type: aws
  region: us-west-2
  availability_zone: us-west-2c
  cache_stopped_nodes: False

available_node_types:
  ray.head.default:
    node_config:
      InstanceType: m5a.large
      NetworkInterfaces:
        - AssociatePublicIpAddress: True
          SubnetId: subnet-USWEST2CPUBLIC
          Groups: [sg-EXISTINGSECURITYGROUP]

head_node_type: ray.head.default

However, this failed with the following error message:

> Checking AWS environment settings
> No usable subnets found, try manually creating an instance in your specified region to populate the list of subnets and trying this again.
> Note that the subnet must map public IPs on instance launch unless you set `use_internal_ips: true` in the `provider` config.

I can’t see why this subnet won’t work; I tried switching the security group to the one auto generated by Ray and it still gave the same error. I’m not sure how else to set the security group; there wasn’t anything else listed in the Ray schema or the AWS boto3 doc.

Am I misunderstanding AWS or Ray? Am I going about this the wrong way?
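
For the private-IP route that the error message mentions, a minimal sketch might look like the following. It assumes the machine running `ray up` can already reach the nodes' private IPs (for example over a VPN or from inside the same VPC), which nothing in this thread confirms. The subnet ID is the placeholder used above.

```yaml
cluster_name: minimal

provider:
  type: aws
  region: us-west-2
  availability_zone: us-west-2c
  cache_stopped_nodes: False
  # Named in the error message: connect over private IPs instead of
  # requiring the subnet to map public IPs on instance launch.
  use_internal_ips: true

available_node_types:
  ray.head.default:
    node_config:
      InstanceType: m5a.large
      # Placeholder from the post; assumed to be the private subnet in the target VPC.
      SubnetIds: [subnet-USWEST2CPRIVATE]

head_node_type: ray.head.default
```

With this setting the head node gets no public IP, so SSH from outside the VPC will not work unless some private route to the node exists.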

Ameer_Haj_Ali · February 27, 2022, 2:15pm

cc @Alex_Wu. Can you please answer ^?

mattalive · February 28, 2022, 10:31pm

I think I figured out a workaround. I had to manually pre-create a security group in the VPC, then set a single SubnetId and that security group (in the Groups list) in the NetworkInterfaces entry under node_config. That was enough to get Ray to place the node properly. Perhaps the docs could be updated to make this more explicit.
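
Written out, the workaround described above would look roughly like this. It is a sketch reconstructed from the description, not the exact file that was used. The IDs are the placeholders from the earlier posts, and `DeviceIndex` and `AssociatePublicIpAddress` are field names from boto3's `run_instances` network interface specification, included here on the assumption that the head node needs a public IP.

```yaml
cluster_name: minimal

provider:
  type: aws
  region: us-west-2
  availability_zone: us-west-2c
  cache_stopped_nodes: False

available_node_types:
  ray.head.default:
    node_config:
      InstanceType: m5a.large
      NetworkInterfaces:
        - DeviceIndex: 0                      # primary interface (assumption, see above)
          AssociatePublicIpAddress: True      # assumption: head node should get a public IP
          SubnetId: subnet-USWEST2CPUBLIC     # a single subnet in the target VPC
          Groups: [sg-EXISTINGSECURITYGROUP]  # security group pre-created in that same VPC

head_node_type: ray.head.default
```

The subnet and the security group must belong to the same VPC, otherwise EC2 rejects the launch request.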

bryany · April 1, 2023, 4:37am

I ran into the same problem and solved it by specifying subnet IDs in the available_node_types section. It looks like ray up determines the VPC from the subnet IDs:

available_node_types:
    ray.head.default:
        resources: {}
        node_config:
            KeyName: xxx
            InstanceType: xxx
            SubnetIds:
                - subnet-xxx
                - subnet-xxx
                - subnet-xxx
            ImageId: ami-xxx
    ray.worker.default:
        min_workers: 1
        max_workers: 8
        node_config:
            KeyName: xxx
            InstanceType: xxx
            SubnetIds:
                - subnet-xxx
                - subnet-xxx
                - subnet-xxx
            ImageId: ami-xxx
