Hi, I’m new to Ray and exploring whether it can be used with some existing AWS infrastructure. Broadly, I’m trying to set up a cluster that can connect to an existing EFS file system. This EFS lives in one of two VPCs in my account. That VPC has several public and private subnets, one pair for each availability zone in us-west-2, but I can’t figure out how to launch the cluster in that VPC.
My first naive attempt:
cluster_name: minimal
provider:
    type: aws
    region: us-west-2
    availability_zone: us-west-2c
    cache_stopped_nodes: False
This successfully creates the cluster, assigns a public IP address, and automatically creates a new security group, but places the cluster in the wrong VPC.
So, I tried to define the specific subnet IDs, using the public and private subnets for us-west-2c:
provider:
    type: aws
    region: us-west-2
    availability_zone: us-west-2c
    cache_stopped_nodes: False
available_node_types:
    ray.head.default:
        node_config:
            InstanceType: m5a.large
            SubnetIds: [subnet-USWEST2CPUBLIC, subnet-USWEST2CPRIVATE]
head_node_type: ray.head.default
This does create an EC2 instance in the correct VPC, in the public subnet, with an automatically generated security group. However, no public IP address gets assigned and the setup times out:
Launched 1 nodes [subnet_id=subnet-USWEST2CPUBLIC]
Launched instance i-XXX [state=pending, info=pending]
Launched a new head node
Fetching the new head node
<1/1> Setting up head node
Prepared bootstrap config
New status: waiting-for-ssh
[1/7] Waiting for SSH to become available
Running `uptime` as a test.
Waiting for IP
Not yet available, retrying in 5 seconds
From here, I tried giving the head node a network interface with a public-facing IP (following the AWS docs for instance creation):
cluster_name: minimal
provider:
    type: aws
    region: us-west-2
    availability_zone: us-west-2c
    cache_stopped_nodes: False
available_node_types:
    ray.head.default:
        node_config:
            InstanceType: m5a.large
            NetworkInterfaces:
                - AssociatePublicId: True
                  SubnetId: subnet-USWEST2CPUBLIC
                  Groups: [sg-EXISTINGSECURITYGROUP]
head_node_type: ray.head.default
However, this failed with the following error message:
> Checking AWS environment settings
> No usable subnets found, try manually creating an instance in your specified region to populate the list of subnets and trying this again.
> Note that the subnet must map public IPs on instance launch unless you set `use_internal_ips: true` in the `provider` config.
I can’t see why this subnet won’t work. I tried switching the security group to the one auto-generated by Ray and still got the same error. I’m not sure how else to set the security group; I didn’t find anything else for it in the Ray schema or the boto3 docs.
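For reference, here is a minimal, untested sketch of what the note in that error message seems to point at: keeping `SubnetIds` in `node_config` and setting `use_internal_ips: true` under `provider`. Passing the existing group through `SecurityGroupIds` is an assumption, based on that being the boto3 `run_instances` parameter name, and the whole approach would only help if the machine running `ray up` can reach the head node's private IP (for example over a VPN or from inside the VPC).

```yaml
cluster_name: minimal
provider:
    type: aws
    region: us-west-2
    availability_zone: us-west-2c
    cache_stopped_nodes: False
    # Option named in the error message: connect over private IPs
    # instead of requiring the subnet to map public IPs on launch.
    use_internal_ips: true
available_node_types:
    ray.head.default:
        node_config:
            InstanceType: m5a.large
            SubnetIds: [subnet-USWEST2CPUBLIC]
            # Assumption: boto3 run_instances parameter, passed through node_config.
            SecurityGroupIds: [sg-EXISTINGSECURITYGROUP]
head_node_type: ray.head.default
```

The other route the message implies is turning on auto-assign public IPv4 for the subnet itself, which is an AWS-side subnet setting rather than anything in the Ray config.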
Am I misunderstanding AWS or Ray? Am I going about this the wrong way?