Ray cluster's worker node is pending

Hi,
I’m trying to spin up a Ray cluster on AWS EC2 using a YAML file. Running `ray up config.yaml` successfully creates two EC2 instances, but when I try to submit a job or look at `ray status`, only one node shows up, plus

Pending:
 <ip>: ray.worker.default, uninitialized

with no failures. I’ve redacted the security group IDs and file paths in the config below:

```yaml
# A unique identifier for the head node and workers of this cluster.
cluster_name: ray-test

# The maximum number of workers nodes to launch in addition to the head
# node.
max_workers: 2

# The autoscaler will scale up the cluster faster with higher upscaling speed.
# E.g., if the task requires adding more nodes then autoscaler will gradually
# scale up the cluster in chunks of upscaling_speed*currently_running_nodes.
# This number should be > 0.
upscaling_speed: 2.0

# This executes all commands on all nodes in the docker container,
# and opens all the necessary ports to support the Ray cluster.
# Empty string means disabled.
docker:
    image: "rayproject/ray-ml:latest-gpu" # gpu You can change this to latest-cpu if you don't need GPU support and want a faster startup
    # image: rayproject/ray-ml:latest-gpu   # use this one if you don't need ML dependencies, it's faster to pull
    container_name: "ray_container"
    # If true, pulls latest version of image. Otherwise, `docker run` will only pull the image
    # if no cached version is present.
    pull_before_run: True
    run_options:   # Extra options to pass into "docker run"
        - --ulimit nofile=65536:65536

    # Example of running a GPU head with CPU workers
    # head_image: "rayproject/ray-ml:latest-gpu"
    # Allow Ray to automatically detect GPUs

    # worker_image: "rayproject/ray-ml:latest-cpu"
    # worker_run_options: []

# If a node is idle for this many minutes, it will be removed.
idle_timeout_minutes: 10

# Cloud-provider specific configuration.
provider:
    type: aws
    region: us-east-1
    availability_zone: us-east-1a,us-east-1b


auth:
    ssh_user: ubuntu
    ssh_private_key: /path/to/key/.pem

available_node_types:
    ray.head.default:
        # resources: {"CPU": 1, "GPU": 1, "custom": 5}
        #resources: { "CPU": 4, "GPU": 1}
        node_config:
            # IamInstanceProfile:
            #     Name: "ray-autoscaler-v1"
            InstanceType: p3.2xlarge # or g4dn.2xlarge
            ImageId: ami-029510cec6d69f121 # Deep Learning AMI (Ubuntu) Version 30
            KeyName: <key-name>
            SecurityGroupIds: [sg1, sg2, sg3] #See above for group IDS
            BlockDeviceMappings:
                - DeviceName: /dev/sda1
                  Ebs:
                      VolumeSize: 200 # 200 GB root volume
    ray.worker.default:
        min_workers: 1
        max_workers: 2
        resources: {}
        node_config:
            # IamInstanceProfile:
            #     Name: "ray-autoscaler-v1"
            InstanceType: p3.2xlarge # or g4dn.2xlarge
            ImageId: ami-029510cec6d69f121 # Deep Learning AMI (Ubuntu) Version 30
            KeyName: <key-name>
            #InstanceMarketOptions:
            #    MarketType: spot
            SecurityGroupIds: [<sg1>]


head_node_type: ray.head.default
file_mounts: {
#    "/path2/on/remote/machine": "/path2/on/local/machine", #/home/ray
}


cluster_synced_files: []
file_mounts_sync_continuously: False

# Patterns for files to exclude when running rsync up or rsync down
rsync_exclude:
    - "**/.git"
    - "**/.git/**"

# Pattern files to use for filtering out files when running rsync up or rsync down. The file is searched for
# in the source directory and recursively through all subdirectories. For example, if .gitignore is provided
# as a value, the behavior will match git's behavior for finding and using .gitignore files.
rsync_filter:
    - ".gitignore"

# List of commands that will be run before `setup_commands`. If docker is
# enabled, these commands will run outside the container and before docker
# is setup.
initialization_commands: []

# List of shell commands to run to set up nodes.
setup_commands: 
    - pip install -U ninja
    - pip install -U lpips
    - pip install tblib
    # Note: if you're developing Ray, you probably want to create a Docker image that
    # has your Ray repo pre-cloned. Then, you can replace the pip installs
    # below with a git checkout <your_sha> (and possibly a recompile).
    # To run the nightly version of ray (as opposed to the latest), either use a rayproject docker image
    # that has the "nightly" (e.g. "rayproject/ray-ml:nightly-gpu") or uncomment the following line:
    # - pip install -U "ray[default] @ https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-2.0.0.dev0-cp37-cp37m-manylinux2014_x86_64.whl"

# Custom commands that will be run on the head node after common setup.
head_setup_commands: []

# Custom commands that will be run on worker nodes after common setup.
worker_setup_commands: []

# Command to start ray on the head node. You don't need to change this.
head_start_ray_commands:
    - ray stop
    - ray start --head --port=6379 --object-manager-port=8076 --autoscaling-config=~/ray_bootstrap_config.yaml

# Command to start ray on worker nodes. You don't need to change this.
worker_start_ray_commands:
    - ray stop
    - ray start --address=$RAY_HEAD_IP:6379 --object-manager-port=8076

head_node: {}
worker_nodes: {}
```
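
For completeness, this is roughly how I bring the cluster up and check on it (paths redacted):

```bash
ray up config.yaml                    # creates both EC2 instances
ray exec config.yaml 'ray status'     # only the head shows up; the worker stays pending
ray attach config.yaml                # SSH into the head node
```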

`ray.cluster_resources()` returns `{'memory': 37903803188.0, 'node:<ip>': 1.0, 'GPU': 1.0, 'accelerator_type:V100': 1.0, 'object_store_memory': 18951901593.0, 'CPU': 8.0}`, when it should be reporting resources from both nodes.
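
For context, this is how I’m checking for the second node (a minimal sketch; the expected node count and sleep interval are just placeholders):

```python
import time

import ray

ray.init(address="auto")  # run on the head node

expected_nodes = 2  # head + 1 worker
while sum(1 for n in ray.nodes() if n["Alive"]) < expected_nodes:
    print("still waiting, cluster resources:", ray.cluster_resources())
    time.sleep(10)

print("all nodes up:", ray.cluster_resources())
```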

I figured out the problem was the private key I specified: it was not accessible from the worker nodes. I commented out those lines and the worker spun up (ref: https://github.com/ray-project/ray/issues/18529).
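
In case it helps anyone else, the edit that worked for me was roughly this (when no key is given, the autoscaler creates and manages its own key pair instead):

```yaml
auth:
    ssh_user: ubuntu
    # ssh_private_key: /path/to/key/.pem  # commented out; with no key given,
    # the autoscaler generates and distributes its own key pair
```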

But after a few minutes of being idle, the worker goes back to uninitialized.
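
If it helps with debugging, this is how I’ve been watching the autoscaler while the worker flips back to uninitialized (assuming the default log locations):

```bash
# Stream the autoscaler's status from my laptop
ray monitor config.yaml

# Or inspect the monitor log directly on the head node
ray exec config.yaml 'tail -n 200 /tmp/ray/session_latest/logs/monitor.log'
```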