Google Cloud Storage access from worker

Hi experts

How can I make my worker (in GCP) access files in Google Cloud Storage?

My function with the @ray.remote decorator reads a CSV file from Google Cloud Storage. For this, I added `pip install gcsfs` to the setup commands in the cluster yaml.

What I observed is that

  1. The function executed on the head node can read the CSV without error.
  2. The function executed on the worker node CANNOT read the CSV file.
    • I tried to access the file on GCS (gs://mybucket/myexample.csv), but it returns an error like the one below:
      “ServiceException: 401 Anonymous caller does not have storage.objects.get access to the Google Cloud Storage object.”
    • When I compared the worker and head VM instances, I found that the head has the following, which is missing on the worker:
      Cloud API access scopes
      Allow full access to all Cloud APIs
    • I guess the problem would be resolved if my worker were created with full access to all Cloud APIs, and I would like to ask how that can be done.
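For reference, the function is roughly the following sketch (simplified; the path is the one from the error above, and on the cluster the function carries the @ray.remote decorator):

```python
import pandas as pd

def read_remote_csv(path: str) -> pd.DataFrame:
    # On the cluster this is decorated with @ray.remote. pandas hands
    # gs:// URLs to gcsfs, which authenticates with the VM's
    # service-account credentials (Application Default Credentials).
    return pd.read_csv(path)

# On the cluster:
#   df = ray.get(read_remote_csv.remote("gs://mybucket/myexample.csv"))
# This succeeds when scheduled on the head node but fails with the
# 401 error above when scheduled on a worker.
```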

Thanks
Minho

Hi!

Could you provide the cluster yaml used to launch the cluster?
Also, which Ray version are you using locally?

Thank you Dmitri for the reply.

I’m using Ray v2.0.0.dev0 on a GCP instance. The following is the yaml I used for my experiments.

provider:
    type: gcp
    region: us-central1
    project_id: my_project_unique # Globally unique project id
    availability_zone: us-central1-a

cluster_name: simulate

min_workers: 0
max_workers: 1

available_node_types:
    ray_head_default:
        min_workers: 0
        max_workers: 1
        resources: {"CPU": 4}
        node_config:
            machineType: c2-standard-4
            disks:
              - boot: true
                autoDelete: true
                type: PERSISTENT
                initializeParams:
                  diskSizeGb: 50
                  sourceImage: projects/deeplearning-platform-release/global/images/family/common-cpu
    ray_worker_small:
        min_workers: 0
        max_workers: 1
        resources: {"CPU": 4}
        node_config:
            machineType: c2-standard-4
            disks:
              - boot: true
                autoDelete: true
                type: PERSISTENT
                initializeParams:
                  diskSizeGb: 50
                  sourceImage: projects/deeplearning-platform-release/global/images/family/common-cpu
auth:
    ssh_user: ubuntu

head_node_type: ray_head_default

head_node: {}
worker_nodes: {}

file_mounts: {
    "/home/ubuntu/csv": "/home/myaccount/data/csv",
}
setup_commands:
  - pip install --upgrade pip
  - pip install pandas
  - pip install gcsfs
  - pip install --upgrade google-cloud-storage

It seems you are missing the Service Account section in your worker config (inside node_config):

      serviceAccounts:
        - email: my-gcs-sa@my-project-id.iam.gserviceaccount.com
          scopes:
            - https://www.googleapis.com/auth/cloud-platform

Once that’s done, make sure you grant the Storage Object Viewer role to your service account in your bucket’s permissions.
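For anyone finding this later, granting that role from the command line might look like this (the bucket and service-account names are the placeholders used in this thread; substitute your own):

```shell
# Grant the service account read access to objects in the bucket.
gsutil iam ch \
  "serviceAccount:my-gcs-sa@my-project-id.iam.gserviceaccount.com:roles/storage.objectViewer" \
  gs://mybucket
```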


Thank you for the help. After setting serviceAccounts, I was able to access the GCS bucket.

Best regards,
Minho
