Google Cloud Storage access from worker

Hi experts

How can I make my worker (in GCP) access files in Google Cloud Storage?

My function with the @ray.remote decorator reads a CSV file from Google Cloud Storage. For this, I added 'pip install gcsfs' to the setup commands in the cluster yaml.
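For reference, here is a simplified sketch of the function (the real code differs, but the shape is the same):

    import ray
    import pandas as pd  # pandas reads gs:// paths through gcsfs

    ray.init(address="auto")

    @ray.remote
    def read_csv():
        # this is the call that fails on the worker with the 401 error below
        return pd.read_csv("gs://mybucket/myexample.csv")

    df = ray.get(read_csv.remote())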

What I observed is that:

  1. the function executed on the head node can read the CSV without error.
  2. the function executed on the worker node CANNOT read the CSV file.
    • I tried to access the file on GCS (gs://mybucket/myexample.csv), but it returns an error like the one below:
      "ServiceException: 401 Anonymous caller does not have storage.objects.get access to the Google Cloud Storage object."
    • When I compared the worker and head VM instances, I found that the head has the following, which is missing from the worker (one way to check this is shown after the list):
      Cloud API access scopes
      Allow full access to all Cloud APIs
    • I guess the problem would be resolved if my worker were created with full access to all Cloud APIs, and I would like to ask how that can be done.
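For reference, one way to check an instance's scopes from the command line (the instance name below is a placeholder) is:

    # list the Cloud API access scopes attached to a VM instance
    gcloud compute instances describe my-worker-instance \
        --zone us-central1-a \
        --format="value(serviceAccounts[].scopes)"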

Thanks
Minho

Hi!

Could you provide the cluster yaml used to launch the cluster?
Also, which Ray version are you using locally?

Thank you Dmitri for the reply.

I’m using Ray v2.0.0.dev0 on a GCP instance. The following is the yaml that I used for my experiments.

provider:
    type: gcp
    region: us-central1
    project_id: my_project_unique # Globally unique project id
    availability_zone: us-central1-a

cluster_name: simulate

min_workers: 0
max_workers: 1

available_node_types:
    ray_head_default:
        min_workers: 0
        max_workers: 1
        resources: {"CPU": 4}
        node_config:
            machineType: c2-standard-4
            disks:
              - boot: true
                autoDelete: true
                type: PERSISTENT
                initializeParams:
                  diskSizeGb: 50
                  sourceImage: projects/deeplearning-platform-release/global/images/family/common-cpu
    ray_worker_small:
        min_workers: 0
        max_workers: 1
        resources: {"CPU": 4}
        node_config:
            machineType: c2-standard-4
            disks:
              - boot: true
                autoDelete: true
                type: PERSISTENT
                initializeParams:
                  diskSizeGb: 50
                  sourceImage: projects/deeplearning-platform-release/global/images/family/common-cpu
auth:
    ssh_user: ubuntu

head_node_type: ray_head_default

head_node: {}
worker_nodes: {}

file_mounts: {
    "/home/ubuntu/csv": "/home/myaccount/data/csv",
}
setup_commands:
  - pip install --upgrade pip
  - pip install pandas
  - pip install gcsfs
  - pip install --upgrade google-cloud-storage

It seems you are missing the Service Account section in your worker config (inside node_config):

      serviceAccounts:
        - email: my-gcs-sa@my-project-id.iam.gserviceaccount.com
          scopes:
            - https://www.googleapis.com/auth/cloud-platform

Once that’s done, make sure you grant the Storage Object Viewer role to your service account in your bucket’s permissions.
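For example, with gsutil (the bucket name is a placeholder):

    # grant read access on objects in the bucket to the service account
    gsutil iam ch \
        serviceAccount:my-gcs-sa@my-project-id.iam.gserviceaccount.com:roles/storage.objectViewer \
        gs://mybucket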


Thank you for the help. After setting the serviceAccounts, I was able to access the GCS bucket.

Best regards
Minho


Hi @philippe-boyd-maxa, can we not just pass the service account JSON like this:

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = '/etc/config/key.json'

because I don’t think my organisation will allow this service account to create IAM roles.

Please suggest any alternatives.

@prakhar_agrawal if you have your service account’s private key (JSON file) and that service account has access to GCS, then yes, it should work; but setting environment variables in Python at runtime is never a really good idea… At least set it before running your software so that it can be used by other services if required.

Again, if your service account does not have GCS access, your proposal won’t work; hence the need to give the service account the appropriate permissions in GCS.
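As a side note, if the key file is present on the node, gcsfs can also take it explicitly instead of reading GOOGLE_APPLICATION_CREDENTIALS; a sketch, assuming the /etc/config/key.json path from above and pandas >= 1.2:

    import pandas as pd

    # pandas forwards storage_options to gcsfs for gs:// paths,
    # so the key file can be passed directly as the token
    df = pd.read_csv(
        "gs://mybucket/myexample.csv",
        storage_options={"token": "/etc/config/key.json"},
    )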

Hi @philippe-boyd-maxa, yes, my service account’s private key (JSON file) has access to GCS. That’s why I am worried about why it’s not working here, because in our current Kubernetes implementation we mount it as a ConfigMap and pick it up from there. But when I try it this way:

      serviceAccounts:
        - email: my-gcs-sa@my-project-id.iam.gserviceaccount.com
          scopes:
            - https://www.googleapis.com/auth/cloud-platform

it actually tries to create another IAM role, which I am pretty sure my service account does not have permission to do.
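What I was thinking of as an alternative (just a sketch, not tested here; the local key path is hypothetical) is to ship the key with file_mounts and export the variable in setup_commands instead of using serviceAccounts:

    file_mounts: {
        "/etc/config/key.json": "~/keys/key.json",
    }
    setup_commands:
      - echo 'export GOOGLE_APPLICATION_CREDENTIALS=/etc/config/key.json' >> ~/.bashrc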

Hey, is this still up to date?
I see in this PR that the serviceAccounts field is deprecated: [Feature] Add service account section in helm chart by ducviet00 · Pull Request #969 · ray-project/kuberay · GitHub
Also, the new serviceAccountName is a Kubernetes service account, whereas here it’s a Google Cloud service account.
Is there an example of this using the helm chart? I see there is no such field, serviceAccounts, in the helm template.