I use the following line in my initialization_commands as part of a process to log in to AWS ECR when launching nodes:
account_id=$(aws sts get-caller-identity | jq -r .Account)
This works fine on the head node, but on worker nodes I get the error:
Unable to locate credentials. You can configure credentials by running "aws configure".
Why would the worker nodes behave differently from the head node in this regard?
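For context, here is a minimal sketch of how that account ID lookup might feed the ECR login itself, with an early check that the node has credentials at all. The helper names ecr_registry and ecr_login are hypothetical, and the region is an assumption supplied by the caller; neither comes from my actual setup.

```shell
# Hypothetical helpers; names and the region argument are illustrative assumptions.

# Build the ECR registry hostname for a given account ID and region.
ecr_registry() {
  printf '%s.dkr.ecr.%s.amazonaws.com' "$1" "$2"
}

# Log Docker in to ECR, failing early if the node has no AWS credentials.
ecr_login() {
  region="$1"
  # Same lookup as in the post, using the CLI's --query instead of jq.
  account_id=$(aws sts get-caller-identity --query Account --output text) || {
    echo "no AWS credentials on this node (is an IAM role attached?)" >&2
    return 1
  }
  aws ecr get-login-password --region "$region" |
    docker login --username AWS --password-stdin \
      "$(ecr_registry "$account_id" "$region")"
}
```

Calling ecr_login with a region (for example, ecr_login us-west-2) on a node without an attached role should stop at the credentials check instead of failing later inside docker login.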
Got it: what I need to do is set an IAM role for the worker nodes.
Edit: This isn’t working as planned. Specifically, I manually set the IAM role to the one created by the Ray autoscaler, but … now I’m getting:
2022-10-12 20:25:27,215 WARN node_provider.py:457 -- create_instances: Attempt failed with An error occurred (UnauthorizedOperation) when calling the RunInstances operation: You are not authorized to perform this operation. Encoded authorization failure message: ZMXHU3uiPxv-sb3-ChtW8i1dWhrK_HzwdNQBA20rN9jD0Vk4Ul6PZ0nLKWWArEe9Srbbr1Hk4_Rd-eIHqvgKesLOolshk9563HglQklW5uPIjSl_mEvZ9vIpqrUUZ3iA3qhxWeaeuDb0XTuQwTxN68fKtQ6nTTKxSaVKQhNWy25utQij1g9AVdbFMrd4HHunFTwmc88ngzkO1MGWZ_ychggSUzekp0h-mc324dpb5WwXgHlwzcfNY3AQtK5o8SuVDU41TVdTfKa9_nZm4o6vglocE561CZoFcGKv08pQ1F5QLw5YNgiMEnOTTnPVzEsSQXxZAAvYZwxtoumqNADVnhul67XJ2KKSFBFanl0pdpT2b0hkQQRppXi1vjaE42vDxADDv4UZHXagTu6z1_TI5au-rxGAzZgedkGgdAA1juM8-WgPYgbIzA1Yr_fANGyCgjnFlDygamJ0ydT4paZeV1RGcxNV06tU48ibr66QvGYVCU3xEYBqiWFg1fDJXSlgPS9OUUqdzu0RnsePkRB4-gg8nmsW1KzQ, retrying.
This is confusing, since I am specifying the exact same IAM role that the Ray autoscaler created. Any thoughts?
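One way to narrow this down is to decode the encoded authorization failure message from the log, since the decoded JSON names the denied action and resource. The sketch below wraps the AWS CLI call for that; decode_auth_failure is a hypothetical helper name, and the caller needs the sts:DecodeAuthorizationMessage permission. One possibility worth checking, which I have not confirmed for this cluster, is that the role launching the workers lacks iam:PassRole for the instance profile being attached.

```shell
# Hypothetical helper: decode the "Encoded authorization failure message"
# that accompanies an UnauthorizedOperation error.
# Requires sts:DecodeAuthorizationMessage on the calling identity.
decode_auth_failure() {
  aws sts decode-authorization-message \
    --encoded-message "$1" \
    --query DecodedMessage --output text
}
```

Pass the full encoded string from the log as the single argument, e.g. decode_auth_failure "$encoded", where encoded holds the message copied from the warning above.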
Edit 2: I ended up baking the container image into the AMI. Not an optimal solution, but good enough for now.