How to access Amazon S3

I posted this in the Ray Data category, but it seems less active than the core forum. Sorry for the multiple posts.

There are various examples of how Ray can read and write data from Amazon S3, for example:

ds = ray.data.read_binary_files("s3://bucket/image-dir")

How do I configure Ray with S3 credentials? I don’t run Ray in AWS; I run it locally on my laptop (just installed it with pip), and I want to read data from my Amazon S3 bucket and also write to it.

Thanks

Hi @Gil_Vernik! If you set your AWS credentials via the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, Datasets should use those credentials without any code changes.

If this environment variable method isn’t agreeable, you can pass ray.data.read_binary_files() an Arrow S3FileSystem instance containing your AWS credentials (see the .read_binary_files() API).
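A minimal sketch of the explicit-filesystem route, assuming pyarrow is installed. The helper names s3_filesystem_kwargs and read_with_credentials are hypothetical; access_key, secret_key, region, and endpoint_override are keyword arguments of pyarrow.fs.S3FileSystem, and the credential values are placeholders.

```python
def s3_filesystem_kwargs(access_key, secret_key, region=None, endpoint_override=None):
    """Collect S3FileSystem keyword arguments, leaving out options that were not set."""
    kwargs = {"access_key": access_key, "secret_key": secret_key}
    if region is not None:
        kwargs["region"] = region
    if endpoint_override is not None:
        kwargs["endpoint_override"] = endpoint_override
    return kwargs


def read_with_credentials(uri, **credentials):
    """Build an Arrow S3FileSystem and hand it to Ray explicitly."""
    import ray  # imported lazily so the helper above works without Ray installed
    from pyarrow import fs

    filesystem = fs.S3FileSystem(**s3_filesystem_kwargs(**credentials))
    return ray.data.read_binary_files(uri, filesystem=filesystem)


# Example usage (placeholders, not real credentials):
# ds = read_with_credentials(
#     "s3://bucket/image-dir",
#     access_key="<your-access-key-id>",
#     secret_key="<your-secret-access-key>",
# )
```

Passing the filesystem explicitly keeps the credentials out of the process environment, at the cost of threading the object through every read and write call.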


Hi @Clark_Zinzow, how can I set all of this through environment variables when I use MinIO locally? I would rather not pass an Arrow S3FileSystem instance to the .read_binary_files() API.

Full example:

import ray
import s3fs

# fsspec-compatible filesystem configured for the local MinIO endpoint
fs = s3fs.S3FileSystem(
    anon=False,
    use_ssl=False,
    client_kwargs={
        "aws_access_key_id": 'key',
        "aws_secret_access_key": 'key',
        "endpoint_url": 'endpoint',
        "verify": False,
    },
)

ds = ray.data.read_parquet(paths="s3://....parquet", filesystem=fs)