I posted this in Ray Data, but it seems less active than the core forum. Sorry for the multiple posts.
There are various examples of how Ray can read and write data from Amazon S3, for example:
ds = ray.data.read_binary_files("s3://bucket/image-dir")
How do I configure Ray with S3 credentials? I don't run Ray in AWS; I run it locally on my laptop (I just installed it with pip), and I want to read data from my Amazon S3 bucket and also write data there.
Thanks
Hi @Gil_Vernik! If you set your AWS credentials via the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, Datasets should use those credentials without any code changes.
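For example, a minimal sketch of the environment-variable approach (the key values are placeholders, and the bucket path is the one from your post; exporting the variables in your shell before launching Python works just as well):

import os

# Placeholder credentials; set these before any S3 reads happen
os.environ["AWS_ACCESS_KEY_ID"] = "your-access-key-id"
os.environ["AWS_SECRET_ACCESS_KEY"] = "your-secret-access-key"

import ray

# No filesystem argument needed; the credentials are picked up from the environment
ds = ray.data.read_binary_files("s3://bucket/image-dir")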
If the environment-variable method doesn't work for you, you can pass ray.data.read_binary_files() an Arrow S3FileSystem instance containing your AWS credentials (see the .read_binary_files() API).
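A minimal sketch of that alternative (the credentials are placeholders, and the region is an assumption you should adjust to match your bucket):

import ray
from pyarrow import fs

# Arrow S3 filesystem with explicit credentials
s3 = fs.S3FileSystem(
    access_key="your-access-key-id",    # placeholder
    secret_key="your-secret-access-key",  # placeholder
    region="us-east-1",                 # assumption: use your bucket's region
)

ds = ray.data.read_binary_files("s3://bucket/image-dir", filesystem=s3)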
Hi @Clark_Zinzow, how do I set all the environment variables when I use MinIO locally? I'd rather not pass an Arrow S3FileSystem instance to the .read_binary_files API.
Full example:
import ray
import s3fs

# s3fs filesystem pointed at a local MinIO endpoint
# (credentials and endpoint URL are placeholders)
fs = s3fs.S3FileSystem(
    anon=False,
    use_ssl=False,
    client_kwargs={
        "aws_access_key_id": "key",
        "aws_secret_access_key": "key",
        "endpoint_url": "endpoint",
        "verify": False,
    },
)

ds = ray.data.read_parquet(
    paths="s3://....parquet",
    filesystem=fs,
)
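One caveat: depending on your Ray version, read_parquet may expect a pyarrow-compatible filesystem rather than a raw fsspec one. If the s3fs instance is rejected, a common workaround is to wrap it with pyarrow's FSSpecHandler. A minimal sketch, reusing the fs object and placeholder path from above:

from pyarrow.fs import PyFileSystem, FSSpecHandler

# Wrap the fsspec (s3fs) filesystem so pyarrow/Ray can use it
wrapped_fs = PyFileSystem(FSSpecHandler(fs))

ds = ray.data.read_parquet(
    paths="s3://....parquet",
    filesystem=wrapped_fs,
)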