InvalidRequest Error when writing parquet to private S3 bucket

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Hi, I’m currently testing Ray Data (Ray 2.2.0) Parquet functionality for working with large datasets (>= 7 GB) that I have stored in a private S3 bucket I created. For my environment, I started an Anyscale cluster and connected to it with ray.init. To access my private bucket, I followed the Anyscale documentation (Accessing a Private S3 Bucket | Anyscale Docs). Then I read the CSV file from S3 as follows:

ds = ray.data.read_csv("s3://my-private-bucket/test-dataset/test.csv").repartition(400)
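For context, I connect to the cluster roughly like this before running the read (the cluster name below is just a placeholder):

import ray

# Placeholder cluster name; credentials for S3 come from the cluster's IAM role.
ray.init("anyscale://my-cluster")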

Afterwards, I planned on converting this partitioned dataset to Parquet files and writing it to a specific prefix in my S3 bucket:

ds.write_parquet("s3://my-private-bucket/test-dataset/ray-partitions/", try_create_dir=False)

However, when I try to run write_parquet, I get the following error:

(write_block pid=3229) OSError: When uploading part for key 'test-dataset/ray-partitions/ef1b17dcb0d94cf09eec81122cea91d1_000002.parquet' in bucket 'my-private-bucket': AWS Error [code 100]: Unable to parse ExceptionName: InvalidRequest Message: Content-MD5 OR x-amz-checksum- HTTP header is required for Put Part requests with Object Lock parameters

I am not entirely sure why this error occurs; the cluster is configured with the role I specified in the bucket policy. Can anyone give me an idea of why this could be happening and how to resolve it?
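In case it is relevant: I know write_parquet also accepts an explicit pyarrow filesystem, so I could construct one myself along these lines (the region below is a placeholder, and the bucket path is the same as above). Would configuring the filesystem explicitly change anything about the multipart-upload headers?

import pyarrow.fs as pafs

# Placeholder region; credentials again come from the cluster's IAM role.
s3 = pafs.S3FileSystem(region="us-west-2")

# Path is given without the s3:// scheme since the filesystem is passed directly.
ds.write_parquet(
    "my-private-bucket/test-dataset/ray-partitions/",
    filesystem=s3,
    try_create_dir=False,
)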