Thanks @kai!
s3_filesystem.get_file_info('raw512x256lab/1/100.bmp') gave the same error as Ray Data does.
(I could not use s3_path, since that returned: pyarrow.lib.ArrowInvalid: Expected an S3 object path of the form 'bucket/key...', got a URI: 's3://raw512x256lab')
But your suggested workaround did the trick! Awesome!
Fully functional solution:
import ray
import json
import s3fs
from pyarrow.fs import PyFileSystem, FSSpecHandler

if __name__ == '__main__':
    ray.init()
    config = json.load(open('config.json'))
    bucket_name = config['bucket_name']
    s3_path = f"s3://{bucket_name}"
    fs = s3fs.S3FileSystem(
        key=config['access_key'],
        secret=config['secret_key'],
        client_kwargs={
            'endpoint_url': f"{config['scheme']}://{config['endpoint']}"
        }
    )
    pa_fs = PyFileSystem(FSSpecHandler(fs))
    ds = ray.data.read_images(
        s3_path,
        filesystem=pa_fs,
        include_paths=True,
    )
    print(ds)
    print(ds.take(1)[0]["path"])
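
Side note on the ArrowInvalid error mentioned above, in case someone else runs into it: the message says get_file_info wants a plain 'bucket/key...' path rather than an 's3://' URI. A minimal sketch of stripping the scheme before calling it could look like this. The helper name to_bucket_key is my own invention for illustration, not part of pyarrow or Ray, and I have only reasoned about it against the paths in this thread:

```python
S3_SCHEME = "s3://"


def to_bucket_key(path: str) -> str:
    """Hypothetical helper: turn 's3://bucket/key' into 'bucket/key'.

    Paths that do not start with 's3://' are returned unchanged.
    """
    if path.startswith(S3_SCHEME):
        # Drop only the leading scheme; the bucket and key stay untouched.
        return path[len(S3_SCHEME):]
    return path


if __name__ == "__main__":
    print(to_bucket_key("s3://raw512x256lab"))
    print(to_bucket_key("s3://raw512x256lab/1/100.bmp"))
```

With that, something like s3_filesystem.get_file_info(to_bucket_key(s3_path)) should at least get past the "got a URI" complaint. It would not have fixed the original error, which the fsspec-based filesystem above works around.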