SSL peer certificate or SSH remote key was not OK

Hi!
I am trying to read a dataset of images from an on-prem S3 solution with SSL, using my corporation's internally issued CA certificate. I have a RayCluster running in Kubernetes and have extended rayproject/ray to include these CA certificates. I have also added the environment variable REQUESTS_CA_BUNDLE, and that made boto3 work. But when using ray.data.read_images with a pyarrow.fs.S3FileSystem I have no luck. If I enter a pod and look for certificate paths I get:

>>> import certifi
>>> certifi.where()
'/home/ray/anaconda3/lib/python3.8/site-packages/certifi/cacert.pem'
>>> import ssl
>>> ssl.get_default_verify_paths()
DefaultVerifyPaths(cafile='/etc/ssl/certs/ca-certificates.crt', capath=None, openssl_cafile_env='SSL_CERT_FILE', openssl_cafile='/home/ray/anaconda3/ssl/cert.pem', openssl_capath_env='SSL_CERT_DIR', openssl_capath='/home/ray/anaconda3/ssl/certs')
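
As a quick sanity check that the corporate chain actually ended up in the system bundle, something along these lines can be run inside the pod (it just loads the bundle explicitly and lists the CA subjects it contains):

import ssl

# Explicitly load the system bundle produced by update-ca-certificates and
# print the subject of every CA it contains; the corporate chain should show up here.
ctx = ssl.create_default_context(cafile='/etc/ssl/certs/ca-certificates.crt')
for cert in ctx.get_ca_certs():
    print(cert['subject'])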

This is my Dockerfile:

FROM rayproject/ray:2.4.0-py38
COPY CorpCaChain.pem /usr/local/share/ca-certificates/CorpCaChain.crt
USER root
RUN update-ca-certificates
USER ray
ENV REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt

With boto3 there is no problem:

import boto3
import ray
import json

if __name__ == '__main__':
    ray.init()

    config = json.load(open('config.json'))
    new_bucket_name = config['bucket_name']

    b3_session = boto3.Session(
        aws_access_key_id=config['access_key'],
        aws_secret_access_key=config['secret_key']
    )
    s3_resource = b3_session.resource(
        service_name='s3',
        use_ssl=(config["scheme"] == 'https'),
        endpoint_url=f"{config['scheme']}://{config['endpoint']}",
    )
    my_bucket = s3_resource.Bucket(new_bucket_name)

    for s3_file in my_bucket.objects.all():
        print(s3_file.key)

But not with this:

import ray
import json
from pyarrow.fs import S3FileSystem

if __name__ == '__main__':
    ray.init()

    config = json.load(open('config.json'))
    bucket_name = config['bucket_name']
    s3_path = f"s3://{bucket_name}"

    s3_filesystem = S3FileSystem( 
        access_key=config['access_key'],
        secret_key=config['secret_key'],
        endpoint_override=f"{config['scheme']}://{config['endpoint']}",
        scheme=config['scheme']
    )

    ds = ray.data.read_images(
        s3_path,
        filesystem=s3_filesystem,
        include_paths=True,
    )

The error:

2023-06-19 08:08:53,237 INFO worker.py:1616 -- Connected to Ray cluster. View the dashboard at http://10.244.1.23:8265 
Traceback (most recent call last):
  File "ray_air_s3test.py", line 19, in <module>
    ds = ray.data.read_images(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/data/read_api.py", line 663, in read_images
    return read_datasource(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/data/read_api.py", line 334, in read_datasource
    requested_parallelism, min_safe_parallelism, read_tasks = ray.get(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/_private/worker.py", line 2521, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(OSError): ray::_get_read_tasks() (pid=3853, ip=10.244.1.73)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/data/read_api.py", line 1873, in _get_read_tasks
    reader = ds.create_reader(**kwargs)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/data/datasource/image_datasource.py", line 65, in create_reader
    return _ImageDatasourceReader(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/data/datasource/image_datasource.py", line 144, in __init__
    super().__init__(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/data/datasource/file_based_datasource.py", line 391, in __init__
    zip(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/data/datasource/file_meta_provider.py", line 175, in expand_paths
    yield from _expand_paths(paths, filesystem, partitioning, ignore_missing_paths)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/data/datasource/file_meta_provider.py", line 408, in _expand_paths
    yield from _get_file_infos_serial(paths, filesystem, ignore_missing_paths)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/data/datasource/file_meta_provider.py", line 435, in _get_file_infos_serial
    yield from _get_file_infos(path, filesystem, ignore_missing_paths)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/data/datasource/file_meta_provider.py", line 498, in _get_file_infos
    _handle_read_os_error(e, path)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/data/datasource/file_meta_provider.py", line 378, in _handle_read_os_error
    raise error
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/data/datasource/file_meta_provider.py", line 496, in _get_file_infos
    file_info = filesystem.get_file_info(path)
  File "pyarrow/_fs.pyx", line 571, in pyarrow._fs.FileSystem.get_file_info
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
OSError: When getting information for bucket 'raw512x256lab': AWS Error NETWORK_CONNECTION during HeadBucket operation: curlCode: 60, SSL peer certificate or SSH remote key was not OK

---------------------------------------
Job 'raysubmit_BwN6R6qfHufGCSXr' failed
---------------------------------------

Status message: Job failed due to an application error, last available logs (truncated to 20,000 chars):
    _handle_read_os_error(e, path)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/data/datasource/file_meta_provider.py", line 378, in _handle_read_os_error
    raise error
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/data/datasource/file_meta_provider.py", line 496, in _get_file_infos
    file_info = filesystem.get_file_info(path)
  File "pyarrow/_fs.pyx", line 571, in pyarrow._fs.FileSystem.get_file_info
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
OSError: When getting information for bucket 'raw512x256lab': AWS Error NETWORK_CONNECTION during HeadBucket operation: curlCode: 60, SSL peer certificate or SSH remote key was not OK

Any advice would be appreciated!

@rentom From the error logs it looks like the problem may be in pyarrow. Can you try s3_filesystem.get_file_info(s3_path) to see if this problem comes up with just pyarrow (without Ray Data)?

You may be able to circumvent this by using fsspec/s3fs (pip install s3fs) and a pyarrow.fs.PyFileSystem:

from pyarrow.fs import PyFileSystem, FSSpecHandler
pa_fs = PyFileSystem(FSSpecHandler(fs)) 

(fs would be your s3fs.S3FileSystem instance).

Since s3fs uses boto under the hood, it should work out of the box.

Thanks @kai! :slight_smile:
s3_filesystem.get_file_info('raw512x256lab/1/100.bmp') gave the same error as Ray Data does.
(I could not use s3_path, since that returned: pyarrow.lib.ArrowInvalid: Expected an S3 object path of the form 'bucket/key...', got a URI: 's3://raw512x256lab')
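
For reference, the isolated check looked roughly like this (same S3FileSystem construction as in my first snippet):

import json
from pyarrow.fs import S3FileSystem

config = json.load(open('config.json'))

s3_filesystem = S3FileSystem(
    access_key=config['access_key'],
    secret_key=config['secret_key'],
    endpoint_override=f"{config['scheme']}://{config['endpoint']}",
    scheme=config['scheme']
)

# get_file_info() expects a 'bucket/key' path rather than an 's3://' URI;
# this call fails with the same curlCode 60 error as Ray Data did.
print(s3_filesystem.get_file_info('raw512x256lab/1/100.bmp'))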

But your suggested circumvention did the trick! Awesome! :grin:

Fully functional solution:

import ray
import json
import s3fs
from pyarrow.fs import PyFileSystem, FSSpecHandler

if __name__ == '__main__':
    ray.init()

    config = json.load(open('config.json'))
    bucket_name = config['bucket_name']
    s3_path = f"s3://{bucket_name}"

    fs = s3fs.S3FileSystem(
        key=config['access_key'],
        secret=config['secret_key'],
        client_kwargs={
            'endpoint_url': f"{config['scheme']}://{config['endpoint']}"
        }
    )

    pa_fs = PyFileSystem(FSSpecHandler(fs))

    ds = ray.data.read_images(
        s3_path,
        filesystem=pa_fs,
        include_paths=True,
    )

    print(ds)
    print(ds.take(1)[0]["path"])

Great to hear that!

Just for future reference, this suggests that the error here is in pyarrow and should be resolved there, too (or may have already been resolved in more recent versions).

Anyway, it’s great the workaround works for you!

Does using CURL_CA_BUNDLE instead of REQUESTS_CA_BUNDLE allow your code to work without resorting to s3fs?
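
(I.e., something along the lines of adding an extra line to the image, analogous to the existing REQUESTS_CA_BUNDLE one; just a guess at an env var the underlying HTTP stack might honor, not a confirmed fix:)

ENV CURL_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt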

From what I can remember, I did try that, but with no luck.