How to use an s3fs filesystem to save checkpoints in Ray Train

I want to train a simple XGBoost model on a multi-node Ray cluster and save the model checkpoints to my Ceph S3 bucket (it's not Amazon). To do so, I need to specify an access_key_id, secret_access_key, endpoint_url, and bucket_name. According to the documentation, "You can add more filesystems by installing fsspec-compatible filesystems, e.g. using pip." However, I don't know where to specify the s3fs filesystem: RunConfig and CheckpointConfig do not take an argument named filesystem.
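
For reference, this is roughly the filesystem object I would like Ray Train to write checkpoints through (a minimal sketch with s3fs; the credentials, endpoint, and bucket name are placeholders):

import s3fs

# Placeholder Ceph credentials and endpoint -- not real values
fs = s3fs.S3FileSystem(
    key="MY_ACCESS_KEY_ID",
    secret="MY_SECRET_ACCESS_KEY",
    client_kwargs={"endpoint_url": "https://my-ceph-gateway:8080"},
)

# Plain I/O against the bucket works, but where do I hand this to Ray Train?
print(fs.ls("my-bucket"))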

@milad_heidari Thanks for the question. We mention this and provide a code snippet showing how to do it with RunConfig in this release blog.

You specify the cloud storage location in the `storage_path` keyword argument of `RunConfig`.

# Imports are a sketch and may vary by Ray version; in recent releases
# these live under ray.train / ray.train.huggingface.
from ray.train import CheckpointConfig, RunConfig, ScalingConfig
from ray.train.huggingface import TransformersTrainer

trainer = TransformersTrainer(
    trainer_init_per_worker=trainer_init_per_worker,
    scaling_config=ScalingConfig(num_workers=4),
    run_config=RunConfig(
        # Requirement: Use cloud storage
        # Your checkpoints will be found within "s3://your-s3-bucket/example"
        storage_path="s3://your-s3-bucket",
        name="example",
        checkpoint_config=CheckpointConfig(
            _checkpoint_keep_all_ranks=True,
            _checkpoint_upload_from_workers=True,
        ),
    ),
)
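
Credentials themselves are not passed through RunConfig; the S3 filesystem underneath (pyarrow/fsspec) typically picks them up from the standard AWS environment variables on each node of the cluster. A minimal sketch with placeholder values:

import os

# Placeholder credentials -- in practice set these on every node,
# e.g. via your cluster launcher or runtime environment.
os.environ["AWS_ACCESS_KEY_ID"] = "MY_ACCESS_KEY_ID"
os.environ["AWS_SECRET_ACCESS_KEY"] = "MY_SECRET_ACCESS_KEY"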

Thanks for the response. The problem is that I'm using S3 from Ceph (not AWS), so I need to provide an endpoint_url as well as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, but Ray doesn't understand the endpoint URL when it's given as an environment variable. How do I address this issue?
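
For context, this is the kind of filesystem object I would like to hand to Ray Train explicitly (a sketch with pyarrow's S3FileSystem; the endpoint and credentials are placeholders, and I haven't found where such an object can be plugged into RunConfig in my Ray version):

from pyarrow import fs

# Placeholder Ceph endpoint and credentials -- not real values
ceph_s3 = fs.S3FileSystem(
    access_key="MY_ACCESS_KEY_ID",
    secret_key="MY_SECRET_ACCESS_KEY",
    endpoint_override="https://my-ceph-gateway:8080",
)

# Sanity check that the bucket is reachable through the custom endpoint
print(ceph_s3.get_file_info(fs.FileSelector("my-bucket")))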