RayTune Downloading Data from S3

  • High: It blocks me from completing my task.

Hello,

I am launching a multi-replica Ray cluster on Kubernetes to support a Ray Tune job. Specifically, I am using TorchTrainer to wrap a Lightning module.

Currently, my Lightning module downloads my training data from an S3 bucket into a directory named after the worker_id of the Ray worker. However, this means I am using up extra disk space, because every Ray worker on the node downloads its own copy of the dataset.
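
For reference, the per-worker download step currently looks roughly like this (the bucket name, key, and the use of the world rank as the worker id are simplified placeholders, not my exact code):

```python
import os

import boto3
from ray import train


def download_dataset_for_worker() -> str:
    # One directory per Ray worker, keyed on its rank
    # (placeholder for my real worker_id logic).
    worker_id = train.get_context().get_world_rank()
    data_dir = f"/tmp/data/worker_{worker_id}"
    os.makedirs(data_dir, exist_ok=True)

    # Placeholder bucket/key names.
    s3 = boto3.client("s3")
    s3.download_file("my-bucket", "datasets/train.tar",
                     os.path.join(data_dir, "train.tar"))
    return data_dir
```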

I am wondering what the best practices are for this scenario. Is there some sort of “pre-Tune” hook I can run on each Kubernetes replica that will download the dataset once into a directory that all my trials can access?
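
Something along these lines is what I am imagining: every trial on the node points at the same path, and a file lock ensures only the first worker actually hits S3 (just a sketch on my part; the shared path, the filelock dependency, and the bucket/key names are all guesses, not an established Ray pattern):

```python
import os

import boto3
from filelock import FileLock

# Same path for every trial/worker on the node (placeholder path).
SHARED_DATA_DIR = "/tmp/shared_dataset"


def ensure_dataset_downloaded() -> str:
    os.makedirs(SHARED_DATA_DIR, exist_ok=True)
    marker = os.path.join(SHARED_DATA_DIR, ".download_complete")

    # Only the first worker to grab the lock downloads; everyone else
    # waits on the lock and then reuses the shared copy.
    with FileLock(os.path.join(SHARED_DATA_DIR, ".lock")):
        if not os.path.exists(marker):
            s3 = boto3.client("s3")
            s3.download_file("my-bucket", "datasets/train.tar",
                             os.path.join(SHARED_DATA_DIR, "train.tar"))
            open(marker, "w").close()
    return SHARED_DATA_DIR
```

Is something like this the recommended approach, or is there a more idiomatic Ray Tune / KubeRay way to share a dataset across trials on the same node?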

Appreciate the help!