- High: It blocks me from completing my task.
Hello,
I am launching a multi-replica Ray cluster on Kubernetes to support a Ray Tune job. Specifically, I am using TorchTrainer to wrap a Lightning module.
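Roughly, my setup looks like this (a simplified sketch; the Lightning training loop is elided, and the worker count and hyperparameter are just examples):

```python
from ray import tune
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

def train_loop_per_worker(config):
    # Builds my LightningModule / pl.Trainer and calls fit() (elided here).
    ...

trainer = TorchTrainer(
    train_loop_per_worker,
    scaling_config=ScalingConfig(num_workers=2, use_gpu=True),  # example: 2 workers
)
tuner = tune.Tuner(
    trainer,
    param_space={"train_loop_config": {"lr": tune.loguniform(1e-4, 1e-1)}},  # example hyperparameter
)
tuner.fit()
```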
Currently, my Lightning module downloads my training data from an S3 bucket into a directory named after the worker_id of the Ray worker. However, this means I am using up extra disk space by downloading a separate copy of the dataset for each Ray worker on the node.
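For reference, the relevant part of my module looks roughly like this (simplified; `download_s3_prefix` is a stand-in for my actual S3 download helper, and I'm using `ray.train.get_context().get_world_rank()` as the worker id):

```python
import os

import lightning.pytorch as pl
from ray import train


def download_s3_prefix(s3_uri: str, dest_dir: str) -> None:
    """Hypothetical stand-in for my actual S3 download helper (e.g. boto3)."""
    ...


class MyLightningModule(pl.LightningModule):
    def prepare_data(self):
        # Today: every Ray worker downloads its own copy of the dataset
        # into a directory named after its worker rank.
        worker_id = train.get_context().get_world_rank()
        data_dir = f"/tmp/data-worker-{worker_id}"
        os.makedirs(data_dir, exist_ok=True)
        download_s3_prefix("s3://my-bucket/train/", data_dir)
        self.data_dir = data_dir
```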
I am wondering what the best practices are for this scenario. Is there some sort of “pre-Tune” hook I can run on each Kubernetes replica that downloads the dataset into one directory all my trials can access? Something like the sketch below is what I'm imagining.
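In other words, I'd love a per-node, download-once setup, where the first process on a node downloads and everyone else waits on a lock and reuses the same directory. A minimal sketch of what I mean (assuming the `filelock` package; `download_s3_prefix` is the same hypothetical helper as above):

```python
import os

from filelock import FileLock

SHARED_DIR = "/tmp/shared-dataset"  # one directory per Kubernetes replica/node


def download_s3_prefix(s3_uri: str, dest_dir: str) -> None:
    """Hypothetical S3 download helper, same as the sketch above."""
    ...


def ensure_dataset_downloaded() -> str:
    # Only one process on the node downloads; the rest block on the lock,
    # then see the marker file and skip the download.
    os.makedirs(SHARED_DIR, exist_ok=True)
    done_marker = os.path.join(SHARED_DIR, ".download_complete")
    with FileLock(os.path.join(SHARED_DIR, ".lock")):
        if not os.path.exists(done_marker):
            download_s3_prefix("s3://my-bucket/train/", SHARED_DIR)
            open(done_marker, "w").close()
    return SHARED_DIR
```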
Appreciate the help!