There are many Python libraries for downloading data, and many “best practices” around multi-threading, multi-processing, chunking, range reads, streaming, I/O vs. number of vCPUs, and so on. Downloading is basic functionality, but given how pervasive it is, I think it would be very helpful for Ray to provide some guidance (ideally a code snippet, or maybe even a Ray-provided library?) on the fastest way to download files and process them within Ray. Here I would define “fastest” as whatever achieves the highest utilization of the vNIC.
WDYT? Does something like this already exist, and am I just looking in the wrong place?
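
To make the ask a bit more concrete, here is a rough sketch of the kind of snippet I mean: split a file into byte ranges, fetch the ranges concurrently from a thread pool, and report throughput as a fraction of the NIC's line rate. This is only an illustration, not a benchmarked recommendation. All names (`split_ranges`, `download_in_ranges`, `nic_utilization`) are hypothetical, `fetch_range` stands in for whatever ranged GET the storage backend offers, and the chunk size and worker count are arbitrary. It uses only the standard library; running it inside a Ray task is not shown.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def split_ranges(total_size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) byte ranges covering the whole file."""
    return [
        (start, min(start + chunk_size, total_size) - 1)
        for start in range(0, total_size, chunk_size)
    ]


def download_in_ranges(
    fetch_range: Callable[[int, int], bytes],  # hypothetical ranged-GET callable
    total_size: int,
    chunk_size: int,
    max_workers: int = 8,  # arbitrary; would need tuning per instance type
) -> bytes:
    """Fetch all byte ranges concurrently and reassemble them in file order."""
    ranges = split_ranges(total_size, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so chunks stay ordered.
        chunks = pool.map(lambda r: fetch_range(*r), ranges)
        return b"".join(chunks)


def nic_utilization(num_bytes: int, seconds: float, link_gbps: float) -> float:
    """Achieved throughput as a fraction of the link's nominal capacity."""
    return (num_bytes * 8) / (seconds * link_gbps * 1e9)


# Stand-in for a remote object, so the sketch runs without network access.
blob = bytes(range(256)) * 40


def fake_fetch(start: int, end: int) -> bytes:
    return blob[start:end + 1]


result = download_in_ranges(fake_fetch, len(blob), chunk_size=1000)
```

With a real backend, `fake_fetch` would be replaced by an HTTP `Range` request or an object-store ranged read, and the open question is which chunk size, worker count, and threads-vs-processes mix gets `nic_utilization` closest to 1.0 on a given node.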