Listing a lot of files from Blob Storage based on glob

I have seen that Ray Datasets allows you to read binary files. However, there doesn't seem to be any way to filter them based on a glob pattern.

https://docs.ray.io/en/latest/data/api/doc/ray.data.read_binary_files.html#ray.data.read_binary_files

I was looking at this because I need to get a list of millions of images matching a glob pattern from an Azure Blob container that has hierarchical namespace disabled. Loading them with Spark is not fast, even using Auto Loader, and find takes a lot of time too.
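For reference, this is roughly the kind of workaround I had in mind: glob the blob paths up front with an fsspec filesystem (adlfs), then hand the resolved list to `ray.data.read_binary_files`. The account name, key, container, and pattern below are just placeholders, and I'm not sure how well the glob itself scales to millions of blobs.

```python
import adlfs
import pyarrow.fs
import ray

# fsspec filesystem for Azure Blob Storage (placeholder credentials).
abfs = adlfs.AzureBlobFileSystem(
    account_name="myaccount",   # placeholder
    account_key="...",          # or credential=... / sas_token=...
)

# Resolve the glob on the storage side; returns keys like "container/dir/img.png".
paths = abfs.glob("mycontainer/images/**/*.png")  # placeholder pattern

# Wrap the fsspec filesystem so Ray/Arrow can read through it.
arrow_fs = pyarrow.fs.PyFileSystem(pyarrow.fs.FSSpecHandler(abfs))

# Read only the matched files as binary blobs.
ds = ray.data.read_binary_files(paths, filesystem=arrow_fs)
```

Whether that initial glob over millions of blobs is fast enough is exactly the part I'm unsure about.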

Thanks in advance!

Hey @WaterKnight,

What does your glob look like?