Listing a lot of files from Blob Storage based on a glob

I have seen that Ray Datasets lets you read binary files. However, I can't find any way to filter them based on a glob.


I was looking at this because I need to get a list of millions of images matching a glob pattern from an Azure Blob container that has hierarchical namespace disabled. Loading them with Spark is slow, even using Auto Loader. Running find takes a long time too.

Thanks in advance!

Hey @WaterKnight,

What does your glob look like?
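
In the meantime, here is a rough sketch of one way to approach it, in case it helps. The idea is to split the glob into its literal prefix (everything before the first wildcard) and do the glob matching client-side on the listed names. The function names `literal_prefix` and `filter_by_glob` are made up for illustration, and the blob names below are placeholders. Note that `fnmatch` lets `*` match across `/`, which may or may not be what you want.

```python
import fnmatch
import re
from typing import Iterable, Iterator

# Characters that start a wildcard in fnmatch-style globs.
_WILDCARD = re.compile(r"[*?\[]")


def literal_prefix(pattern: str) -> str:
    """Return the part of the glob before the first wildcard character."""
    match = _WILDCARD.search(pattern)
    return pattern if match is None else pattern[: match.start()]


def filter_by_glob(names: Iterable[str], pattern: str) -> Iterator[str]:
    """Lazily yield the names that match the glob (case-sensitive)."""
    regex = re.compile(fnmatch.translate(pattern))
    return (name for name in names if regex.match(name))


# Hypothetical blob names, standing in for a real container listing.
names = [
    "images/2023/cat.jpg",
    "images/2023/dog.png",
    "images/2024/a/b.jpg",
    "other/x.jpg",
]

prefix = literal_prefix("images/2023/*.jpg")               # "images/2023/"
matches = list(filter_by_glob(names, "images/2023/*.jpg"))  # ["images/2023/cat.jpg"]
```

If you list with the `azure-storage-blob` package, the prefix could be passed as `name_starts_with` to `ContainerClient.list_blobs`, so the service only returns blobs under that prefix and the glob filter runs on a much smaller listing. The matching paths could then be handed to `ray.data.read_binary_files`. I have not tried this at the scale of millions of blobs, so treat it as a starting point.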