I have seen that Ray Datasets lets you read binary files. However, there doesn't seem to be any way to filter them by a glob pattern.
I was looking into this because I need to list millions of images matching a glob pattern in an Azure Blob container that has hierarchical namespace disabled. Loading them with Spark is slow, even with Auto Loader, and running `find` takes a long time too.
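To make the question concrete, here is a minimal sketch of the kind of filtering I have in mind, assuming the blob names are already available as a flat list of strings. The helpers `literal_prefix` and `filter_by_glob` are hypothetical names of my own, not part of Ray or the Azure SDK, and the matching uses Python's `fnmatch`, where `*` also matches `/`.

```python
import re
from fnmatch import fnmatchcase


def literal_prefix(pattern: str) -> str:
    """Return the part of a glob pattern before its first wildcard character."""
    match = re.search(r"[*?\[]", pattern)
    return pattern if match is None else pattern[: match.start()]


def filter_by_glob(names, pattern):
    """Keep only the names that match the glob pattern (case-sensitive).

    Note: with fnmatch, '*' matches any characters, including '/'.
    """
    prefix = literal_prefix(pattern)
    # The cheap prefix check runs first; the full glob match only runs on survivors.
    return [n for n in names if n.startswith(prefix) and fnmatchcase(n, pattern)]


# Hypothetical blob names, standing in for a real container listing.
names = [
    "images/2023/a.png",
    "images/2023/b.jpg",
    "images/2024/c.png",
    "logs/x.png",
]
matches = filter_by_glob(names, "images/2023/*.png")
```

In a flat namespace the literal prefix could presumably also be passed to the listing call itself (for example `name_starts_with` in `ContainerClient.list_blobs`) so that fewer names are returned in the first place, and the filtered paths could then be handed to `ray.data.read_binary_files`. I have not verified how well that scales to millions of blobs.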
Thanks in advance!