Reading a list of images in a Workflow

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I would like to replace my ray.data.read_images and Dataset usage with workflows, but I have been unable to find a good example showing how to read in multiple files as part of a DAG. For example, I would like to read a list of images:

from typing import Any, Dict, List, Optional

import numpy as np
import ray
from PIL import Image
from pydantic import BaseModel
from ray import workflow
from ray.dag.dag_node import DAGNode
from ray.dag.input_node import InputNode


@ray.remote(num_cpus=0.5)
def read(path: str) -> Dict[str, Any]:
    # Load the image and scale pixel values to [0, 1].
    img = Image.open(path)
    img = np.array(img) / 255.0
    return dict(image=img, path=path)


class ContentInput(BaseModel):
    paths: List[str] = []
    output_dir: Optional[str] = None


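# What I would like to express: one read task per path in the input.
with InputNode() as input_node:
    dag: DAGNode = [read.remote(path) for path in input_node.paths]

# image_paths is assumed to be a ContentInput instance built elsewhere.
results = dag.execute(image_paths)
results = ray.get(results)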

I could make the reader an actor and read the images in a loop, but this blocks until every image in the list has been read, so it is not that useful.

....

@ray.remote(num_cpus=0.5)
class Reader:
    def read(self, paths: List[str]) -> List[Dict[str, Any]]:
        # Reads sequentially; nothing is returned until every image is loaded.
        return [self(path) for path in paths]

    def __call__(self, path: str) -> Dict[str, Any]:
        img = Image.open(path)
        img = np.array(img) / 255.0
        return dict(image=img, path=path)

Any insight on the best way to do this would be appreciated. I did find this demo, but much of it seems to be no longer supported: 2022_04_13_ray_serve_meetup_demo/deployment_graph.py at main · ray-project/2022_04_13_ray_serve_meetup_demo · GitHub

Is Ray Workflows deprecated, as the Slack channel seems to suggest?