How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I would like to replace my ray.data.read_images and Dataset with workflows.
I have been unable to find a good example showing how to read in multiple files as part of a DAG. For example, I would like to read a list of images:
from typing import Any, Dict, List, Optional

import numpy as np
import ray
from PIL import Image
from pydantic import BaseModel
from ray import workflow
from ray.dag.dag_node import DAGNode
from ray.dag.input_node import InputNode


@ray.remote(num_cpus=0.5)
def read(dir) -> Dict[str, Any]:
    img = Image.open(dir)
    img = np.array(img) / 255.0
    return dict(image=img, path=dir)


class ContentInput(BaseModel):
    paths: List[str] = []
    output_dir: Optional[str] = None


# Goal: fan out one read task per input path.
with InputNode() as input_node:
    dag: DAGNode = [read.remote(path) for path in input_node.paths]

results = dag.execute(image_paths)
results = ray.get(results)
I could make the reader an actor and read the images in a loop, but this blocks until every image in the list has been read, so it is not that useful.
....
@ray.remote(num_cpus=0.5)
class Reader:
    def read(self, dirs: List[str]) -> List[Dict[str, Any]]:
        return [self(dir) for dir in dirs]

    def __call__(self, dir) -> Dict[str, Any]:
        img = Image.open(dir)
        img = np.array(img) / 255.0
        fn = Path(dir).stem
        return dict(image=img, path=dir)
Any insight on the best way to do this would be appreciated. I did find this demo, but much of it seems to be no longer supported: 2022_04_13_ray_serve_meetup_demo/deployment_graph.py at main · ray-project/2022_04_13_ray_serve_meetup_demo · GitHub