From this tutorial it looks like, to train/validate a model, we need to have all of the data loaded into memory beforehand.
My data depends on a number of preprocessing steps in an upstream pipeline (using Ray Actors that write to the shared object store), and it is also too large to fit in cluster memory. I want to configure the trainer so that it waits for each chunk of training data from upstream and trains on it as soon as it arrives, and likewise for the validation step. How do I do this?
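To make the intent concrete, here is a minimal stdlib-only sketch of the producer/consumer flow I'm after (all names are hypothetical and independent of Ray's actual APIs; the producer thread stands in for the upstream actors, and the bounded queue stands in for the shared object store):

```python
import queue
import threading

def producer(q, num_chunks):
    # Stands in for the upstream preprocessing actors emitting chunks.
    for i in range(num_chunks):
        q.put([i] * 4)  # a "chunk" of preprocessed training samples
    q.put(None)  # sentinel: upstream is done

def train_on_stream(q):
    # Trainer blocks until the next chunk arrives, then trains on it.
    seen = 0
    while True:
        chunk = q.get()  # waits for upstream to deliver
        if chunk is None:
            break
        seen += len(chunk)  # a real train_step(chunk) would go here
    return seen

q = queue.Queue(maxsize=2)  # bounded, so the producer gets backpressure
t = threading.Thread(target=producer, args=(q, 3))
t.start()
total = train_on_stream(q)
t.join()
print(total)  # 12 samples consumed across 3 chunks
```

The question is essentially: what is the idiomatic way to get this blocking, chunk-at-a-time behavior inside the trainer's train and validation loops?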