Answered already on Slack, but reposting here in case others are wondering:
If you’re running Ray on Anyscale, use the VideoDatasource API (see Video API in the Anyscale Docs).
If you’re running open source Ray, implement a custom datasource. Here’s a relevant guide: Advanced: Read and Write Custom File Types (Ray 2.9.3 docs). To read video frames iteratively, you can use decord (dmlc/decord on GitHub), an efficient video loader for deep learning.
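For the open source route, here is a rough sketch of the frame-reading part. It is not taken from the Ray guide or the decord docs. The helper names (`frame_index_batches`, `iter_video_frames`, `open_video`) and the row layout (`frame_index`, `frame`) are made up for illustration. The only decord calls it relies on are `decord.VideoReader`, `len(reader)`, `reader.get_batch(indices)` and `.asnumpy()` on the result.

```python
from typing import Any, Dict, Iterator, List


def frame_index_batches(num_frames: int, batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive lists of frame indices, at most batch_size per list."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, num_frames, batch_size):
        yield list(range(start, min(start + batch_size, num_frames)))


def iter_video_frames(reader: Any, batch_size: int = 32) -> Iterator[Dict[str, Any]]:
    """Yield one row per frame, decoding batch_size frames at a time.

    `reader` is assumed to behave like decord.VideoReader: it supports
    len(reader) and reader.get_batch(list_of_indices).
    """
    for indices in frame_index_batches(len(reader), batch_size):
        frames = reader.get_batch(indices)
        # decord returns its own NDArray type; convert it to NumPy if possible.
        if hasattr(frames, "asnumpy"):
            frames = frames.asnumpy()
        for index, frame in zip(indices, frames):
            yield {"frame_index": index, "frame": frame}


def open_video(path: str) -> Any:
    """Open a local video file with decord (imported lazily, decode on CPU)."""
    import decord

    return decord.VideoReader(path, ctx=decord.cpu(0))
```

In a custom datasource, something like `iter_video_frames(open_video(path))` would run inside the read method described in the guide, with the yielded rows collected into blocks. Decoding in batches keeps memory bounded for long videos, since only `batch_size` frames are held at once.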