How to set a timeout for a custom datasource reader

Hello, I am using a custom reader to read HDF5 files.
These h5 files are stored on Weka, and sometimes a read gets stuck indefinitely, although it works fine if I retry it.
So I would like to set a timeout for read tasks and retry any task that runs longer than a certain number of seconds.
I searched the forum and dug into the source code, but I still don't know how to achieve this.
I found that you can set _remote_args for the read task, but ray.options doesn't accept a timeout, so I don't know what to do now.

@tarjintor Thanks for posting. Sharing with the Ray Data group for any insights.
cc: @chengsu Any idea whether this can be done with a timeout argument, e.g. ray.data.read_xxx(..., timeout=??)

As you have already implemented a custom Datasource class, you can pass arbitrary arguments through read_args. The read_args are passed through to Datasource.create_reader(), so you can pick up the timeout argument there and implement the logic for it in Datasource.create_reader().

Thanks for the reply.
I guess you are suggesting that I implement the timeout logic in my custom code, but I have some problems achieving this:

1. As I understand it, a Ray Reader has a method get_read_tasks that returns List[ReadTask], and these ReadTasks are also executed as Ray tasks. Since a timeout can be applied to Ray tasks, a way to pass an argument such as timeout to the ReadTask constructor would be a more general solution that works for every datasource, not just my own.
2. As far as I know, the Python h5py library has no timeout argument for reading an h5 file, so all I can do is kill the reading process and start another one. Ray's task cancel method can do this, though.
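
A minimal sketch of the kill-and-restart idea from point 2, using only the Python standard library. `call_with_timeout` and its parameters are hypothetical names, not part of Ray or h5py. It assumes a POSIX system where the `fork` start method is available and a read result that can be pickled. Forking from a multi-threaded process (which a Ray worker may be) can be fragile, so treat this as an illustration rather than a drop-in solution.

```python
import multiprocessing as mp


def _run_in_child(conn, fn, args):
    # Runs in the child process: report either the result or the exception.
    try:
        conn.send(("ok", fn(*args)))
    except Exception as exc:
        conn.send(("err", exc))
    finally:
        conn.close()


def call_with_timeout(fn, args=(), timeout_s=30.0, max_attempts=3):
    """Run fn(*args) in a child process; kill it and retry if it exceeds timeout_s."""
    ctx = mp.get_context("fork")  # assumption: POSIX only
    for _ in range(max_attempts):
        recv_end, send_end = ctx.Pipe(duplex=False)
        proc = ctx.Process(target=_run_in_child, args=(send_end, fn, args))
        proc.start()
        send_end.close()  # the parent keeps only the receiving end
        try:
            if recv_end.poll(timeout_s):
                status, payload = recv_end.recv()
                if status == "ok":
                    return payload
                raise payload  # a real error from fn is not retried
        except EOFError:
            pass  # child died without reporting; treat it like a timeout
        finally:
            if proc.is_alive():
                proc.kill()  # a stuck read is terminated here
            proc.join()
            recv_end.close()
    raise TimeoutError(f"no result within {timeout_s}s after {max_attempts} attempts")
```

A read function could then wrap its h5py call as `call_with_timeout(load_file, (path,), timeout_s=60)`, where `load_file` is whatever function opens the file and returns the data.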

But I also found this hard to do in Ray: a ReadTask yields blocks rather than returning a single block, so if a task yields some blocks and then fails partway through, the retry logic becomes very tricky.
Also, as I understand it, yield in a Ray task is not truly streaming: the generator has to run to completion and all of its results are collected before the caller can iterate over them. If so, the problem I was worried about is no longer a problem.
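
One way to sidestep the partial-yield problem regardless of how Ray consumes the generator is to make the read all-or-nothing: drain the block generator into a list, retry from scratch on failure, and only emit blocks once a full pass has succeeded. The sketch below is plain Python with a hypothetical helper name (`read_all_or_retry`); it is not a Ray API, and it trades memory for simplicity because every block of one read is held until the pass completes.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def read_all_or_retry(
    make_blocks: Callable[[], Iterable[T]], max_attempts: int = 3
) -> Iterator[T]:
    """Yield blocks only after one complete, error-free pass over make_blocks()."""
    last_error = None
    for _ in range(max_attempts):
        try:
            # Drain the generator fully; nothing has been emitted to the caller yet.
            blocks: List[T] = list(make_blocks())
        except Exception as exc:
            last_error = exc  # discard the partial output and start over
            continue
        yield from blocks
        return
    raise RuntimeError(f"read failed after {max_attempts} attempts") from last_error
```

Because no block leaves the function before the whole read has finished, a failure in the middle never produces duplicate or missing blocks downstream.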