How to set timeout for a custom datasource reader

tarjintor · July 10, 2023, 9:26am

Hello, I am using ray.data with a custom reader to read HDF5 files.
The h5 files are stored on Weka, and sometimes a read gets stuck indefinitely, although it works fine if I retry it.
So I would like to set a timeout for read tasks and retry any task that runs longer than a certain number of seconds.
I searched the forum and dug into the source code, but I still don't know how to achieve this.
I found that you can set _remote_args for the read task, but ray.options doesn't accept a timeout, so I don't know what to do now.

Jules_Damji · July 10, 2023, 9:09pm

@tarjintor Thanks for posting. Sharing with the Ray Data group for any insights.
cc: @chengsu Any idea whether we can do that with a timeout argument, as in ray.data_read_xxx(....timeout=??)?

chengsu · July 14, 2023, 1:33am

As you already implemented a custom Datasource class, you can pass any arbitrary argument through the read_args - https://github.com/ray-project/ray/blob/master/python/ray/data/read_api.py#L297 . The read_args is passed through to Datasource.create_reader() - https://github.com/ray-project/ray/blob/master/python/ray/data/read_api.py#L2286C23-L2286C23 . So you can get the timeout argument and implement the logic for it in Datasource.create_reader().

tarjintor · July 14, 2023, 2:22am

Thanks for the reply.
I guess you are suggesting that I implement the timeout logic in my own code, but I have some problems achieving this:

1. As I understand it, the Ray Reader has a method get_read_tasks that returns List[ReadTask], and these ReadTasks are also Ray tasks. Since we can set a timeout for Ray tasks, a way to pass an argument such as timeout to the ReadTask init would be a more general solution for all datasources, rather than only for my own task.
2. As far as I know, the Python h5py library has no timeout argument for reading h5 files, so I can only kill the read process and start another one, but Ray's task cancel method can do this.
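
The kill-and-restart idea in point 2 can be sketched without Ray at all. The following is only a minimal illustration, assuming a POSIX system where the "fork" start method is available; `read_with_timeout`, `flaky_read`, `timeout_s`, and `max_retries` are hypothetical names, not part of Ray or h5py, and whether forking from inside a Ray worker is safe in a given setup would need to be verified separately.

```python
import multiprocessing as mp
import os
import time

# Assumption: the "fork" start method (Linux/macOS), so the child can run any
# callable without pickling it. Other platforms would need "spawn" and an
# importable, module-level function.
_CTX = mp.get_context("fork")


def _run_in_child(conn, fn, args):
    """Child side: run fn(*args) and send ("ok", value) or ("err", message)."""
    try:
        conn.send(("ok", fn(*args)))
    except Exception as exc:
        conn.send(("err", repr(exc)))
    finally:
        conn.close()


def read_with_timeout(fn, args=(), timeout_s=30.0, max_retries=3):
    """Run fn(*args) in a child process; kill it and retry after timeout_s seconds."""
    for _ in range(max_retries):
        recv_end, send_end = _CTX.Pipe(duplex=False)
        proc = _CTX.Process(target=_run_in_child, args=(send_end, fn, args))
        proc.start()
        send_end.close()  # the parent only reads
        try:
            if recv_end.poll(timeout_s):  # True once a result (or EOF) is available
                try:
                    status, payload = recv_end.recv()
                except EOFError:
                    continue  # child died without reporting; try again
                if status == "ok":
                    return payload
                raise RuntimeError(f"read failed in child: {payload}")
        finally:
            if proc.is_alive():
                proc.kill()  # a stuck read is terminated here
            proc.join()
            recv_end.close()
    raise TimeoutError(f"no result within {timeout_s}s after {max_retries} attempts")


def flaky_read(marker_path):
    """Hypothetical stand-in for an h5py read: hangs on the first call only."""
    if not os.path.exists(marker_path):
        open(marker_path, "w").close()
        time.sleep(3600)  # simulate a read stuck on the filesystem
    return [1, 2, 3]
```

Inside a custom read function, the actual h5py call would take the place of `flaky_read`, and the timeout value could be whatever argument is forwarded to the datasource. The result travels back through a pipe, so it has to be picklable, and large arrays pay a copy cost.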

But I also found that this is hard to do in Ray, because a ReadTask yields blocks rather than returning a single block. If a task yields some blocks and then fails partway through, the retry logic becomes very tricky.
Also, as I understand it, yield in a Ray task is not truly streaming: the generator has to run to completion and all its results are collected before the caller can iterate over them. If so, the problem I described is no longer a problem.
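
To illustrate the reasoning above in plain Python: if the caller only sees the blocks after the generator has finished, a retry can simply discard any partial output and start over. This is a sketch of that idea only, not of how Ray schedules or retries read tasks; `read_all_with_retry` and `flaky_blocks` are hypothetical names.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def read_all_with_retry(make_blocks: Callable[[], Iterator[T]],
                        max_retries: int = 3) -> List[T]:
    """Drain the generator fully; on failure, drop partial blocks and restart."""
    last_error = None
    for _ in range(max_retries):
        blocks: List[T] = []  # fresh list per attempt, so partial output is never returned
        try:
            for block in make_blocks():
                blocks.append(block)
            return blocks
        except Exception as exc:
            last_error = exc
    raise RuntimeError(f"read failed after {max_retries} attempts") from last_error


# Hypothetical generator that fails midway on its first run only.
calls = {"n": 0}


def flaky_blocks() -> Iterator[str]:
    calls["n"] += 1
    yield "block-0"
    if calls["n"] == 1:
        raise IOError("simulated failure after the first block")
    yield "block-1"
```

The trade-off is memory: every block of a task is held until the task finishes, which is exactly what makes the retry simple.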
