[Core] Timeout individual remote tasks

Consider submitting a collection of work to a remote method, where most tasks complete in under a minute but some may take orders of magnitude longer. I'd like to kill those outliers.

Assume that the number of tasks far exceeds the number of Ray workers, so many tasks sit in the queue before they run. A user could implement this if, given the task_id of an incomplete task, there were a way to tell whether it is still queued or has been assigned to a worker. Is there a way to do that?

Thanks, Eddie

Hey @eddie, great question, and sorry for the delay, which was caused by the question being “uncategorized”. It helps if you set a category (e.g. “RLlib”) when you post a new question. That way, we’ll find it more easily and can assign the right person to answer it.

Hey @Clark_Zinzow, could someone from the Ray core team answer this one here? Thanks :slight_smile:

Hi @eddie, unfortunately there currently isn’t a way to set a per-task timeout, nor is there an API to determine whether a task is queued or running. A recently opened issue describes a few different options, such as using the ray.wait API or using an actor to register, watch, and cancel tasks.
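
As a rough sketch of the actor option, here is the bookkeeping such a watcher might do, written in plain Python so it runs without a cluster. The class name `TaskWatcher` and its methods are made up for illustration and are not part of Ray. The idea is that each task reports in when it actually starts executing, so time spent in the queue does not count toward the timeout.

```python
import time


class TaskWatcher:
    """Hypothetical registry: tracks when each task actually starts running."""

    def __init__(self):
        self._started = {}  # task key -> start timestamp in seconds

    def mark_started(self, key, now=None):
        # Called by the task itself as its first step, so queue time
        # is not counted against the timeout.
        self._started[key] = time.monotonic() if now is None else now

    def mark_finished(self, key):
        # Called when the task returns; unknown keys are ignored.
        self._started.pop(key, None)

    def overdue(self, timeout_s, now=None):
        # Keys of tasks that have been running longer than timeout_s.
        now = time.monotonic() if now is None else now
        return sorted(k for k, t in self._started.items() if now - t > timeout_s)


# Explicit timestamps are passed here only to make the example deterministic.
watcher = TaskWatcher()
watcher.mark_started("task-a", now=0.0)
watcher.mark_started("task-b", now=50.0)
watcher.mark_finished("task-b")
watcher.mark_started("task-c", now=55.0)
print(watcher.overdue(timeout_s=60.0, now=100.0))  # ['task-a']
```

In Ray, a class like this could presumably be turned into an actor with `ray.remote(TaskWatcher)`, with each task calling `mark_started` on the actor handle when it begins and the driver periodically passing the object refs of the overdue keys to `ray.cancel`. That wiring is an assumption and is not shown or tested here.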

Please let us know if any of those patterns will work for you!