A question was asked in this Slack thread:
I have a long-running script that continuously pulls messages from Pub/Sub and submits one Ray job per message to the Ray cluster. I have Ray autoscaling in place, but some of the jobs still fail with the error:
ray.exceptions.OutOfMemoryError: Task was killed due to the node running low on memory.
Is there a way to hold jobs back, or to check resource usage before submitting a job?
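One way to "stall" submissions is a backpressure gate: before submitting a job for a message, poll some memory probe and wait until enough is reported free, giving up after a timeout so the message can be requeued. The sketch below is a minimal illustration under assumptions, not an official Ray mechanism. `wait_for_memory` and its parameters are hypothetical names, and the probe is passed in as a function so the gate can be tested without a cluster. In a real script the probe might be `lambda: ray.available_resources().get("memory", 0)` (called after `ray.init()` has connected to the cluster). Note, however, that Ray's `memory` resource is logical: it only tracks amounts that tasks and actors declare (for example via `memory=` in `@ray.remote`), not actual process usage, which is what the OOM killer reacts to. Also check which units your Ray version reports for it.

```python
import time
from typing import Callable


def wait_for_memory(
    available_bytes: Callable[[], float],
    required_bytes: float,
    poll_interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Block until the probe reports at least `required_bytes` free.

    Hypothetical helper. Returns True as soon as enough memory is reported,
    or False once `timeout_s` has elapsed, so the caller can requeue the
    message instead of submitting a job that may be OOM-killed.
    """
    deadline = clock() + timeout_s
    while True:
        # Ask the (caller-supplied) probe how much memory is free right now.
        if available_bytes() >= required_bytes:
            return True
        # Give up once the deadline has passed; the caller decides what to do.
        if clock() >= deadline:
            return False
        # Not enough memory yet: wait before polling again.
        sleep(poll_interval_s)
```

In the consumer loop, a `False` result could translate into calling `nack()` on the Pub/Sub message so it is redelivered later, rather than submitting the job immediately. Passing `sleep` and `clock` as parameters is only there to make the gate testable without real waiting.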