Yeah, that’s expected, since we don’t have enough GPU resources to schedule all 8 actors. Only the first 4 actors can be scheduled, and the remaining 4 will stay in the pending state.
It is fine for the remaining 4 to be in the pending state. But those pending actors seem to be stuck there forever. My understanding was that they would start running as soon as the other 4 actors finished.
From the screenshot, you can see that 7 minutes had elapsed before I killed the program.
The only changes I made were telling Ray to use GPUs and increasing the number of actors to 8. But you are right, the code given in the example seems buggy. I will double-check it.
I checked the example code again. It’s hard to say whether it is a bug. I believe the author simply did not consider this case. If the number of actors is larger than what the available resources can support, the program will hang forever.
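One way to avoid that hang is to cap the number of actors at what the cluster can actually hold at once. Below is a minimal sketch, not the example author's code: `cap_actor_count` and `launch_actors` are hypothetical helper names, and the sketch assumes `ray.init()` has already been called and that every actor requests the same number of GPUs.

```python
def cap_actor_count(requested, available_gpus, gpus_per_actor=1.0):
    """Return how many actors can hold their GPU reservation at the same time."""
    if gpus_per_actor <= 0:
        # Actors that reserve no GPU are not limited by the GPU count.
        return requested
    return min(requested, int(available_gpus // gpus_per_actor))


def launch_actors(actor_cls, requested, gpus_per_actor=1.0):
    """Create at most as many actors as the cluster's GPUs allow (hypothetical helper)."""
    import ray  # imported lazily so cap_actor_count is usable without Ray

    # cluster_resources() reports the total logical resources, e.g. {"GPU": 4.0, ...}.
    total_gpus = ray.cluster_resources().get("GPU", 0)
    count = cap_actor_count(requested, total_gpus, gpus_per_actor)
    if count < requested:
        print(f"Only {count} of {requested} actors fit on {total_gpus} GPUs")
    # options() overrides the resource request for each actor created here.
    return [actor_cls.options(num_gpus=gpus_per_actor).remote() for _ in range(count)]
```

With 4 GPUs and 8 requested actors this would create 4 actors instead of leaving 4 pending. Whether capping is the right fix depends on the example; the other option is to keep 8 actors and release the first 4 explicitly once they are done.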
I would like to dive deeper. Please help me clarify.
Suppose I have 4 GPUs available now.
1). I submit 8 actors, but only 4 can be created and become active, while the other 4 stay pending.
2). The 4 active actors will occupy the GPUs until nobody holds a reference to them or they are killed explicitly with ray.kill().
3). Even if the 4 active actors do nothing and just sit idle, the other 4 pending actors have no chance to be scheduled.
Simply put, as long as an actor lives, the resources it occupies won’t be released. Am I right? Is this intentional?
Yes, your understanding is correct. It’s designed intentionally. When you specify num_gpus=1, you are saying that, for the lifetime of this actor, 1 GPU is reserved for it regardless of its actual physical usage.