I am using a Ray cluster. An example of the problem: I create actors with SomeClass.options(name=actor_name, num_cpus=7).remote(). I launch SomeClass several times, remotely and in parallel. Because there are not enough resources, some actors are shown on the Ray dashboard in the PENDING_CREATION state. To kill all actors (both ALIVE and PENDING_CREATION), I call actor = ray.get_actor(actor_name) and then ray.kill(actor) for each one. That kills every actor shown on the dashboard, but on the Kubernetes cluster the pods still remain pending. What could be the problem?
Hi @lukakap, unfortunately I didn’t quite understand the problem. Could you give some more details? It sounds like the actors are successfully killed. Is the fact that the Kubernetes pods are pending related to the actors being killed? If you have a way to reliably reproduce this, that would be great.