Actors have zero resource requests by default (this may change in the future, since it’s confusing). If you create the actors with num_cpus=1, does this still happen?
try:
    for i in range(n_actors):
        print(i)
        h = FailingActor.options(
            placement_group=pg,
            num_cpus=n_cores,
            memory=mem_unit,
            placement_group_bundle_index=i,
        ).remote(
            name=str(i),
            size=10000,
            fail=False,  # True
        )
        print(h)
        try:
            # Block until the actor is up; a failed actor raises RayActorError.
            ray.get(h.ready.remote())
        except ray.exceptions.RayActorError as e:
            print(e)
            break
    print("finished allocating actors")
except ValueError as e:
    print("cannot allocate", i)
except Exception as e:
    print(e)
Basically, h is rebound on every iteration, so the previous handle loses its last reference and that actor is killed (actor handles are reference counted, and an actor is garbage collected once no handle to it remains). So you only ever have 1 actor alive at a time.
If you change your block this way:
n_actors = 3
print("Total actors: ", n_actors)
hs = []
try:
    for i in range(n_actors):
        print(i)
        h = FailingActor.options(
            placement_group=pg,
            num_cpus=n_cores,
            memory=mem_unit,
            placement_group_bundle_index=i,
        ).remote(
            name=str(i),
            size=10000,
            fail=False,  # True
        )
        # Keep every handle so earlier actors are not garbage collected.
        hs.append(h)
        print(h)
        try:
            ray.get(h.ready.remote())
        except ray.exceptions.RayActorError as e:
            print(e)
            break
    print("finished allocating actors")
except ValueError as e:
    print("cannot allocate", i)
except Exception as e:
    print(e)
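The difference between the two loops can be mimicked without Ray. The sketch below is only an analogy: FakeHandle, peak_without_keeping and peak_when_keeping are hypothetical names, not part of Ray, and the behaviour relies on CPython's immediate reference counting rather than on Ray's own distributed reference counting.

```python
class FakeHandle:
    """Hypothetical stand-in for an actor handle; not Ray's ActorHandle."""

    live = 0  # number of handles that currently exist

    def __init__(self, name):
        self.name = name
        FakeHandle.live += 1

    def __del__(self):
        # In CPython this runs as soon as the last reference disappears.
        FakeHandle.live -= 1


def peak_without_keeping(n):
    """Rebind h each iteration, as in the first loop."""
    peak = 0
    for i in range(n):
        h = FakeHandle(str(i))  # the previous handle loses its last reference here
        peak = max(peak, FakeHandle.live)
    return peak


def peak_when_keeping(n):
    """Store every handle in a list, as in the second loop."""
    hs = []
    peak = 0
    for i in range(n):
        h = FakeHandle(str(i))
        hs.append(h)  # the list keeps earlier handles referenced
        peak = max(peak, FakeHandle.live)
    return peak, hs
```

With n = 3, the first function never sees more than one live handle after each assignment, while the second sees all three, which matches the "only 1 actor alive at a time" behaviour described above.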
The placement_group_bundle_index = i param is still needed to avoid the problem with an infeasible task due to memory requests. As you mentioned, we’ll wait for the InfeasibleTaskException implementation in a later release.