How severe does this issue affect your experience of using Ray?
High: It blocks me to complete my task.
The following code results in a ReferenceCountingAssertionError:
import ray
import time

# dummy actor to own objects after test_put process is killed
@ray.remote
class global_actor():
    def wake(self):
        pass

# wake the actor
actor = global_actor.remote()
ray.get(actor.wake.remote())

@ray.remote
class test_put():
    def __init__(self):
        self.putted = ray.put(123, _owner=actor)

    def get(self):
        return self.putted

    def print(self):
        print(ray.get(self.putted))

test = test_put.remote()
t_get = ray.get(test.get.remote())
# allow the actor to exit and terminate
del test
time.sleep(1)
print(ray.get(t_get))
Interestingly, this does not happen if I construct the ObjectRef inside test_put.get() instead of in the constructor. However, for my purposes the actor must hold a local reference to the ObjectRef, and that reference must be available right after construction. Any clues as to what is happening?
Thanks in advance.
EDIT:
Upon further experimentation, the above issue has to do with the assignment of an ObjectRef to a class variable not incrementing the internal reference counter. It seems to work when assigning into a dict or list, but not an object. From the official documentation I thought that I could nest references in objects, but it seems to only work with lists or dicts. Is this the intended behaviour? Thanks.
Hey @bsun, welcome to the Ray community, and thanks for posting a question with a reproducible script.
As for your use case, if you want to keep the object after your original test_put actor is deleted, how about having your global_actor store the object references explicitly? I am not 100% sure about the reference counting protocol here, but looking at the _owner doc here, it seems to me that setting it at ray.put is not enough to make this indirect ownership work.
I think this seems to be working, but not sure if there are any other requirements you have:
import ray
import time

# dummy actor to own objects after test_put process is killed
@ray.remote
class global_actor():
    def wake(self):
        pass

    def store_obj(self, obj):
        self.obj = obj

    def get_obj(self):
        return self.obj

# wake the actor
actor = global_actor.remote()
ray.get(actor.wake.remote())

@ray.remote
class test_put:
    def __init__(self, owner):
        self.putted = ray.put(123)

    def get(self):
        return self.putted

    def print(self):
        print(ray.get(self.putted))

test = test_put.remote(owner=actor)
t_get = ray.get(test.get.remote())
actor.store_obj.remote(t_get)
# allow the actor to exit and terminate
del test
time.sleep(5)
print(ray.get(actor.get_obj.remote()))
+1 to @rickyyx’s answer. Hmm not sure what you mean by
assignment of an ObjectRef to a class variable
can you explain this part a bit more?
By the way, I would encourage you to avoid using ray.put(_owner) since this API is experimental and operations like ref counting may not work as expected.
Thanks for the workaround! I seem to have missed the tidbit about needing to pass the owner a reference to the object, and your solution does indeed fix that. I have grossly oversimplified my use case here so I’m not sure it will be directly applicable, but this is a great start for now.
But even with the _owner assignment, the alternative code keeps the reference alive in the driver code:
@ray.remote
class test_put():
    def __init__(self):
        pass

    def get(self):
        # store in a local variable, instead of as a class/instance variable in self.__dict__
        temp = ray.put(123, _owner=actor)
        temp2 = [ray.put(456, _owner=actor)]
        temp3 = {'test': ray.put(789, _owner=actor)}
        self.temp = temp
        self.temp2 = temp2
        self.temp3 = temp3
        return temp  # returning either temp2 or temp3 also keeps the reference alive
        # return self.temp
        # returning self.temp/temp2/temp3 will not keep the reference alive and throws the assertion error
In either case, I do not pass the ObjectRef to the owner. However, specifically in the case where I assign the ObjectRef to a class/instance variable, then return that specific self.* variable I seem to get the reference count error.
With respect to @rickyyx’s solution, I’ll look for an alternative along those lines and will try to refrain from using _owner for now. Thanks for everyone’s help!
Hmm I see, thanks! I’m not able to reproduce the same behavior, though, and I don’t think it should matter how you pass back the variable. I’d guess that the root cause of the error is a race condition, and maybe the race happened to correlate with what you were seeing.