Getting reference counting assertion error when storing ObjectRefs in class variables

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

The following code results in a ReferenceCountingAssertionError:

import ray
import time

# dummy actor to own objects after test_put process is killed
@ray.remote
class global_actor():
    def wake(self):
        pass

# wake the actor
actor = global_actor.remote()
ray.get(actor.wake.remote())

@ray.remote
class test_put():
    def __init__(self):
        self.putted = ray.put(123, _owner=actor)
    def get(self):
        return self.putted
    def print(self):
        print(ray.get(self.putted))


test = test_put.remote()
t_get = ray.get(test.get.remote())
# allow the actor to exit and terminate
del test
time.sleep(1)
print(ray.get(t_get))

Interestingly, this does not happen if I construct the ObjectRef inside test_put.get() instead of the constructor. However, for my purposes the actor must keep a local reference to the ObjectRef, and that reference must be available as soon as the actor is constructed. Any clues as to what is happening?

Thanks in advance.

EDIT:

Upon further experimentation, the above issue has to do with the assignment of an ObjectRef to a class variable not incrementing the internal reference counter. It seems to work when assigning into a dict or list, but not an object. From the official documentation I thought that I could nest references in objects, but it seems to only work with lists or dicts. Is this the intended behaviour? Thanks.

Hey @bsun, welcome to the Ray community and thanks for posting your question with a reproducible script.

As for your use case, if you want to keep the object after your original test_put actor is deleted, how about having your global_actor store the object references explicitly? I am not 100% sure about the reference counting protocol here, but looking at the _owner doc here, it seems to me that setting it at ray.put is not enough to make this indirect ownership work.

I think this seems to be working, but not sure if there are any other requirements you have:

import ray
import time


# dummy actor to own objects after test_put process is killed
@ray.remote
class global_actor():
    def wake(self):
        pass

    def store_obj(self, obj):
        self.obj = obj

    def get_obj(self):
        return self.obj

# wake the actor
actor = global_actor.remote()
ray.get(actor.wake.remote())


@ray.remote
class test_put:
    def __init__(self, owner):
        self.putted = ray.put(123)

    def get(self):
        return self.putted

    def print(self):
        print(ray.get(self.putted))


test = test_put.remote(owner=actor)
t_get = ray.get(test.get.remote())

actor.store_obj.remote(t_get)
# allow the actor to exit and terminate
del test
time.sleep(5)

print(ray.get(actor.get_obj.remote()))


As for

the above issue has to do with the assignment of an ObjectRef to a class variable not incrementing the internal reference counter

cc @Stephanie_Wang, who should know more about the reference counting protocol than I do.

+1 to @rickyyx’s answer. Hmm not sure what you mean by

assignment of an ObjectRef to a class variable

can you explain this part a bit more?

By the way, I would encourage you to avoid using ray.put(_owner) since this API is experimental and operations like ref counting may not work as expected.

Thanks for the workaround! I seem to have missed the tidbit about needing to pass the owner a reference to the object, and your solution does indeed fix that. I have grossly oversimplified my use case here so I’m not sure it will be directly applicable, but this is a great start for now.

Thanks for the info! I’ll see if there’s an alternative workaround that does not require us to use the _owner kwarg.

As for the reference counting error, my test_put class breaks with the following code (with everything else held the same):

@ray.remote
class test_put():
    def __init__(self):
        pass
    def get(self):
        self.putted = ray.put(123, _owner=actor)
        return self.putted

But even with the _owner assignment, the following alternative keeps the reference alive in the driver code:

@ray.remote
class test_put():
    def __init__(self):
        pass
    def get(self):
        # store in local variables first, instead of directly as class/instance variables in self.__dict__
        temp = ray.put(123, _owner=actor)
        temp2 = [ray.put(456, _owner=actor)]
        temp3 = {'test': ray.put(789, _owner=actor)}
        self.temp = temp
        self.temp2 = temp2
        self.temp3 = temp3
        return temp # returning either temp2 or temp3 also keeps the reference alive
        # return self.temp
        # returning self.temp/temp2/temp3 does not keep the reference alive and throws the assertion error

In either case, I do not pass the ObjectRef to the owner. However, specifically when I assign the ObjectRef to a class/instance variable and then return that same self.* variable, I get the reference counting error.

With respect to @rickyyx's solution, I’ll look for an alternative along those lines and will try to refrain from using _owner for now. Thanks for everyone’s help!

Hmm I see, thanks! I’m not able to reproduce the same behavior, though, and I don’t think it should matter how you pass back the variable. I’d guess that the root cause of the error is a race condition, and maybe the race happened to correlate with what you were seeing.