Error when Pickling Actor from an imported file

I am currently trying to start a few actors that, inside their init functions, create torch models for inference.

When I import this class from another file and run the code, I get an error from cloudpickle: "cannot pickle 'InterceptedNumpy' object".

file a.py

import ray
from transformers import BertModel
import numpy as np
 
@ray.remote
class ModelPredictor:

    def __init__(self, load_path):
        self.model = BertModel.from_pretrained(load_path) 
    
    def predict(self, x):
        return self.model(x)

file main.py

from a import ModelPredictor

predictors = []
load_path = 'data/models/train_model'
for _ in range(3):
    predictors.append(ModelPredictor.remote(load_path)) 

the error ends with

Exception has occurred: TypeError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
cannot pickle 'InterceptedNumpy' object

When I define this class directly in the script I run (run.py) instead of importing it, the error goes away.
Ray version: 1.2.0

I suspect it is related to the implicit capture that @sangcho describes here, but I am not certain.

I am not sure.

After some fiddling, I removed the @ray.remote decorator and instead called
class_ = ray.remote(Class_), then instantiated with:

class_actor = class_.remote(*args)

and this worked properly. I believe this moves away from the recommended way to define an actor.
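
For anyone landing here later, the workaround above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the class body is a placeholder (the real one loads BertModel), and start_predictors and actor_options are hypothetical names introduced only for illustration. It assumes Ray is installed and that ray.init() has been called before start_predictors runs.

```python
class ModelPredictor:
    """Plain class with no @ray.remote decorator (stand-in for the one in a.py)."""

    def __init__(self, load_path):
        # Placeholder: the real class calls BertModel.from_pretrained(load_path).
        self.load_path = load_path

    def predict(self, x):
        # Placeholder: the real class returns self.model(x).
        return (self.load_path, x)


def start_predictors(load_path, count=3, **actor_options):
    """Wrap the class in the driver script and start `count` actors.

    start_predictors and actor_options are hypothetical names for this sketch.
    """
    import ray  # imported lazily so the plain class can be imported without Ray

    remote_cls = ray.remote(ModelPredictor)
    if actor_options:
        # For example num_cpus=1; options() returns a handle that exposes .remote().
        remote_cls = remote_cls.options(**actor_options)
    return [remote_cls.remote(load_path) for _ in range(count)]
```

In the driver this would be called as start_predictors('data/models/train_model', num_cpus=1). Wrapping in the driver keeps a.py free of Ray-specific code, and it is also where per-actor options can be chosen at runtime.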

After some debugging, I found that with the @ray.remote decorator, Ray tries to pickle the class and wraps path.to.module.ModelPredictor as path.to.module.model_class._<locals>.Class (something I think Ray does to register and apply the actor).

When cloudpickle calls _getattribute(module, name) deep in its call stack, module is 'path.to.module' and name ends up being 'model_class._<locals>.Class' instead of the correct name, 'ModelPredictor'.

When I changed the name variable in the debugger, i.e. called _getattribute(module, 'ModelPredictor'), I got no errors.
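
The lookup failure described above can be reproduced without Ray or cloudpickle. The stdlib-only sketch below assumes the wrapper class is created inside a function called model_class, as the qualified name suggests, and walks a dotted name with getattr. fake_module, resolve, and can_resolve are hypothetical names. It illustrates only the name resolution and is not Ray's actual registration code.

```python
import types


def model_class():
    """Hypothetical factory: a class defined in here gets a '<locals>' qualname."""
    class Class:
        pass
    return Class


class ModelPredictor:
    pass


# Stand-in for the imported module (path.to.module in the thread).
fake_module = types.ModuleType("fake_module")
fake_module.model_class = model_class
fake_module.ModelPredictor = ModelPredictor


def resolve(module, dotted_name):
    """Walk a dotted name attribute by attribute, starting at the module."""
    obj = module
    for part in dotted_name.split("."):
        obj = getattr(obj, part)
    return obj


def can_resolve(module, dotted_name):
    """Return True if every part of the dotted name can be looked up."""
    try:
        resolve(module, dotted_name)
        return True
    except AttributeError:
        return False


# The class created inside the function carries '<locals>' in its qualified name.
wrapped_name = model_class().__qualname__

print(wrapped_name)
print(can_resolve(fake_module, wrapped_name))      # '<locals>' is not an attribute
print(can_resolve(fake_module, "ModelPredictor"))  # a plain module-level name
```

The '<locals>' segment is never a real attribute, so any lookup by that name fails, while the module-level name resolves. That matches what I saw when I swapped the name in the debugger.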

I updated to Ray version 1.4.1, but the behavior is the same.

Most of the traceback is below:

cannot pickle 'InterceptedNumpy' object
  File "/opt/conda/envs/env/lib/python3.8/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 580, in dump
    return Pickler.dump(self, obj)
  File "/opt/conda/envs/env/lib/python3.8/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "/opt/conda/envs/env/lib/python3.8/site-packages/ray/_private/function_manager.py", line 364, in export_actor_class
    "class": pickle.dumps(Class),
  File "/opt/conda/envs/env/lib/python3.8/site-packages/ray/actor.py", line 680, in _remote
    worker.function_actor_manager.export_actor_class(
  File "/opt/conda/envs/env/lib/python3.8/site-packages/ray/util/tracing/tracing_helper.py", line 367, in _invocation_actor_class_remote_span
    return method(self, args, kwargs, *_args, **_kwargs)
  File "/opt/conda/envs/env/lib/python3.8/site-packages/ray/actor.py", line 418, in remote
    return self._remote(args=args, kwargs=kwargs)
  File "/usr/src/app/pipelines/models/nodes.py", line 791, in make_logit_predictions
    predictors.append(ModelPredictor.remote(dataset._get_load_path()))

BTW, ray.remote(Class) is functionally the same as @ray.remote; I would say that there's no recommendation between the two. In fact, fwiw, I only use ray.remote(Class) and never the decorator.

Hi @rliaw, thanks for your comment. I do like using ray.remote(Class), as it lets me control options dynamically (a really nice feature), which I only discovered because of this issue. That being said, something seems off about how an imported class wrapped with @ray.remote is registered, since it cannot be pickled.

This is probably because of the path not being resolved properly?

When I was deeper in the debug call stack, I tried calling path.to.module.model_class._<locals>.Class with (), and it correctly raised an error asking for the constructor arguments. It still could not be pickled, for the reason I mentioned in my previous comment. It could be pickled only when I used the ray.remote(Class) approach.

I am not sure what the correct behavior is, or whether there are any differences in how @ray.remote and ray.remote(Class) register and pickle the class.

I think this could be a pickle issue. I am not a pickle expert, but the biggest difference is where ray.remote is called (for @ray.remote, it is called in the imported file; otherwise, it is called in your driver script). Does it work if you change your code this way?

@ray.remote
class ModelPredictor:

    def __init__(self, load_path):
        from transformers import BertModel
        import numpy as np
        self.model = BertModel.from_pretrained(load_path)

    def predict(self, x):
        return self.model(x)

cc @suquark do you know what could be the root cause of this difference?

I am going to give this a try. I have done something of the kind before. I will check and see if this solves that problem.
