Ray Tune generates error "The actor ImplicitFunc is too large"

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I am trying to use Ray Tune with PyTorch Lightning to find a good set of hyperparameters for my experiments.

I successfully ran the example code from the documentation (Using PyTorch Lightning with Tune — Ray 2.3.0).

I have adapted the example code to my project in the following ways:

  • I pass a train and a val loader to the model using tune.with_parameters (in the example code, the dataloaders are initialized inside the model); a rough sketch follows this list
  • My config has a hierarchy, i.e., there is a dictionary nested inside the config dictionary
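Roughly, the adapted setup looks like this (a sketch; train_fn and the loaders are placeholders, not my actual code):

from ray import tune

config = {
    "lr": tune.loguniform(1e-4, 1e-1),
    "model": {  # nested dictionary inside the config
        "hidden_dim": tune.choice([64, 128]),
    },
}

# The loaders are built on the driver and passed in explicitly,
# instead of being initialized inside the model as in the example.
trainable = tune.with_parameters(
    train_fn, train_loader=train_loader, val_loader=val_loader
)
tuner = tune.Tuner(trainable, param_space=config)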

When I run the tuner for the first time, I get the error The actor ImplicitFunc is too large. The full error message is:

ValueError: The actor ImplicitFunc is too large (146 MiB > FUNCTION_SIZE_ERROR_THRESHOLD=95 MiB). Check that its definition is not implicitly capturing a large array or other object in scope. Tip: use ray.put() to put large objects in the Ray object store.

2023-03-08 14:50:38,540	WARNING util.py:244 -- The `start_trial` operation took 3.378 s, which may be a performance bottleneck.

In the end, it throws a KeyError: 'pop from an empty set'.

Any help would be appreciated.

Thanks for your question.

The error message makes sense: I suspect your dataset loader functions may be capturing a lot of local variables, probably the entire dataset, actually.

When Ray serializes these local functions, it has to capture the entire local environment; otherwise they won't run when shipped to a remote host.
See if this matches the symptoms of your workload.
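For illustration, here is the kind of capture the error points at, as a minimal sketch (not your actual code):

import numpy as np
from ray import tune

dataset = np.zeros((100_000, 512))  # large in-memory object

def train_fn_bad(config):
    # BAD: `dataset` is captured in the closure, so the whole array is
    # serialized together with the trainable, which triggers
    # "The actor ImplicitFunc is too large".
    batch = dataset[: config["batch_size"]]

def train_fn_ok(config, data=None):
    batch = data[: config["batch_size"]]

# GOOD: tune.with_parameters() stores `dataset` in the Ray object store
# (via ray.put) and ships only a small reference with the trainable.
trainable = tune.with_parameters(train_fn_ok, data=dataset)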

In case you need us to take a look at your actual code, can you please create an issue for us on GitHub with a reproducible example?
Thanks.

Thanks for the reply! I tried two different things, one of which worked, but the other didn’t. Details are below:

  • I included the dataloaders inside the model class (PyTorch Lightning; sketched below) and used a simple model (the same as in the example code). It worked without generating the error. Previously, when I passed the dataloaders with tune.with_parameters, it threw the error. So, there's some progress.
  • I ran my actual model (which is based on Transformers) instead of the simple model above, again with the dataloaders inside the model class. This generated the same error: The actor ImplicitFunc is too large.
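For reference, "including the dataloaders inside the model class" means something like the following sketch (MyDataset is a placeholder):

import pytorch_lightning as pl
from torch.utils.data import DataLoader

class LitModel(pl.LightningModule):
    def __init__(self, config):
        super().__init__()
        self.config = config

    def train_dataloader(self):
        # The dataset is constructed lazily inside the trial process,
        # so nothing large is captured when the trainable is serialized.
        return DataLoader(MyDataset(split="train"),
                          batch_size=self.config["batch_size"])

    def val_dataloader(self):
        return DataLoader(MyDataset(split="val"),
                          batch_size=self.config["batch_size"])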

I would be happy to provide reproducible code, but it would be very long, as I would also have to include my full PyTorch model. My understanding is that the error shouldn't depend on the model, so I'm not sure what I should do, other than choosing a simpler/smaller model, which would defeat the purpose.

Would it be helpful to open an issue with only the ray-tune relevant code blocks and not the model?

Some updates.

I have tried to adapt the example code in different ways, and I strongly believe that the error is due to the model size. Everything works as long as the ML model is a simple one, like in the example.

Could someone suggest a way to use more complicated models with Ray Tune and PyTorch Lightning? The documentation seems to suggest using tune.with_parameters to pass large data objects, but the object in this case is the model itself. It also needs to be passed along with the tunable hyperparameters (the config dict in this case). What's the best way to do this?

Appreciate all the efforts so far.
Is there any chance we can take a look at your actual code?

From your description, it does look like the problem is that we are trying to serialize and ship the entire model for every Tune trial, which shouldn't happen.
If you look at the example you mentioned above:
https://docs.ray.io/en/latest/tune/examples/tune-pytorch-lightning.html#putting-it-together
In the "Putting it together" section, we use the tune.with_parameters() call to wrap the function train_mnist_tune(), which gets shipped to remote hosts for execution. Notice that train_mnist_tune() is never invoked on the driver; therefore, the actual model is not created until the trial starts on the remote hosts.

I think we will be able to help you much better if you can open an issue on GitHub with your actual Tune code.
We can even start with some code that doesn't run (a dummy train function).
Chances are we can eyeball some problems.

I also had the same problem when working with large data.

Hey @fuji2021 @JianFeng_Liu ,

Thanks for posting this issue, although it has been a long time. In Ray 2.4, we introduced LightningTrainer, which adds native support for PyTorch Lightning to Ray.

I noticed that you mentioned:

But the object in this case is the model itself.

When using Ray Tune with LightningTrainer, you don't have to instantiate a model object on the driver node and then pass it through tune.with_parameters.

Instead, you provide your model class and its initialization arguments in LightningConfigBuilder, which is passed as a parameter of LightningTrainer. The trainer itself will create a model instance on the worker node, e.g.:

lightning_config = (
  LightningConfigBuilder()
  .module(cls=MNISTClassifier, config=config)
  ...
  .build()
)

tuner = tune.Tuner(
  lightning_trainer,
  param_space={"lightning_config": lightning_config},
  ...
)

Note that MNISTClassifier here is the class type of your model, not an instance. For more information about LightningConfigBuilder, refer to the API reference in the Ray documentation.
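Putting the pieces together, here is a rough end-to-end sketch using the Ray 2.4 API (MNISTClassifier, train_loader, and val_loader are placeholders):

from ray import tune
from ray.air.config import ScalingConfig
from ray.train.lightning import LightningConfigBuilder, LightningTrainer

lightning_config = (
    LightningConfigBuilder()
    # Pass the class and its constructor arguments; the instance is
    # created later on the worker, never on the driver.
    .module(cls=MNISTClassifier, config={"lr": tune.loguniform(1e-4, 1e-1)})
    .trainer(max_epochs=5, accelerator="cpu")
    .fit_params(train_dataloaders=train_loader, val_dataloaders=val_loader)
    .build()
)

lightning_trainer = LightningTrainer(
    scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
)

tuner = tune.Tuner(
    lightning_trainer,
    param_space={"lightning_config": lightning_config},
    tune_config=tune.TuneConfig(num_samples=4),
)
results = tuner.fit()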

I am facing a similar issue, but I am not using PyTorch, and I am not sure how to fix the problem. I called ray.put() inside the class that I want to create a Ray actor from, storing the dataset via ray.put(), but it didn't solve the problem.

@newbieray can you create a new post with more information (e.g., what your script looks like)?
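In the meantime: ray.put() usually helps when it is called on the driver and the resulting ObjectRef is passed into the actor, rather than when it is called inside the actor class itself. A minimal sketch of that pattern:

import numpy as np
import ray

ray.init()

# Put the large dataset in the object store once, on the driver.
dataset_ref = ray.put(np.zeros((10_000, 1_000)))

@ray.remote
class Worker:
    def __init__(self, dataset):
        # Ray resolves the ObjectRef argument automatically, so the
        # array is never captured in the class definition itself.
        self.dataset = dataset

    def mean(self):
        return float(self.dataset.mean())

worker = Worker.remote(dataset_ref)
print(ray.get(worker.mean.remote()))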