Error when training RL policy using big offline dataset

How severe does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hi experts.

I’m using Ray Tune to evaluate different hyper-parameters when training an RL policy on offline data. This worked fine with a small offline dataset, but the following error occurs with a bigger one:

ValueError: The actor ImplicitFunc is too large (99 MiB > FUNCTION_SIZE_ERROR_THRESHOLD=95 MiB). Check that its definition is not implicitly capturing a large array or other object in scope. Tip: use ray.put() to put large objects in the Ray object store.
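
If I read the error message correctly, the generic pattern it warns about looks something like the sketch below (a made-up example, not my actual code): a remote function or actor that implicitly captures a large object gets serialized together with that object, and the suggested fix is to ray.put() the object and pass a reference instead.

import numpy as np
import ray

ray.init()

# Stand-in for a large object sitting in the driver's scope; imagine it being
# well above the 95 MiB threshold.
big_array = np.zeros((1000, 1000))

# Problematic pattern: the remote function implicitly captures big_array, so
# the array is pickled together with the function definition, which is what
# check_oversized_function() complains about.
@ray.remote
def mean_captured():
    return float(big_array.mean())

# Fix suggested by the error message: put the object into the object store
# once and pass the ObjectRef as an argument; Ray resolves top-level
# ObjectRefs to their values inside the task.
big_ref = ray.put(big_array)

@ray.remote
def mean_from_ref(arr):
    return float(arr.mean())

print(ray.get(mean_from_ref.remote(big_ref)))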

The location of the offline dataset is specified via the offline_data() method, as shown below:

# create config
config = (
    MARWILConfig()
    .training(…)
    .environment(…)
    .framework(…)
    .offline_data(input_=self.offline_data_train_dir)
).to_dict()

# additional config updates

# create trainer
trainer = MARWIL(config=config)

Is there another way to specify the offline data set when training an RL policy?
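
For example, would it also work to pass an explicit list of files instead of a directory? From my reading of the offline data docs, input_ should accept that too, but I may be wrong. A sketch of what I mean (the paths are placeholders, not my real data):

from ray.rllib.algorithms.marwil import MARWILConfig

# Placeholder file names; my real data lives in self.offline_data_train_dir.
config = (
    MARWILConfig()
    .offline_data(
        input_=[
            "/data/offline/train/output-000.json",
            "/data/offline/train/output-001.json",
        ]
    )
).to_dict()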

Thanks,
Stefan

Hi @steff ,

Your dataset is normally not serialized upon scheduling the training.
Is your dataset public? Can we view it? self.offline_data_train_dir is a simple path, correct?
Can you post the full stack trace so that we can see where check_oversized_function() is called?

“Your dataset is normally not serialized upon scheduling the training”. What’s the normal way of scheduling training of an RL policy when using offline data? I’m using the approach described in the RLlib documentation: Working With Offline Data — Ray 2.0.0.
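
For concreteness, the basic pattern from that docs page looks roughly like the following (simplified, with a placeholder path; in my actual setup the config is handed to Tune for hyper-parameter search rather than trained in a plain loop):

from ray.rllib.algorithms.marwil import MARWILConfig

# Simplified version of the documented offline-data workflow; the path is a
# placeholder for wherever the recorded sample batches live.
config = (
    MARWILConfig()
    .environment(env="CartPole-v1")
    .offline_data(input_="/tmp/offline-data")
)
algo = config.build()
for _ in range(10):
    print(algo.train())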

Yes, self.offline_data_train_dir is a directory on my local machine.

This dataset is not public.

Here is the stack trace:

2022-09-23 09:35:18,936 INFO worker.py:1509 – Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
2022-09-23 09:35:20,079 INFO registry.py:96 – Detected unknown callable for trainable. Converting to class.
2022-09-23 09:35:20,079 WARNING function_trainable.py:619 – Function checkpointing is disabled. This may result in unexpected behavior when using checkpointing features or certain schedulers. To enable, set the train function arguments to be func(config, checkpoint_dir=None).
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/ray/tune/search/optuna/optuna_search.py:673: FutureWarning: LogUniformDistribution has been deprecated in v3.0.0. This feature will be removed in v6.0.0. See Release v3.0.0 · optuna/optuna · GitHub. Use :class:~optuna.distributions.FloatDistribution instead.
return ot.distributions.LogUniformDistribution(
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/ray/tune/search/optuna/optuna_search.py:682: FutureWarning: UniformDistribution has been deprecated in v3.0.0. This feature will be removed in v6.0.0. See Release v3.0.0 · optuna/optuna · GitHub. Use :class:~optuna.distributions.FloatDistribution instead.
return ot.distributions.UniformDistribution(
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/optuna/distributions.py:502: UserWarning: Choices for a categorical distribution should be a tuple of None, bool, int, float and str for persistent storage but contains which is of type list.
warnings.warn(message)
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/optuna/distributions.py:502: UserWarning: Choices for a categorical distribution should be a tuple of None, bool, int, float and str for persistent storage but contains [8] which is of type list.
warnings.warn(message)
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/optuna/distributions.py:502: UserWarning: Choices for a categorical distribution should be a tuple of None, bool, int, float and str for persistent storage but contains [16] which is of type list.
warnings.warn(message)
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/optuna/distributions.py:502: UserWarning: Choices for a categorical distribution should be a tuple of None, bool, int, float and str for persistent storage but contains [32] which is of type list.
warnings.warn(message)
[I 2022-09-23 09:35:29,901] A new study created in memory with name: optuna
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/optuna/distributions.py:766: FutureWarning: LogUniformDistribution(high=0.0001, low=1e-07) is deprecated and internally converted to FloatDistribution(high=0.0001, log=True, low=1e-07, step=None). See Cleanup distributions: `FloatDistribution` and `IntDistribution` · Issue #2941 · optuna/optuna · GitHub.
warnings.warn(message, FutureWarning)
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/optuna/distributions.py:766: FutureWarning: UniformDistribution(high=0.01, low=0.0) is deprecated and internally converted to FloatDistribution(high=0.01, log=False, low=0.0, step=None). See Cleanup distributions: `FloatDistribution` and `IntDistribution` · Issue #2941 · optuna/optuna · GitHub.
warnings.warn(message, FutureWarning)
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/optuna/distributions.py:766: FutureWarning: UniformDistribution(high=1.0, low=0.95) is deprecated and internally converted to FloatDistribution(high=1.0, log=False, low=0.95, step=None). See Cleanup distributions: `FloatDistribution` and `IntDistribution` · Issue #2941 · optuna/optuna · GitHub.
warnings.warn(message, FutureWarning)
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/optuna/distributions.py:766: FutureWarning: UniformDistribution(high=1e-07, low=1e-08) is deprecated and internally converted to FloatDistribution(high=1e-07, log=False, low=1e-08, step=None). See Cleanup distributions: `FloatDistribution` and `IntDistribution` · Issue #2941 · optuna/optuna · GitHub.
warnings.warn(message, FutureWarning)
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/optuna/distributions.py:766: FutureWarning: UniformDistribution(high=100.0, low=95.0) is deprecated and internally converted to FloatDistribution(high=100.0, log=False, low=95.0, step=None). See Cleanup distributions: `FloatDistribution` and `IntDistribution` · Issue #2941 · optuna/optuna · GitHub.
warnings.warn(message, FutureWarning)
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/ray/util/placement_group.py:78: DeprecationWarning: placement_group parameter is deprecated. Use scheduling_strategy=PlacementGroupSchedulingStrategy(…) instead, see the usage at Ray Core API — Ray 2.0.0.
return bundle_reservation_check.options(
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/ray/_private/ray_option_utils.py:266: DeprecationWarning: Setting ‘object_store_memory’ for actors is deprecated since it doesn’t actually reserve the required object store memory. Use object spilling that’s enabled by default (Object Spilling — Ray 2.0.0) instead to bypass the object store memory size limitation.
warnings.warn(
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/ray/actor.py:637: DeprecationWarning: placement_group parameter is deprecated. Use scheduling_strategy=PlacementGroupSchedulingStrategy(…) instead, see the usage at Ray Core API — Ray 2.0.0.
return actor_cls._remote(args=args, kwargs=kwargs, **updated_options)
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/ray/actor.py:637: DeprecationWarning: placement_group_bundle_index parameter is deprecated. Use scheduling_strategy=PlacementGroupSchedulingStrategy(…) instead, see the usage at Ray Core API — Ray 2.0.0.
return actor_cls._remote(args=args, kwargs=kwargs, **updated_options)
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/ray/actor.py:637: DeprecationWarning: placement_group_capture_child_tasks parameter is deprecated. Use scheduling_strategy=PlacementGroupSchedulingStrategy(…) instead, see the usage at Ray Core API — Ray 2.0.0.
return actor_cls._remote(args=args, kwargs=kwargs, **updated_options)
2022-09-23 09:36:00,351 ERROR ray_trial_executor.py:562 – Trial objective_fc5a0c58: Unexpected error starting runner.
Traceback (most recent call last):
File “/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/ray/tune/execution/ray_trial_executor.py”, line 555, in start_trial
return self._start_trial(trial)
File “/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/ray/tune/execution/ray_trial_executor.py”, line 458, in _start_trial
runner = self._setup_remote_runner(trial)
File “/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/ray/tune/execution/ray_trial_executor.py”, line 399, in _setup_remote_runner
return full_actor_class.remote(**kwargs)
File “/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/ray/actor.py”, line 637, in remote
return actor_cls._remote(args=args, kwargs=kwargs, **updated_options)
File “/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/ray/util/tracing/tracing_helper.py”, line 387, in _invocation_actor_class_remote_span
return method(self, args, kwargs, *_args, **_kwargs)
File “/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/ray/actor.py”, line 844, in _remote
worker.function_actor_manager.export_actor_class(
File “/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/ray/_private/function_manager.py”, line 479, in export_actor_class
check_oversized_function(
File “/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/ray/_private/utils.py”, line 729, in check_oversized_function
raise ValueError(error)
ValueError: The actor ImplicitFunc is too large (166 MiB > FUNCTION_SIZE_ERROR_THRESHOLD=95 MiB). Check that its definition is not implicitly capturing a large array or other object in scope. Tip: use ray.put() to put large objects in the Ray object store.

Thanks,
Stefan

Stefan, do you think you could create a reproducible example that we can run with some mock/dummy data? Happy to help but it’s a bit hard for us to know where to start.

This error doesn’t occur with the original environment when the number of episodes and/or number of features in the observation space is reduced, so it seems to have something to do with the offline data size.

I tried to recreate this problem using a different environment that I can share, but so far no luck, even when the new environment generates offline data with many more episodes and an observation space with many more features. The original offline data that triggers this error is 300 MB; the offline data from the new environment that does not trigger it is 26 GB.

The error “ValueError: The actor ImplicitFunc is too large (166 MiB > FUNCTION_SIZE_ERROR_THRESHOLD=95 MiB)” is raised by check_oversized_function() in ray/_private/utils.py. Under what circumstances would this error be raised when using the MARWIL algorithm to train a policy on offline data?
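
If it helps narrow this down, I can also measure how large the trainable actually serializes to before Tune schedules it; a rough sketch, assuming cloudpickle size is a fair proxy for what check_oversized_function() compares against the threshold (objective is a placeholder for my real trainable):

from ray import cloudpickle

def objective(config):
    # placeholder for my real trainable that builds the MARWIL trainer
    ...

size_mib = len(cloudpickle.dumps(objective)) / (1024 ** 2)
print(f"Pickled trainable size: {size_mib:.1f} MiB")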

Thanks,
Stefan

Since you can’t provide a repro, the best thing I can do is provide some context and a shot in the dark:

  • ImplicitFunc is used by Tune to wrap training functions (basically a training loop that returns results) with different signatures/inputs/outputs.

  • Given the way you create your trainer, Tune will automatically create a small training loop and wrap it with ImplicitFunc (see the sketch after this list for how that wrapper can become oversized).

  • If you only create the trainer like you did, this should not create an actor or a remote function afaics. Is there code missing?
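
To make the shot in the dark concrete: whatever the training function closes over on the driver gets pickled into ImplicitFunc, so a large in-memory object in scope can push it over the 95 MiB threshold, whereas passing only small values (paths, hyper-parameters) via the config keeps it small. A made-up sketch (the names and the numpy blob are stand-ins, not your code):

import numpy as np
from ray import tune
from ray.air import session

big_blob = np.zeros((1000, 1000))  # imagine this were a few hundred MiB

def objective_too_large(config):
    # Captures big_blob from the enclosing scope, so it gets serialized
    # together with the trainable into ImplicitFunc.
    session.report({"score": float(big_blob.mean()) + config["lr"]})

def objective_ok(config):
    # Only a small path string travels with the trainable; the data is
    # loaded inside the trial process instead.
    data = np.load(config["data_path"])
    session.report({"score": float(data.mean()) + config["lr"]})

np.save("/tmp/blob.npy", big_blob)
tune.run(objective_ok, config={"lr": 1e-3, "data_path": "/tmp/blob.npy"})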