Error when training RL policy using big offline dataset

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hi experts.

I’m using Ray Tune to evaluate different hyperparameters when training an RL policy on offline data. This worked fine with a small offline dataset, but the following error occurs when using a bigger offline dataset:

ValueError: The actor ImplicitFunc is too large (99 MiB > FUNCTION_SIZE_ERROR_THRESHOLD=95 MiB). Check that its definition is not implicitly capturing a large array or other object in scope. Tip: use ray.put() to put large objects in the Ray object store.
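For intuition, the actor is pickled together with everything its definition captures, so holding a large object directly inflates the payload, while holding only a path stays tiny. A minimal stdlib-only sketch (class names, sizes, and the path are made up for illustration):

```python
import pickle

# Hypothetical stand-ins: one "trainable" holds the dataset itself,
# the other holds only a path to it.
class TrainableHoldingData:
    def __init__(self, data):
        self.data = data  # the full dataset travels inside the pickle

class TrainableHoldingPath:
    def __init__(self, path):
        self.path = path  # only a short string travels

big_dataset = list(range(1_000_000))

heavy = len(pickle.dumps(TrainableHoldingData(big_dataset)))
light = len(pickle.dumps(TrainableHoldingPath("/tmp/offline_data")))

# `heavy` is on the order of megabytes, `light` on the order of
# a hundred bytes.
print(heavy, light)
```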

The location of the offline dataset is specified via the offline_data() method as shown below:

# create config
config = (
    MARWILConfig()
    .training(…)
    .environment(…)
    .framework(…)
    .offline_data(input_=self.offline_data_train_dir)
).to_dict()

# additional config updates

# create trainer
trainer = MARWIL(config=config)

Is there another way to specify the offline dataset when training an RL policy?

Thanks,
Stefan

Hi @steff ,

Your dataset is normally not serialized upon scheduling the training.
Is your dataset public? Can we view it? And self.offline_data_train_dir is a simple path, correct?
Can you post the full stack trace so that we can see where check_oversized_function() is called?

“Your dataset is normally not serialized upon scheduling the training.” What’s the normal way of scheduling training of an RL policy when using offline data? I’m using the approach described in the RLlib documentation: Working With Offline Data — Ray 2.0.0.

Yes, self.offline_data_train_dir is a directory on my local machine.

This dataset is not public.

Here is the stack trace:

2022-09-23 09:35:18,936 INFO worker.py:1509 – Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
2022-09-23 09:35:20,079 INFO registry.py:96 – Detected unknown callable for trainable. Converting to class.
2022-09-23 09:35:20,079 WARNING function_trainable.py:619 – Function checkpointing is disabled. This may result in unexpected behavior when using checkpointing features or certain schedulers. To enable, set the train function arguments to be func(config, checkpoint_dir=None).
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/ray/tune/search/optuna/optuna_search.py:673: FutureWarning: LogUniformDistribution has been deprecated in v3.0.0. This feature will be removed in v6.0.0. See Release v3.0.0 · optuna/optuna · GitHub. Use :class:~optuna.distributions.FloatDistribution instead.
return ot.distributions.LogUniformDistribution(
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/ray/tune/search/optuna/optuna_search.py:682: FutureWarning: UniformDistribution has been deprecated in v3.0.0. This feature will be removed in v6.0.0. See Release v3.0.0 · optuna/optuna · GitHub. Use :class:~optuna.distributions.FloatDistribution instead.
return ot.distributions.UniformDistribution(
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/optuna/distributions.py:502: UserWarning: Choices for a categorical distribution should be a tuple of None, bool, int, float and str for persistent storage but contains which is of type list.
warnings.warn(message)
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/optuna/distributions.py:502: UserWarning: Choices for a categorical distribution should be a tuple of None, bool, int, float and str for persistent storage but contains [8] which is of type list.
warnings.warn(message)
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/optuna/distributions.py:502: UserWarning: Choices for a categorical distribution should be a tuple of None, bool, int, float and str for persistent storage but contains [16] which is of type list.
warnings.warn(message)
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/optuna/distributions.py:502: UserWarning: Choices for a categorical distribution should be a tuple of None, bool, int, float and str for persistent storage but contains [32] which is of type list.
warnings.warn(message)
[I 2022-09-23 09:35:29,901] A new study created in memory with name: optuna
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/optuna/distributions.py:766: FutureWarning: LogUniformDistribution(high=0.0001, low=1e-07) is deprecated and internally converted to FloatDistribution(high=0.0001, log=True, low=1e-07, step=None). See Cleanup distributions: `FloatDistribution` and `IntDistribution` · Issue #2941 · optuna/optuna · GitHub.
warnings.warn(message, FutureWarning)
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/optuna/distributions.py:766: FutureWarning: UniformDistribution(high=0.01, low=0.0) is deprecated and internally converted to FloatDistribution(high=0.01, log=False, low=0.0, step=None). See Cleanup distributions: `FloatDistribution` and `IntDistribution` · Issue #2941 · optuna/optuna · GitHub.
warnings.warn(message, FutureWarning)
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/optuna/distributions.py:766: FutureWarning: UniformDistribution(high=1.0, low=0.95) is deprecated and internally converted to FloatDistribution(high=1.0, log=False, low=0.95, step=None). See Cleanup distributions: `FloatDistribution` and `IntDistribution` · Issue #2941 · optuna/optuna · GitHub.
warnings.warn(message, FutureWarning)
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/optuna/distributions.py:766: FutureWarning: UniformDistribution(high=1e-07, low=1e-08) is deprecated and internally converted to FloatDistribution(high=1e-07, log=False, low=1e-08, step=None). See Cleanup distributions: `FloatDistribution` and `IntDistribution` · Issue #2941 · optuna/optuna · GitHub.
warnings.warn(message, FutureWarning)
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/optuna/distributions.py:766: FutureWarning: UniformDistribution(high=100.0, low=95.0) is deprecated and internally converted to FloatDistribution(high=100.0, log=False, low=95.0, step=None). See Cleanup distributions: `FloatDistribution` and `IntDistribution` · Issue #2941 · optuna/optuna · GitHub.
warnings.warn(message, FutureWarning)
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/ray/util/placement_group.py:78: DeprecationWarning: placement_group parameter is deprecated. Use scheduling_strategy=PlacementGroupSchedulingStrategy(…) instead, see the usage at Ray Core API — Ray 2.0.0.
return bundle_reservation_check.options(
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/ray/_private/ray_option_utils.py:266: DeprecationWarning: Setting ‘object_store_memory’ for actors is deprecated since it doesn’t actually reserve the required object store memory. Use object spilling that’s enabled by default (Object Spilling — Ray 2.0.0) instead to bypass the object store memory size limitation.
warnings.warn(
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/ray/actor.py:637: DeprecationWarning: placement_group parameter is deprecated. Use scheduling_strategy=PlacementGroupSchedulingStrategy(…) instead, see the usage at Ray Core API — Ray 2.0.0.
return actor_cls._remote(args=args, kwargs=kwargs, **updated_options)
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/ray/actor.py:637: DeprecationWarning: placement_group_bundle_index parameter is deprecated. Use scheduling_strategy=PlacementGroupSchedulingStrategy(…) instead, see the usage at Ray Core API — Ray 2.0.0.
return actor_cls._remote(args=args, kwargs=kwargs, **updated_options)
/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/ray/actor.py:637: DeprecationWarning: placement_group_capture_child_tasks parameter is deprecated. Use scheduling_strategy=PlacementGroupSchedulingStrategy(…) instead, see the usage at Ray Core API — Ray 2.0.0.
return actor_cls._remote(args=args, kwargs=kwargs, **updated_options)
2022-09-23 09:36:00,351 ERROR ray_trial_executor.py:562 -- Trial objective_fc5a0c58: Unexpected error starting runner.
Traceback (most recent call last):
  File "/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/ray/tune/execution/ray_trial_executor.py", line 555, in start_trial
    return self._start_trial(trial)
  File "/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/ray/tune/execution/ray_trial_executor.py", line 458, in _start_trial
    runner = self._setup_remote_runner(trial)
  File "/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/ray/tune/execution/ray_trial_executor.py", line 399, in _setup_remote_runner
    return full_actor_class.remote(**kwargs)
  File "/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/ray/actor.py", line 637, in remote
    return actor_cls._remote(args=args, kwargs=kwargs, **updated_options)
  File "/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/ray/util/tracing/tracing_helper.py", line 387, in _invocation_actor_class_remote_span
    return method(self, args, kwargs, *_args, **_kwargs)
  File "/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/ray/actor.py", line 844, in _remote
    worker.function_actor_manager.export_actor_class(
  File "/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/ray/_private/function_manager.py", line 479, in export_actor_class
    check_oversized_function(
  File "/home/stefan/anaconda3/envs/py38_ray2/lib/python3.8/site-packages/ray/_private/utils.py", line 729, in check_oversized_function
    raise ValueError(error)
ValueError: The actor ImplicitFunc is too large (166 MiB > FUNCTION_SIZE_ERROR_THRESHOLD=95 MiB). Check that its definition is not implicitly capturing a large array or other object in scope. Tip: use ray.put() to put large objects in the Ray object store.

Thanks,
Stefan

Stefan, do you think you could create a reproducible example that we can run with some mock/dummy data? Happy to help but it’s a bit hard for us to know where to start.