Worker errors possibly causing ray-tune overhead slowdown

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Hello everyone!

We have the code snippet below, which uses BasicVariantGenerator:

from ray import tune
from ray.tune.search.basic_variant import BasicVariantGenerator

params = {
    "x1": tune.uniform(-5, 5),
    "x2": tune.uniform(-5, 5),
}

def objective_fn(config):
    fn = (config["x1"] - 2)**2 + (config["x2"] + 3)**2
    return {"fn": fn}

searcher = BasicVariantGenerator()
tune_config = tune.TuneConfig(
    metric="fn",
    mode="min",
    search_alg=searcher,
    num_samples=200,
)

tuner = tune.Tuner(
    objective_fn, 
    tune_config=tune_config, 
    param_space=params
)
results = tuner.fit()

which gives the worker errors:

2024-01-05 17:28:52,112 INFO worker.py:1673 -- Started a local Ray instance.
2024-01-05 17:28:52,984 INFO tune.py:220 -- Initializing Ray automatically. For cluster usage or custom Ray initialization, call ray.init(...) before Tuner(...).
2024-01-05 17:28:52,986 INFO tune.py:595 -- [output] This will use the new output engine with verbosity 1. To disable the new output and use the legacy output engine, set the environment variable RAY_AIR_NEW_OUTPUT=0. For more information, please see https://github.com/ray-project/ray/issues/36949
(bundle_reservation_check_func pid=119476) Traceback (most recent call last):
(bundle_reservation_check_func pid=119476) File "python/ray/_raylet.pyx", line 1649, in ray._raylet.execute_task
(bundle_reservation_check_func pid=119476) File "python/ray/_raylet.pyx", line 1651, in ray._raylet.execute_task
(bundle_reservation_check_func pid=119476) File "/home/ubuntu/miniconda3/envs/raytune_test/lib/python3.11/site-packages/ray/_private/worker.py", line 740, in deserialize_objects
(bundle_reservation_check_func pid=119476) context = self.get_serialization_context()
(bundle_reservation_check_func pid=119476) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(bundle_reservation_check_func pid=119476) File "/home/ubuntu/miniconda3/envs/raytune_test/lib/python3.11/site-packages/ray/_private/worker.py", line 628, in get_serialization_context
(bundle_reservation_check_func pid=119476) context_map[job_id] = serialization.SerializationContext(self)
(bundle_reservation_check_func pid=119476) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(bundle_reservation_check_func pid=119476) File "/home/ubuntu/miniconda3/envs/raytune_test/lib/python3.11/site-packages/ray/_private/serialization.py", line 153, in __init__
(bundle_reservation_check_func pid=119476) serialization_addons.apply(self)
(bundle_reservation_check_func pid=119476) File "/home/ubuntu/miniconda3/envs/raytune_test/lib/python3.11/site-packages/ray/util/serialization_addons.py", line 82, in apply
(bundle_reservation_check_func pid=119476) from ray._private.arrow_serialization import (
(bundle_reservation_check_func pid=119476) File "/home/ubuntu/miniconda3/envs/raytune_test/lib/python3.11/site-packages/ray/_private/arrow_serialization.py", line 216, in
(bundle_reservation_check_func pid=119476) @dataclass
(bundle_reservation_check_func pid=119476) ^^^^^^^^^
(bundle_reservation_check_func pid=119476) File "/home/ubuntu/miniconda3/envs/raytune_test/lib/python3.11/dataclasses.py", line 1230, in dataclass
(bundle_reservation_check_func pid=119476) return wrap(cls)
(bundle_reservation_check_func pid=119476) ^^^^^^^^^
(bundle_reservation_check_func pid=119476) File "/home/ubuntu/miniconda3/envs/raytune_test/lib/python3.11/dataclasses.py", line 1220, in wrap
(bundle_reservation_check_func pid=119476) return _process_class(cls, init, repr, eq, order, unsafe_hash,
(bundle_reservation_check_func pid=119476) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(bundle_reservation_check_func pid=119476) File "/home/ubuntu/miniconda3/envs/raytune_test/lib/python3.11/dataclasses.py", line 1056, in _process_class
(bundle_reservation_check_func pid=119476) _cmp_fn('eq', '==',
(bundle_reservation_check_func pid=119476) File "/home/ubuntu/miniconda3/envs/raytune_test/lib/python3.11/dataclasses.py", line 630, in _cmp_fn
(bundle_reservation_check_func pid=119476) return _create_fn(name,
(bundle_reservation_check_func pid=119476) ^^^^^^^^^^^^^^^^
(bundle_reservation_check_func pid=119476) File "/home/ubuntu/miniconda3/envs/raytune_test/lib/python3.11/dataclasses.py", line 433, in _create_fn
(bundle_reservation_check_func pid=119476) exec(txt, globals, ns)
(bundle_reservation_check_func pid=119476) File "", line 0, in
(bundle_reservation_check_func pid=119476) File "/home/ubuntu/miniconda3/envs/raytune_test/lib/python3.11/site-packages/ray/_private/worker.py", line 791, in sigterm_handler
(bundle_reservation_check_func pid=119476) raise_sys_exit_with_custom_error_message(
(bundle_reservation_check_func pid=119476) File "python/ray/_raylet.pyx", line 725, in ray._raylet.raise_sys_exit_with_custom_error_message
(bundle_reservation_check_func pid=119476) SystemExit: 1
(bundle_reservation_check_func pid=119476)
(bundle_reservation_check_func pid=119476) During handling of the above exception, another exception occurred:
(bundle_reservation_check_func pid=119476)
(bundle_reservation_check_func pid=119476) Traceback (most recent call last):
(bundle_reservation_check_func pid=119476) File "python/ray/_raylet.pyx", line 1960, in ray._raylet.execute_task_with_cancellation_handler
(bundle_reservation_check_func pid=119476) File "python/ray/_raylet.pyx", line 1617, in ray._raylet.execute_task
(bundle_reservation_check_func pid=119476) File "python/ray/_raylet.pyx", line 1618, in ray._raylet.execute_task
(bundle_reservation_check_func pid=119476) File "python/ray/_raylet.pyx", line 1621, in ray._raylet.execute_task
(bundle_reservation_check_func pid=119476) File "python/ray/includes/libcoreworker.pxi", line 33, in ray._raylet.ProfileEvent.__exit__
(bundle_reservation_check_func pid=119476) File "/home/ubuntu/miniconda3/envs/raytune_test/lib/python3.11/traceback.py", line 184, in format_exc
(bundle_reservation_check_func pid=119476) return "".join(format_exception(*sys.exc_info(), limit=limit, chain=chain))
(bundle_reservation_check_func pid=119476)
(bundle_reservation_check_func pid=119476) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(bundle_reservation_check_func pid=119476) ^^^^^^^^^^^^^^^
(bundle_reservation_check_func pid=119476) ^^^^^^^^^^^^^^^^
(bundle_reservation_check_func pid=119476) File "/home/ubuntu/miniconda3/envs/raytune_test/lib/python3.11/traceback.py", line 139, in format_exception
(bundle_reservation_check_func pid=119476) te = TracebackException(type(value), value, tb, limit=limit, compact=True)
(bundle_reservation_check_func pid=119476) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(bundle_reservation_check_func pid=119476) File "/home/ubuntu/miniconda3/envs/raytune_test/lib/python3.11/traceback.py", line 728, in __init__
(bundle_reservation_check_func pid=119476) self.stack = StackSummary._extract_from_extended_frame_gen(
(bundle_reservation_check_func pid=119476) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(bundle_reservation_check_func pid=119476) File "/home/ubuntu/miniconda3/envs/raytune_test/lib/python3.11/traceback.py", line 433, in _extract_from_extended_frame_gen
(bundle_reservation_check_func pid=119476) f.line
(bundle_reservation_check_func pid=119476) File "/home/ubuntu/miniconda3/envs/raytune_test/lib/python3.11/traceback.py", line 318, in line
(bundle_reservation_check_func pid=119476) self._line = linecache.getline(self.filename, self.lineno)
(bundle_reservation_check_func pid=119476) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(bundle_reservation_check_func pid=119476) File "/home/ubuntu/miniconda3/envs/raytune_test/lib/python3.11/linecache.py", line 30, in getline
(bundle_reservation_check_func pid=119476) lines = getlines(filename, module_globals)
(bundle_reservation_check_func pid=119476) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(bundle_reservation_check_func pid=119476) File "/home/ubuntu/miniconda3/envs/raytune_test/lib/python3.11/linecache.py", line 46, in getlines
(bundle_reservation_check_func pid=119476) return updatecache(filename, module_globals)
(bundle_reservation_check_func pid=119476) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(bundle_reservation_check_func pid=119476) File "/home/ubuntu/miniconda3/envs/raytune_test/lib/python3.11/linecache.py", line 93, in updatecache
(bundle_reservation_check_func pid=119476) stat = os.stat(fullname)
(bundle_reservation_check_func pid=119476) ^^^^^^^^^^^^^^^^^
(bundle_reservation_check_func pid=119476) KeyboardInterrupt
(bundle_reservation_check_func pid=119476)
(bundle_reservation_check_func pid=119476) During handling of the above exception, another exception occurred:
(bundle_reservation_check_func pid=119476)
(bundle_reservation_check_func pid=119476) Traceback (most recent call last):
(bundle_reservation_check_func pid=119476) File "python/ray/_raylet.pyx", line 2064, in ray._raylet.task_execution_handler
(bundle_reservation_check_func pid=119476) File "python/ray/_raylet.pyx", line 1995, in ray._raylet.execute_task_with_cancellation_handler
(bundle_reservation_check_func pid=119476) File "python/ray/_raylet.pyx", line 953, in ray._raylet.store_task_errors
(bundle_reservation_check_func pid=119476) File "/home/ubuntu/miniconda3/envs/raytune_test/lib/python3.11/site-packages/ray/_private/utils.py", line 178, in push_error_to_driver
(bundle_reservation_check_func pid=119476) worker.core_worker.push_error(job_id, error_type, message, time.time())
(bundle_reservation_check_func pid=119476) File "python/ray/_raylet.pyx", line 4527, in ray._raylet.CoreWorker.push_error
(bundle_reservation_check_func pid=119476) File "python/ray/_raylet.pyx", line 468, in ray._raylet.check_status
(bundle_reservation_check_func pid=119476) ray.exceptions.RaySystemError: System error: Broken pipe
(bundle_reservation_check_func pid=119476)
(bundle_reservation_check_func pid=119476) During handling of the above exception, another exception occurred:
(bundle_reservation_check_func pid=119476)
(bundle_reservation_check_func pid=119476) Traceback (most recent call last):
(bundle_reservation_check_func pid=119476) File "python/ray/_raylet.pyx", line 2103, in ray._raylet.task_execution_handler
(bundle_reservation_check_func pid=119476) File "/home/ubuntu/miniconda3/envs/raytune_test/lib/python3.11/site-packages/ray/_private/utils.py", line 178, in push_error_to_driver
(bundle_reservation_check_func pid=119476) worker.core_worker.push_error(job_id, error_type, message, time.time())
(bundle_reservation_check_func pid=119476) File "python/ray/_raylet.pyx", line 4527, in ray._raylet.CoreWorker.push_error
(bundle_reservation_check_func pid=119476) File "python/ray/_raylet.pyx", line 468, in ray._raylet.check_status
(bundle_reservation_check_func pid=119476) ray.exceptions.RaySystemError: System error: Broken pipe
(bundle_reservation_check_func pid=119476) Exception ignored in: 'ray._raylet.task_execution_handler'
(bundle_reservation_check_func pid=119476) Traceback (most recent call last):
(bundle_reservation_check_func pid=119476) File "python/ray/_raylet.pyx", line 2103, in ray._raylet.task_execution_handler
(bundle_reservation_check_func pid=119476) File "/home/ubuntu/miniconda3/envs/raytune_test/lib/python3.11/site-packages/ray/_private/utils.py", line 178, in push_error_to_driver
(bundle_reservation_check_func pid=119476) worker.core_worker.push_error(job_id, error_type, message, time.time())
(bundle_reservation_check_func pid=119476) File "python/ray/_raylet.pyx", line 4527, in ray._raylet.CoreWorker.push_error
(bundle_reservation_check_func pid=119476) File "python/ray/_raylet.pyx", line 468, in ray._raylet.check_status
(bundle_reservation_check_func pid=119476) ray.exceptions.RaySystemError: System error: Broken pipe

Similar errors also occur when using OptunaSearch. We suspect that these errors add to the overhead of running Optuna through Ray Tune, which makes it slower than using Optuna directly (without Ray Tune). I've created an issue here as well.
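For reference, the objective itself is trivial, so almost all of the wall-clock time in the Tune run should be framework overhead. A minimal sketch of a Ray-free baseline is below: it samples the same search space with the standard library's random module and times the 200 evaluations. The helper name run_baseline and the fixed seed are just for illustration and are not part of Ray or Optuna; this is a rough yardstick rather than a rigorous benchmark.

```python
import random
import time


def objective_fn(config):
    # Same quadratic as in the Tune snippet; its minimum is 0 at x1=2, x2=-3.
    fn = (config["x1"] - 2) ** 2 + (config["x2"] + 3) ** 2
    return {"fn": fn}


def run_baseline(num_samples=200, seed=0):
    """Hypothetical helper: plain random search in a single process, no Ray."""
    rng = random.Random(seed)
    start = time.perf_counter()
    results = []
    for _ in range(num_samples):
        # Mirror tune.uniform(-5, 5) for both parameters.
        config = {"x1": rng.uniform(-5, 5), "x2": rng.uniform(-5, 5)}
        results.append((objective_fn(config)["fn"], config))
    elapsed = time.perf_counter() - start
    # Compare on the metric only, so ties never fall back to comparing dicts.
    best_fn, best_config = min(results, key=lambda r: r[0])
    return best_fn, best_config, elapsed, len(results)


best_fn, best_config, elapsed, n = run_baseline()
print(f"{n} evaluations in {elapsed:.6f}s, best fn={best_fn:.4f}")
```

Comparing this elapsed time with the duration of tuner.fit() gives a rough idea of how much of the runtime is spent outside the objective.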

Any guidance will be greatly appreciated!