Why do I get a TimeoutError when setting "_temp_dir" in ray.init()?

Hi! I am trying to set the _temp_dir parameter of ray.init(), but every time I pass _temp_dir, an error occurs. I am running my code on a Linux-based EC2 instance.

The error

TimeoutError                              Traceback (most recent call last)
File ~/.local/lib/python3.10/site-packages/ray/_private/node.py:312, in Node.__init__(self, ray_params, head, shutdown_at_exit, spawn_reaper, connect_only)
    311 try:
--> 312     ray._private.services.wait_for_node(
    313         self.redis_address,
    314         self.gcs_address,
    315         self._plasma_store_socket_name,
    316         self.redis_password,
    317     )
    318 except TimeoutError:

File ~/.local/lib/python3.10/site-packages/ray/_private/services.py:385, in wait_for_node(redis_address, gcs_address, node_plasma_store_socket_name, redis_password, timeout)
    384         time.sleep(0.1)
--> 385 raise TimeoutError("Timed out while waiting for node to startup.")

TimeoutError: Timed out while waiting for node to startup.

During handling of the above exception, another exception occurred:

Exception                                 Traceback (most recent call last)
Cell In [30], line 1
----> 1 ray.init(configure_logging=True,
      2         include_dashboard=True,
      3         log_to_driver=True,
      4         logging_level=logging.FATAL,
      5         _temp_dir="raylet_results/temp",
      6         _system_config={
      7             "object_spilling_config": json.dumps(
      8                 {"type": "filesystem", "params": {"directory_path": "raylet_results/object_spilling"}},
      9             )
     10         },
     11     )

File ~/.local/lib/python3.10/site-packages/ray/_private/client_mode_hook.py:105, in client_mode_hook.<locals>.wrapper(*args, **kwargs)
    103     if func.__name__ != "init" or is_client_mode_enabled_by_default:
    104         return getattr(ray, func.__name__)(*args, **kwargs)
--> 105 return func(*args, **kwargs)

File ~/.local/lib/python3.10/site-packages/ray/_private/worker.py:1429, in init(address, num_cpus, num_gpus, resources, object_store_memory, local_mode, ignore_reinit_error, include_dashboard, dashboard_host, dashboard_port, job_config, configure_logging, logging_level, logging_format, log_to_driver, namespace, runtime_env, storage, **kwargs)
   1387     ray_params = ray._private.parameter.RayParams(
   1388         node_ip_address=node_ip_address,
   1389         raylet_ip_address=raylet_ip_address,
   (...)
   1423         node_name=_node_name,
   1424     )
   1425     # Start the Ray processes. We set shutdown_at_exit=False because we
   1426     # shutdown the node in the ray.shutdown call that happens in the atexit
   1427     # handler. We still spawn a reaper process in case the atexit handler
   1428     # isn't called.
-> 1429     _global_node = ray._private.node.Node(
   1430         head=True, shutdown_at_exit=False, spawn_reaper=True, ray_params=ray_params
   1431     )
   1432 else:
   1433     # In this case, we are connecting to an existing cluster.
   1434     if num_cpus is not None or num_gpus is not None:

File ~/.local/lib/python3.10/site-packages/ray/_private/node.py:319, in Node.__init__(self, ray_params, head, shutdown_at_exit, spawn_reaper, connect_only)
    312     ray._private.services.wait_for_node(
    313         self.redis_address,
    314         self.gcs_address,
    315         self._plasma_store_socket_name,
    316         self.redis_password,
    317     )
    318 except TimeoutError:
--> 319     raise Exception(
    320         "The current node has not been updated within 30 "
    321         "seconds, this could happen because of some of "
    322         "the Ray processes failed to startup."
    323     )
    324 node_info = ray._private.services.get_node_to_connect_for_driver(
    325     self.redis_address,
    326     self.gcs_address,
    327     self._raylet_ip_address,
    328     redis_password=self.redis_password,
    329 )
    330 if self._ray_params.node_manager_port == 0:

Exception: The current node has not been updated within 30 seconds, this could happen because of some of the Ray processes failed to startup.

This is the call I am using:


import json
import logging

import ray

ray.init(
    configure_logging=True,
    include_dashboard=True,
    log_to_driver=True,
    logging_level=logging.FATAL,
    _temp_dir="raylet_results/temp",
    _system_config={
        "object_spilling_config": json.dumps(
            {"type": "filesystem", "params": {"directory_path": "raylet_results/object_spilling"}}
        )
    },
)

Whenever I remove the _temp_dir parameter from ray.init(), it works fine. However, I want to set my own custom path for storing the temp files. How can I do this?
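One thing worth ruling out is the relative paths. The thread does not confirm that they cause the timeout; this is only an assumption. The sketch below resolves both the temp directory and the spilling directory to absolute paths and creates them before Ray starts, so permission or path problems show up before ray.init() is called. The helper name prepare_ray_dirs is hypothetical, and the "raylet_results" base simply mirrors the call above.

```python
import json
import os


def prepare_ray_dirs(base_dir="raylet_results"):
    """Hypothetical helper: build absolute temp/spilling paths for ray.init().

    Assumption (not confirmed in this thread): absolute paths are safer than
    paths relative to the notebook's current working directory.
    """
    base = os.path.abspath(os.path.expanduser(base_dir))
    temp_dir = os.path.join(base, "temp")
    spill_dir = os.path.join(base, "object_spilling")

    # Create both directories up front so permission problems surface here
    # instead of inside Ray's startup.
    os.makedirs(temp_dir, exist_ok=True)
    os.makedirs(spill_dir, exist_ok=True)

    # Same object_spilling_config shape as the original call, with an absolute path.
    system_config = {
        "object_spilling_config": json.dumps(
            {"type": "filesystem", "params": {"directory_path": spill_dir}}
        )
    }
    return temp_dir, system_config
```

The two return values would then be passed as _temp_dir=temp_dir and _system_config=system_config, with the remaining ray.init() arguments unchanged.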

Do you have logs from gcs_server.out?
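With a custom temp dir, Ray's session logs normally end up under <temp_dir>/session_latest/logs/ rather than /tmp/ray/session_latest/logs/. The sketch below prints the tail of the GCS and raylet logs from that location. It assumes that layout, and it assumes the session directory was created before startup timed out, which may not always hold. The helper name tail_ray_logs is hypothetical.

```python
import os
from collections import deque


def tail_ray_logs(temp_dir,
                  names=("gcs_server.out", "gcs_server.err", "raylet.out", "raylet.err"),
                  n=20):
    """Hypothetical helper: return the last n lines of selected Ray log files.

    Assumes Ray's usual layout <temp_dir>/session_latest/logs/.
    """
    log_dir = os.path.join(os.path.abspath(temp_dir), "session_latest", "logs")
    tails = {}
    for name in names:
        path = os.path.join(log_dir, name)
        if not os.path.isfile(path):
            # The file was never written, or the layout differs from the assumption.
            tails[name] = None
            continue
        with open(path, errors="replace") as f:
            # deque with maxlen keeps only the last n lines while streaming the file.
            tails[name] = list(deque(f, maxlen=n))
    return tails
```

For example, tail_ray_logs("raylet_results/temp") run from the same working directory as the notebook, then printing "".join(lines) for every entry that is not None.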

No. I even shut Ray down before initializing it, and I still get the error.

Can you instead try starting Ray with ray start from the CLI, passing --temp-dir, and let me know if you have the same issue? See Cluster Management CLI (Ray 3.0.0.dev0 docs).
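A minimal sketch of that suggestion, driven from Python: it builds the ray start --head --temp-dir=... command and runs it with subprocess. The --head and --temp-dir options come from the ray start CLI; resolving the path to an absolute one first is my own assumption, not something confirmed here. The helper names ray_start_command and start_head are hypothetical.

```python
import os
import shlex
import subprocess


def ray_start_command(temp_dir):
    """Hypothetical helper: argv for starting a Ray head node with a custom temp dir.

    The path is resolved to an absolute one (an assumption, not a documented rule).
    """
    return ["ray", "start", "--head", f"--temp-dir={os.path.abspath(temp_dir)}"]


def start_head(temp_dir):
    """Run `ray start`; raises CalledProcessError if it exits with a non-zero status."""
    cmd = ray_start_command(temp_dir)
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

If start_head("raylet_results/temp") succeeds, the notebook would connect with ray.init(address="auto") (without _temp_dir, since the head node already owns the temp dir), and ray stop tears the node down afterwards. If it fails the same way, the CLI output plus the logs under the temp dir should be easier to inspect than the notebook traceback.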