Raylet worker doesn't respect RAY_TMPDIR

1. Severity of the issue: (select one)
None: I’m just curious or want clarification.
Low: Annoying but doesn’t hinder my work.
Medium: Significantly affects my productivity but can find a workaround.
High: Completely blocks me.

2. Environment:

  • Ray version: v2.44.1
  • Python version: Python 3.12
  • OS: CentOS 8.4
  • Cloud/Infrastructure: On-premises
  • Other libs/tools (if relevant): No

3. What happened vs. what you expected:

  • Expected:
    • env RAY_TMPDIR=/nfs/dev/${hostname} ray start ***. Each node should respect its own RAY_TMPDIR.
  • Actual:
    • With the command above, worker nodes use the head's RAY_TMPDIR instead of their own. RAY_TMPDIR points to an NFS path that is different for each compute node. If all worker nodes share the same temp dir, /nfs/DEV/PLT/zpeng/ray/tmp/twdev2/ray/session_latest/node_ip_address.json will be overwritten.

I checked the code: when no temp dir is passed explicitly, a non-head node queries GCS for the temp dir. What is the reason for this?

    def _init_temp(self):
        # Create a dictionary to store temp file index.
        self._incremental_dict = collections.defaultdict(lambda: 0)

        if self.head:
            self._ray_params.update_if_absent(
                temp_dir=ray._private.utils.get_ray_temp_dir()
            )
            self._temp_dir = self._ray_params.temp_dir
        else:
            if self._ray_params.temp_dir is None:
                assert not self._default_worker
                temp_dir = ray._private.utils.internal_kv_get_with_retry(
                    self.get_gcs_client(),
                    "temp_dir",
                    ray_constants.KV_NAMESPACE_SESSION,
                    num_retries=ray_constants.NUM_REDIS_GET_RETRIES,
                )
                self._temp_dir = ray._private.utils.decode(temp_dir)
            else:
                self._temp_dir = self._ray_params.temp_dir
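
To make the branching above easier to follow, here is a minimal standalone sketch of the same decision. It is not Ray's code: `resolve_temp_dir`, `local_default`, and the plain dict standing in for the GCS key-value store are hypothetical names, and the step where the head publishes its choice under "temp_dir" is an assumption inferred from the worker-side lookup.

```python
from typing import Dict, Optional


def resolve_temp_dir(
    is_head: bool,
    param_temp_dir: Optional[str],
    local_default: str,
    gcs_kv: Dict[str, str],
) -> str:
    """Simplified model of the head / non-head branching in _init_temp."""
    if is_head:
        # Head: keep an explicit temp_dir, otherwise use the local default
        # (presumably derived from RAY_TMPDIR on that machine).
        temp_dir = param_temp_dir if param_temp_dir is not None else local_default
        # Assumption: the head stores its choice so other nodes can read it.
        gcs_kv["temp_dir"] = temp_dir
        return temp_dir
    if param_temp_dir is None:
        # Non-head without an explicit temp_dir: the local default is never
        # consulted; the value stored in the key-value store wins.
        return gcs_kv["temp_dir"]
    return param_temp_dir


gcs = {}  # stand-in for the GCS key-value store
head = resolve_temp_dir(True, None, "/nfs/dev/head", gcs)
worker = resolve_temp_dir(False, None, "/nfs/dev/worker1", gcs)
print(head, worker)
```

In this model both calls return the head's directory, which matches the behavior described in the report: the worker's own default is ignored unless a temp dir is passed to it explicitly.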

Thanks for the question!

Currently, the recommended way to specify the temp directory is the --temp-dir option of the ray start command, as mentioned here. Unfortunately, we don’t support setting different temp directories for the head node and worker nodes separately.

That said, we have received many customer requests for this, so we are planning to work on supporting it soon. Will keep you posted.
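
For reference, the --temp-dir recommendation above could be scripted roughly as follows. This is only a sketch: `head_start_command` is a hypothetical helper, the path is a placeholder, and the command is built and printed rather than executed. It targets the head node only, since per-node temp directories are not supported.

```python
import shlex


def head_start_command(temp_dir: str) -> list[str]:
    """Build a `ray start` command for the head node with an explicit temp dir."""
    return ["ray", "start", "--head", f"--temp-dir={temp_dir}"]


# Placeholder path; substitute the directory you want the cluster to use.
cmd = head_start_command("/nfs/dev/ray_tmp")
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` if you want to launch the head from a script.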
