1. Severity of the issue: (select one)
None: I’m just curious or want clarification.
Low: Annoying but doesn’t hinder my work.
Medium: Significantly affects my productivity but can find a workaround.
High: Completely blocks me.
2. Environment:
- Ray version: v2.44.1
- Python version: 3.12
- OS: CentOS 8.4
- Cloud/Infrastructure: On-premises
- Other libs/tools (if relevant): No
3. What happened vs. what you expected:
- Expected:
- Running `env RAY_TMPDIR=/nfs/dev/${hostname} ray start ***` on a worker node should make that node use its own RAY_TMPDIR.
- Actual:
- With the command above, the worker node ignores its own RAY_TMPDIR and uses the head node's temp dir instead. Our RAY_TMPDIR points to an NFS path that is different for each compute node. If all worker nodes share the same temp dir, `/nfs/DEV/PLT/zpeng/ray/tmp/twdev2/ray/session_latest/node_ip_address.json` gets overwritten.
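A minimal sketch of why a shared temp dir is a problem, assuming each node writes a file at `<temp_dir>/ray/session_latest/node_ip_address.json`. The helper `write_node_ip_file`, the JSON content, and the IP addresses are hypothetical stand-ins; Ray manages the real file itself.

```python
import json
import os
import tempfile


def write_node_ip_file(temp_root: str, node_ip: str) -> str:
    """Hypothetical stand-in for a node writing node_ip_address.json."""
    session_dir = os.path.join(temp_root, "ray", "session_latest")
    os.makedirs(session_dir, exist_ok=True)
    path = os.path.join(session_dir, "node_ip_address.json")
    with open(path, "w") as f:
        json.dump({"node_ip_address": node_ip}, f)  # simplified content
    return path


def read_ip(path: str) -> str:
    with open(path) as f:
        return json.load(f)["node_ip_address"]


with tempfile.TemporaryDirectory() as nfs_root:
    # Shared temp dir: both workers resolve to the head's directory,
    # so the second write replaces the first.
    shared = os.path.join(nfs_root, "head")
    p1 = write_node_ip_file(shared, "10.0.0.2")
    p2 = write_node_ip_file(shared, "10.0.0.3")
    shared_paths_equal = p1 == p2
    shared_survivor = read_ip(p1)

    # Per-host temp dirs (what RAY_TMPDIR=/nfs/dev/<hostname> is meant to give):
    # each worker keeps its own file.
    q1 = write_node_ip_file(os.path.join(nfs_root, "worker1"), "10.0.0.2")
    q2 = write_node_ip_file(os.path.join(nfs_root, "worker2"), "10.0.0.3")
    per_host_ips = (read_ip(q1), read_ip(q2))

print(shared_paths_equal, shared_survivor)  # True 10.0.0.3
print(per_host_ips)  # ('10.0.0.2', '10.0.0.3')
```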
I checked the code: in `_init_temp`, a non-head node queries the GCS for the temp dir instead of reading RAY_TMPDIR. What is the reason for this?
```python
def _init_temp(self):
    # Create a dictionary to store temp file index.
    self._incremental_dict = collections.defaultdict(lambda: 0)
    if self.head:
        self._ray_params.update_if_absent(
            temp_dir=ray._private.utils.get_ray_temp_dir()
        )
        self._temp_dir = self._ray_params.temp_dir
    else:
        if self._ray_params.temp_dir is None:
            assert not self._default_worker
            temp_dir = ray._private.utils.internal_kv_get_with_retry(
                self.get_gcs_client(),
                "temp_dir",
                ray_constants.KV_NAMESPACE_SESSION,
                num_retries=ray_constants.NUM_REDIS_GET_RETRIES,
            )
            self._temp_dir = ray._private.utils.decode(temp_dir)
        else:
            self._temp_dir = self._ray_params.temp_dir
```
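To make the branches above easier to reason about, here is a simplified model of the resolution order. `resolve_temp_dir` and its parameters are hypothetical names, not Ray APIs; the GCS key-value store is modeled as a plain dict, the `_default_worker` assertion is omitted, and the head-side fallback is simplified to `<RAY_TMPDIR or /tmp>/ray` (Ray's own helper may consider other cases).

```python
import os
from typing import Mapping, Optional


def resolve_temp_dir(
    is_head: bool,
    explicit_temp_dir: Optional[str],
    env: Mapping[str, str],
    gcs_kv: Mapping[str, str],
) -> str:
    """Simplified model of the branches in _init_temp (not Ray's code)."""
    if is_head:
        # update_if_absent: an explicit temp dir wins; otherwise fall back
        # to a directory derived from RAY_TMPDIR (simplified).
        if explicit_temp_dir is not None:
            return explicit_temp_dir
        return os.path.join(env.get("RAY_TMPDIR", "/tmp"), "ray")
    if explicit_temp_dir is None:
        # Non-head node: this node's RAY_TMPDIR is never consulted; the
        # value stored under "temp_dir" in the session KV namespace is used.
        return gcs_kv["temp_dir"]
    return explicit_temp_dir


# Hypothetical hostnames, mirroring RAY_TMPDIR=/nfs/dev/${hostname}.
head_env = {"RAY_TMPDIR": "/nfs/dev/head-host"}
worker_env = {"RAY_TMPDIR": "/nfs/dev/worker-host"}

head_dir = resolve_temp_dir(True, None, head_env, {})
gcs = {"temp_dir": head_dir}  # value the worker reads back from the GCS
worker_dir = resolve_temp_dir(False, None, worker_env, gcs)
print(worker_dir)  # /nfs/dev/head-host/ray
```

In this model, a worker only gets its own directory when `temp_dir` is set explicitly in its Ray params, which is the behavior the excerpt appears to encode.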