Explicit call to ray.init() needed when reading from local://

The documentation says that “In recent versions of Ray (>=1.5), ray.init() is automatically called on the first use of a Ray remote API.” I am running Ray 2.3.1 on Python 3.9.16, OS X 13.3.

The following code works.

import ray

def main():
    ray.init()
    document_paths = ["local:///path/to/my/files"]
    pdf_data_set = ray.data.read_binary_files(document_paths, include_paths=True)
    pdf_data_set.show(1)


if __name__ == "__main__":
    main()

If I remove the ray.init() I see the following error.

Traceback (most recent call last):
  File "/Users/bill.mcneill/Src/simple_pages.py", line 13, in <module>
    main()
  File "/Users/bill.mcneill/Src/simple_pages.py", line 8, in main
    pdf_data_set = ray.data.read_binary_files(document_paths, include_paths=True)
  File "/usr/local/anaconda3/envs/lingua/lib/python3.9/site-packages/ray/data/read_api.py", line 1130, in read_binary_files
    return read_datasource(
  File "/usr/local/anaconda3/envs/lingua/lib/python3.9/site-packages/ray/data/read_api.py", line 271, in read_datasource
    ray.get_runtime_context().get_node_id(),
  File "/usr/local/anaconda3/envs/lingua/lib/python3.9/site-packages/ray/runtime_context.py", line 102, in get_node_id
    assert ray.is_initialized(), (
AssertionError: Node ID is not available because Ray has not been initialized.

If I also remove the local:// prefix from the file path, everything works fine again.

import ray

def main():
    document_paths = ["/path/to/my/files"]
    pdf_data_set = ray.data.read_binary_files(document_paths, include_paths=True)
    pdf_data_set.show(1)

if __name__ == "__main__":
    main()

I can add the ray.init() call to make this work, but since the documentation says I don’t have to, I think this is a bug. Is it a bug?

Hi @wpm, thanks for reporting. I think there’s a bug when reading local files before the Ray cluster is initialized. Let me double-check and get back to you.

The bug is now tracked on GitHub as issue #34631 in ray-project/ray: “[Data] Local read throws exception if Ray cluster is not initialized”.
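
Until that issue is resolved, one possible workaround is to initialize Ray explicitly, but only when a path uses the local:// scheme and Ray is not already running. The following is a minimal sketch rather than an official fix: the helper names uses_local_scheme and read_binary_files_safely are hypothetical and not part of Ray, and the behavior it guards against is only the one reported above for Ray 2.3.1.

```python
def uses_local_scheme(paths):
    """Return True if any path uses the local:// scheme."""
    return any(p.startswith("local://") for p in paths)


def read_binary_files_safely(paths):
    """Hypothetical wrapper: initialize Ray before a local:// read if needed."""
    # Imported lazily so uses_local_scheme() can be used without Ray installed.
    import ray

    # Assumption: as reported in this thread, local:// reads fail when Ray has
    # not been initialized, so start it explicitly in that case.
    if uses_local_scheme(paths) and not ray.is_initialized():
        ray.init()

    # Same call as in the original post.
    return ray.data.read_binary_files(paths, include_paths=True)
```

With this in place, read_binary_files_safely(["local:///path/to/my/files"]).show(1) should behave like the first example in the question, while paths without the local:// prefix still rely on Ray's automatic initialization.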