The documentation says that “In recent versions of Ray (>=1.5), ray.init() is automatically called on the first use of a Ray remote API.” I am running Ray 2.3.1 on Python 3.9.16, macOS 13.3.
The following code works.
import ray


def main():
    ray.init()
    document_paths = ["local:///path/to/my/files"]
    pdf_data_set = ray.data.read_binary_files(document_paths, include_paths=True)
    pdf_data_set.show(1)


if __name__ == "__main__":
    main()
If I remove the ray.init() call, I see the following error.
Traceback (most recent call last):
  File "/Users/bill.mcneill/Src/simple_pages.py", line 13, in <module>
    main()
  File "/Users/bill.mcneill/Src/simple_pages.py", line 8, in main
    pdf_data_set = ray.data.read_binary_files(document_paths, include_paths=True)
  File "/usr/local/anaconda3/envs/lingua/lib/python3.9/site-packages/ray/data/read_api.py", line 1130, in read_binary_files
    return read_datasource(
  File "/usr/local/anaconda3/envs/lingua/lib/python3.9/site-packages/ray/data/read_api.py", line 271, in read_datasource
    ray.get_runtime_context().get_node_id(),
  File "/usr/local/anaconda3/envs/lingua/lib/python3.9/site-packages/ray/runtime_context.py", line 102, in get_node_id
    assert ray.is_initialized(), (
AssertionError: Node ID is not available because Ray has not been initialized.
If I also remove the local:// prefix from the file path (still without ray.init()), everything works fine again.
import ray


def main():
    document_paths = ["/path/to/my/files"]
    pdf_data_set = ray.data.read_binary_files(document_paths, include_paths=True)
    pdf_data_set.show(1)


if __name__ == "__main__":
    main()
I can add the ray.init() call back in to make this work, but since the documentation says that I don’t have to, I think this is a bug. Is it a bug?
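In the meantime, here is a minimal sketch of a workaround. It is based only on the behavior described above for Ray 2.3.1: the read fails when a path uses the local:// prefix and Ray has not been initialized yet. The helper names uses_local_scheme and read_pdfs are hypothetical and are not part of Ray. The calls ray.is_initialized(), ray.init(), and ray.data.read_binary_files() are the same ones that appear in the code and traceback above.

```python
LOCAL_SCHEME = "local://"


def uses_local_scheme(paths):
    """Return True if any path carries the local:// prefix (hypothetical helper)."""
    return any(path.startswith(LOCAL_SCHEME) for path in paths)


def read_pdfs(paths):
    """Read binary files, initializing Ray first only when it appears to be needed.

    Assumption: explicit initialization is only required for local:// paths,
    which is what I observed on Ray 2.3.1; other versions may behave differently.
    """
    import ray  # imported lazily so the path check above has no Ray dependency

    if uses_local_scheme(paths) and not ray.is_initialized():
        ray.init()
    return ray.data.read_binary_files(paths, include_paths=True)
```

Calling read_pdfs(["local:///path/to/my/files"]).show(1) should then behave like the first example, without a hard-coded ray.init() in main().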