I'm reading data from HDFS with Ray Data (2.39.0), but the process hits a segmentation fault even after ray.shutdown() has been executed.
My job is simple:
```python
import sys
import ray
from pyarrow.fs import HadoopFileSystem
import pyarrow

ray.init()

if __name__ == "__main__":
    hdfs = HadoopFileSystem(xxx)  # connection arguments elided
    ds = ray.data.read_parquet("/test/xxx.parquet", filesystem=hdfs)
    print(ds.schema())
    ds.show(10)
    print("before shutdown")
    ray.shutdown()
    print("after shutdown")
```
All of the code runs and “after shutdown” is printed, but the process then crashes with a segmentation fault (not every time; sometimes it exits cleanly).
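To narrow down where the crash happens, Python's built-in `faulthandler` module can be enabled at the very top of the script (or equivalently by running `python -X faulthandler script.py`). A minimal sketch is below; note that if the fault occurs in a native thread that is not running Python code (for example inside libhdfs or the JVM), the dump may only show the Python threads that were alive at the time, not the native frame that crashed.

```python
import faulthandler
import sys

# Dump the Python tracebacks of all threads to stderr if the process
# receives a fatal signal such as SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL.
faulthandler.enable(file=sys.stderr, all_threads=True)

# ... the rest of the job (ray.init(), read_parquet, ray.shutdown()) follows here.
```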
If I read the same data with pure PyArrow (no Ray), everything works fine.
What is happening, and what should I do?