Hi team,
I am intermittently getting an error while reading Parquet data from HDFS with `ray.data.read_parquet`.
I am using Ray Train, and when this happens the whole job fails. Setting `max_failures` in the `FailureConfig` of the `RunConfig` doesn't help:
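```python
import ray
from ray.air.config import DatasetConfig, FailureConfig, RunConfig
from ray.train.torch import TorchTrainer

dataset = ray.data.read_parquet("hdfs://...")

trainer = TorchTrainer(
    # train_loop_per_worker and scaling_config are omitted from this snippet
    run_config=RunConfig(failure_config=FailureConfig(max_failures=1000)),
    dataset_config={
        "train": DatasetConfig(
            required=True,
            fit=False,
            transform=True,
            split=True,
            # max_object_store_memory_fraction is defined elsewhere (not shown)
            max_object_store_memory_fraction=max_object_store_memory_fraction,
            randomize_block_order=True,
        )
    },
    datasets={"train": dataset},
)
```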
Any hints? Here is the driver log from the crash:
```
[2023-04-04 07:21:15,161 E 1858 2075] (python-core-driver-02000000ffffffffffffffffffffffffffffffffffffffffffffffff) logging.cc:361: PC: @ 0x7f15a8181574 (unknown) ObjectMonitor::enter()
[2023-04-04 07:21:15,161 E 1858 2075] (python-core-driver-02000000ffffffffffffffffffffffffffffffffffffffffffffffff) logging.cc:361: @ 0x7f16171a6420 3536 (unknown)
[2023-04-04 07:21:15,161 E 1858 2075] (python-core-driver-02000000ffffffffffffffffffffffffffffffffffffffffffffffff) logging.cc:361: @ 0x7f15a7edee0d 80 InterpreterRuntime::monitorenter()
[2023-04-04 07:21:15,164 E 1858 2075] (python-core-driver-02000000ffffffffffffffffffffffffffffffffffffffffffffffff) logging.cc:361: @ 0x7f14dce8740d 112 (unknown)
[2023-04-04 07:21:15,170 E 1858 2075] (python-core-driver-02000000ffffffffffffffffffffffffffffffffffffffffffffffff) logging.cc:361: @ 0x7f14dce671c1 216 (unknown)
[2023-04-04 07:21:15,175 E 1858 2075] (python-core-driver-02000000ffffffffffffffffffffffffffffffffffffffffffffffff) logging.cc:361: @ 0x7f14dce671c1 88 (unknown)
[2023-04-04 07:21:15,180 E 1858 2075] (python-core-driver-02000000ffffffffffffffffffffffffffffffffffffffffffffffff) logging.cc:361: @ 0x7f14dce671c1 120 (unknown)
[2023-04-04 07:21:15,186 E 1858 2075] (python-core-driver-02000000ffffffffffffffffffffffffffffffffffffffffffffffff) logging.cc:361: @ 0x7f14dce66f20 120 (unknown)
[2023-04-04 07:21:15,191 E 1858 2075] (python-core-driver-02000000ffffffffffffffffffffffffffffffffffffffffffffffff) logging.cc:361: @ 0x7f14dce66f20 88 (unknown)
[2023-04-04 07:21:15,229 E 1858 2075] (python-core-driver-02000000ffffffffffffffffffffffffffffffffffffffffffffffff) logging.cc:361: @ 0x7f14dce67206 176 (unknown)
[2023-04-04 07:21:15,235 E 1858 2075] (python-core-driver-02000000ffffffffffffffffffffffffffffffffffffffffffffffff) logging.cc:361: @ 0x7f14dce5f50b 120 (unknown)
[2023-04-04 07:21:15,235 E 1858 2075] (python-core-driver-02000000ffffffffffffffffffffffffffffffffffffffffffffffff) logging.cc:361: @ 0x7f15a7eeb194 384 JavaCalls::call_helper()
[2023-04-04 07:21:15,235 E 1858 2075] (python-core-driver-02000000ffffffffffffffffffffffffffffffffffffffffffffffff) logging.cc:361: @ 0x7f15a7eec6d7 224 JavaCalls::call_virtual()
[2023-04-04 07:21:15,235 E 1858 2075] (python-core-driver-02000000ffffffffffffffffffffffffffffffffffffffffffffffff) logging.cc:361: @ 0x7f15a7eecc10 160 JavaCalls::call_virtual()
[2023-04-04 07:21:15,235 E 1858 2075] (python-core-driver-02000000ffffffffffffffffffffffffffffffffffffffffffffffff) logging.cc:361: @ 0x7f15a7f87da1 128 thread_entry()
[2023-04-04 07:21:15,235 E 1858 2075] (python-core-driver-02000000ffffffffffffffffffffffffffffffffffffffffffffffff) logging.cc:361: @ 0x7f15a82fb425 176 JavaThread::thread_main_inner()
[2023-04-04 07:21:15,235 E 1858 2075] (python-core-driver-02000000ffffffffffffffffffffffffffffffffffffffffffffffff) logging.cc:361: @ 0x7f15a81a3002 864 java_start()
[2023-04-04 07:21:15,235 E 1858 2075] (python-core-driver-02000000ffffffffffffffffffffffffffffffffffffffffffffffff) logging.cc:361: @ 0x7f161719a609 (unknown) start_thread
Fatal Python error: Segmentation fault
```
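The repeated log prefix makes the stack hard to read. Below is a small stdlib-only sketch that pulls out just the `(address, symbol)` pairs. The helper name `extract_frames` and its regex are hypothetical, written for illustration, and assume every frame line contains `@ <address> <frame size or (unknown)> <symbol>` as in the log above. Condensed this way, the named frames (`ObjectMonitor::enter()`, `JavaCalls::call_helper()`, `JavaThread::thread_main_inner()`) are HotSpot JVM internals. This suggests the segfault happens in a JVM thread inside the driver process, presumably the one used by the HDFS client.

```python
import re

# One stack frame: "@ <hex address> <frame size or (unknown)> <symbol>".
# The glog prefix ("[timestamp ...] (python-core-driver-...) logging.cc:361:")
# contains no "@", so searching for this pattern skips over it.
FRAME = re.compile(r"@\s+(0x[0-9a-f]+)\s+(?:\d+|\(unknown\))\s+(.*)$")


def extract_frames(log_text):
    """Return (address, symbol) pairs for each '@' frame in a glog stack dump."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME.search(line.strip())
        if match:
            frames.append((match.group(1), match.group(2).strip()))
    return frames
```

For example, `for addr, sym in extract_frames(open("driver.log").read()): print(addr, sym)` prints one frame per line. Here `driver.log` is a hypothetical file holding the log text.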