Hi there, I'm getting a SIGSEGV (segmentation fault) when instantiating a RunConfig object with an fsspec Hadoop filesystem. How can I debug this error? Any help is appreciated; please see my details below:
Env:
[root@/app #] ray --version
ray, version 2.8.0
[root@/app #] python --version
Python 3.9.2
[root@/app #] java -version
openjdk version "1.8.0_292"
[root@/app #] hadoop version
Hadoop 2.8.2
Compiled with protoc 2.5.0
Code:
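```python
import logging
import os

import fsspec
from pyarrow.fs import FSSpecHandler, PyFileSystem
from ray.train import CheckpointConfig, RunConfig

log = logging.getLogger(__name__)
# ENV_RESULT_DIR_URL (name of the env var holding the result URL) is defined elsewhere in the app.


def create_run_config(checkpoint_config: CheckpointConfig) -> RunConfig:
    result_dir_url = os.environ[ENV_RESULT_DIR_URL]
    log.info("result_dir_url: %r", result_dir_url)
    # Resolve the hdfs:// URL into an fsspec filesystem plus a bare path.
    storage_filesystem, storage_path = fsspec.core.url_to_fs(result_dir_url)
    log.info("storage_path: %r", storage_path)
    log.info("storage_filesystem [0]: %r", storage_filesystem)
    # Wrap the fsspec filesystem so it can be used as a pyarrow FileSystem.
    storage_filesystem = PyFileSystem(FSSpecHandler(storage_filesystem))
    log.info("storage_filesystem [1]: %r", storage_filesystem)
    run_config_options = dict(
        checkpoint_config=checkpoint_config,
        storage_path=storage_path,
        storage_filesystem=storage_filesystem,
    )
    log.info("run_config_options: %r", run_config_options)
    log.info("create run_config ...")
    run_config = RunConfig(**run_config_options)
    log.info("run_config: %r", run_config)
    return run_config
```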
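Because a segfault takes down the whole interpreter, one way to find the exact statement that crashes is to run candidate snippets in a separate child process with Python's built-in faulthandler enabled (`-X faulthandler`) and look at the exit status. Below is a minimal, stdlib-only sketch of that idea. `run_isolated`, `crashed_with_segfault` and `SUSPECT_SNIPPET` are hypothetical helper names, and the `hdfs://` URL inside the snippet is a placeholder, not the real path:

```python
import signal
import subprocess
import sys
import textwrap


def run_isolated(snippet: str, timeout: float = 300.0) -> subprocess.CompletedProcess:
    """Run `snippet` in a fresh interpreter with faulthandler enabled.

    A segfault then kills only the child; the Python stack that faulthandler
    dumps ends up in the returned object's stderr.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", textwrap.dedent(snippet)],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def crashed_with_segfault(result: subprocess.CompletedProcess) -> bool:
    # On POSIX, a child killed by a signal reports returncode == -signum.
    return result.returncode == -signal.SIGSEGV


# Hypothetical reproduction without Ray: rebuild the wrapped filesystem and
# exercise the same FileSystem.__eq__ -> equals() path seen in the native stack.
SUSPECT_SNIPPET = """
import fsspec
from pyarrow.fs import FSSpecHandler, PyFileSystem

fs, path = fsspec.core.url_to_fs("hdfs:///path/to/results")  # placeholder URL
pa_fs = PyFileSystem(FSSpecHandler(fs))
print("repr ok:", repr(pa_fs))
print("eq:", pa_fs == PyFileSystem(FSSpecHandler(fs)))
"""
```

If `crashed_with_segfault(run_isolated(SUSPECT_SNIPPET))` returns `True`, that would suggest the filesystem comparison alone reproduces the crash, independent of Ray; the child's `stderr` then holds the faulthandler stack for that snippet.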
Logs:
app.bert_cola.train | INFO | result_dir_url: 'hdfs:///app/xxx/workspaces/dc632718-6f85-4bed-8b96-109fa5eb70c1/app.bert_cola.train.train.cec122ada5cc44398cae54754a1281df'
app.bert_cola.train | INFO | storage_path: '/app/xxx/workspaces/dc632718-6f85-4bed-8b96-109fa5eb70c1/app.bert_cola.train.train.cec122ada5cc44398cae54754a1281df'
app.bert_cola.train | INFO | storage_filesystem [0]: <fsspec.implementations.arrow.HadoopFileSystem object at 0x7f28a3d1dca0>
app.bert_cola.train | INFO | storage_filesystem [1]: <pyarrow._fs.PyFileSystem object at 0x7f28385f16f0>
app.bert_cola.train | INFO | run_config_options: {'checkpoint_config': CheckpointConfig(num_to_keep=2, checkpoint_score_attribute='matthews_correlation'), 'storage_path': '/app/xxx/workspaces/dc632718-6f85-4bed-8b96-109fa5eb70c1/app.bert_cola.train.train.cec122ada5cc44398cae54754a1281df', 'storage_filesystem': <pyarrow._fs.PyFileSystem object at 0x7f28385f16f0>}
app.bert_cola.train | INFO | create run_config ...
*** SIGSEGV received at time=1699909076 on cpu 50 ***
PC: @ 0x7f2a88093871 (unknown) __pyx_f_7pyarrow_3_fs__cb_equals()
@ 0x7f2a914eb140 (unknown) (unknown)
@ 0x7f2a8fe64f53 96 arrow::py::SafeCallIntoPython<>()
@ 0x7f2a8fe67c85 112 arrow::fs::FileSystem::Equals()
@ 0x7f2a8809cc2a 96 __pyx_pw_7pyarrow_3_fs_10FileSystem_5equals()
@ 0x7f2a8808759a 64 __Pyx_PyObject_CallOneArg()
@ 0x7f2a8808c2b3 144 __pyx_pf_7pyarrow_3_fs_10FileSystem_6__eq__()
@ 0x7f2a8808c61d 48 __pyx_tp_richcompare_7pyarrow_3_fs_FileSystem()
@ 0x5337fb (unknown) PyObject_RichCompare
@ 0x906f40 (unknown) (unknown)
[2023-11-13 20:57:56,488 E 17781 17781] logging.cc:361: *** SIGSEGV received at time=1699909076 on cpu 50 ***
[2023-11-13 20:57:56,488 E 17781 17781] logging.cc:361: PC: @ 0x7f2a88093871 (unknown) __pyx_f_7pyarrow_3_fs__cb_equals()
[2023-11-13 20:57:56,489 E 17781 17781] logging.cc:361: @ 0x7f2a914eb140 (unknown) (unknown)
[2023-11-13 20:57:56,489 E 17781 17781] logging.cc:361: @ 0x7f2a8fe64f53 96 arrow::py::SafeCallIntoPython<>()
[2023-11-13 20:57:56,489 E 17781 17781] logging.cc:361: @ 0x7f2a8fe67c85 112 arrow::fs::FileSystem::Equals()
[2023-11-13 20:57:56,489 E 17781 17781] logging.cc:361: @ 0x7f2a8809cc2a 96 __pyx_pw_7pyarrow_3_fs_10FileSystem_5equals()
[2023-11-13 20:57:56,489 E 17781 17781] logging.cc:361: @ 0x7f2a8808759a 64 __Pyx_PyObject_CallOneArg()
[2023-11-13 20:57:56,489 E 17781 17781] logging.cc:361: @ 0x7f2a8808c2b3 144 __pyx_pf_7pyarrow_3_fs_10FileSystem_6__eq__()
[2023-11-13 20:57:56,489 E 17781 17781] logging.cc:361: @ 0x7f2a8808c61d 48 __pyx_tp_richcompare_7pyarrow_3_fs_FileSystem()
[2023-11-13 20:57:56,489 E 17781 17781] logging.cc:361: @ 0x5337fb (unknown) PyObject_RichCompare
[2023-11-13 20:57:56,491 E 17781 17781] logging.cc:361: @ 0x906f40 (unknown) (unknown)
Fatal Python error: Segmentation fault
Stack (most recent call first):
File "/usr/local/lib/python3.9/dist-packages/ray/air/config.py", line 78 in _repr_dataclass
File "/usr/local/lib/python3.9/dist-packages/ray/air/config.py", line 659 in __repr__
File "/usr/lib/python3.9/logging/__init__.py", line 363 in getMessage
File "/usr/lib/python3.9/logging/__init__.py", line 659 in format
File "/usr/lib/python3.9/logging/__init__.py", line 923 in format
File "/usr/lib/python3.9/logging/__init__.py", line 1079 in emit
File "/usr/lib/python3.9/logging/__init__.py", line 948 in handle
File "/usr/lib/python3.9/logging/__init__.py", line 1657 in callHandlers
File "/usr/lib/python3.9/logging/__init__.py", line 1595 in handle
File "/usr/lib/python3.9/logging/__init__.py", line 1585 in _log
File "/usr/lib/python3.9/logging/__init__.py", line 1442 in info
File "/app/app/bert_cola/train.py", line 127 in create_run_config
File "/app/app/bert_cola/train.py", line 36 in train
File "/app/sdk/decorators.py", line 35 in wrapper
File "/app/sdk/run_task.py", line 66 in run
File "/app/sdk/run_task.py", line 79 in main
File "/app/sdk/run_task.py", line 105 in <module>
File "/usr/lib/python3.9/runpy.py", line 87 in _run_code
File "/usr/lib/python3.9/runpy.py", line 197 in _run_module_as_main
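Reading the Python stack above, the `RunConfig(...)` call itself appears to succeed: the crash happens one line later, inside `RunConfig.__repr__` -> `_repr_dataclass` (ray/air/config.py), which is triggered by `log.info("run_config: %r", run_config)` (train.py line 127). The native stack shows that `_repr_dataclass` performs a rich comparison (`PyObject_RichCompare`) on the pyarrow `FileSystem`, which goes through `FileSystem.equals` into the `PyFileSystem` callback `_cb_equals`, where the segfault occurs. The following simplified pure-Python sketch illustrates that pattern, assuming the repr hides fields equal to their defaults; `NoisyEq`, `repr_non_defaults`, `describe_without_eq` and `FakeRunConfig` are hypothetical stand-ins, not Ray's or pyarrow's code:

```python
import dataclasses
from typing import Any, Optional


class NoisyEq:
    """Stand-in for pyarrow's PyFileSystem: counts every __eq__ call."""

    calls = 0

    def __eq__(self, other: object) -> bool:
        NoisyEq.calls += 1
        return self is other

    __hash__ = object.__hash__  # defining __eq__ would otherwise unset it


def repr_non_defaults(obj: Any) -> str:
    """Hypothetical repr that only shows fields differing from their default.

    The `value != field.default` test is what invokes the value's __eq__.
    """
    parts = []
    for field in dataclasses.fields(obj):
        value = getattr(obj, field.name)
        if field.default is dataclasses.MISSING or value != field.default:
            parts.append(f"{field.name}={value!r}")
    return f"{type(obj).__name__}({', '.join(parts)})"


def describe_without_eq(obj: Any) -> str:
    """Summarise a dataclass without comparing any field value to anything."""
    parts = []
    for field in dataclasses.fields(obj):
        value = getattr(obj, field.name)
        if isinstance(value, (str, int, float, bool, type(None))):
            shown = repr(value)
        else:
            # Only the type is shown, so no __eq__/__repr__ of the value runs.
            shown = f"<{type(value).__module__}.{type(value).__qualname__}>"
        parts.append(f"{field.name}={shown}")
    return f"{type(obj).__name__}({', '.join(parts)})"


@dataclasses.dataclass
class FakeRunConfig:
    # Hypothetical stand-in for the two relevant RunConfig fields.
    storage_path: Optional[str] = None
    storage_filesystem: Optional[Any] = None
```

Assuming `RunConfig` is a dataclass (as `_repr_dataclass` in the stack suggests), a possible workaround is to log `describe_without_eq(run_config)` instead of `%r`, or to drop that log line. This only avoids the logging call; it does not fix the underlying `PyFileSystem` comparison crash, which could still surface if anything else compares the filesystem.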