How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I’m trying to profile larger datasets using Spark on Ray (RayDP). I installed raydp-nightly, and it works fine on a single Mac machine.
But when I run the same code on my on-premise cluster, it throws the error below:
(RayDPSparkMaster pid=2746987, ip=172.xx.xx.xx) Worker exits with an exit code None.
2023-12-15 15:54:14,696 WARNING worker.py:2037 – A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffff99d77702850c271b9ef91ea802000000 Worker ID: 4b3aa4dd5a5de6e6f7d3078853e331074e5c75dacdf92bf4801757b9 Node ID: 22c22293312b4aaeb0b7902be6290d84f2a52c3f6f8f8fd13a19e4fd Worker IP address: 172.xx.xx.xx Worker port: 10003 Worker PID: 2746987 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker exits unexpectedly. Worker exits with an exit code None.
Traceback (most recent call last):
File “python/ray/_raylet.pyx”, line 1418, in ray._raylet.execute_task
File “python/ray/_raylet.pyx”, line 1498, in ray._raylet.execute_task
File “python/ray/_raylet.pyx”, line 1424, in ray._raylet.execute_task
File “python/ray/_raylet.pyx”, line 1364, in ray._raylet.execute_task.function_executor
File “/home/xxx/xxx/site-packages/ray/_private/function_manager.py”, line 726, in actor_method_executor
return method(__ray_actor, *args, **kwargs)
File “/home/xxx/xxx/site-packages/ray/util/tracing/tracing_helper.py”, line 464, in _resume_span
return method(self, *_args, **_kwargs)
File “/home/xxx/xxx/site-packages/raydp/spark/ray_cluster_master.py”, line 73, in start_up
self._create_app_master(extra_classpath)
File “/home/xxx/xxx/site-packages/ray/util/tracing/tracing_helper.py”, line 464, in _resume_span
return method(self, *_args, **_kwargs)
File “/home/xxx/xxx/site-packages/raydp/spark/ray_cluster_master.py”, line 205, in _create_app_master
self._app_master_java_bridge.startUpAppMaster(extra_classpath, self._configs)
File “/home/xxx/xxx/site-packages/py4j/java_gateway.py”, line 1322, in call
return_value = get_return_value(
File “/home/xxx/xxx/site-packages/py4j/protocol.py”, line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o0.startUpAppMaster.
: java.lang.UnsatisfiedLinkError: Unable to load library ‘/tmp/ray/1702635854445/libcore_worker_library_java.so’:
/tmp/ray/1702635854445/libcore_worker_library_java.so: failed to map segment from shared object
/tmp/ray/1702635854445/libcore_worker_library_java.so: failed to map segment from shared object
Native library (tmp/ray/1702635854445/libcore_worker_library_java.so) not found in resource path (…)
at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:301)
at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:461)
at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:403)
at io.ray.runtime.util.JniUtils.loadLibrary(JniUtils.java:72)
at io.ray.runtime.RayNativeRuntime.start(RayNativeRuntime.java:80)
at io.ray.runtime.DefaultRayRuntimeFactory.createRayRuntime(DefaultRayRuntimeFactory.java:37)
at io.ray.api.Ray.init(Ray.java:32)
at io.ray.api.Ray.init(Ray.java:19)
at org.apache.spark.deploy.raydp.AppMasterJavaBridge.startUpAppMaster(AppMasterJavaBridge.scala:34)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:745)
Suppressed: java.lang.UnsatisfiedLinkError: /tmp/ray/1702635854445/libcore_worker_library_java.so: failed to map segment from shared object
at com.sun.jna.Native.open(Native Method)
at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:191)
… 19 more
Suppressed: java.lang.UnsatisfiedLinkError: /tmp/ray/1702635854445/libcore_worker_library_java.so: failed to map segment from shared object
at com.sun.jna.Native.open(Native Method)
at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:204)
… 19 more
Ray version : 2.6.3
Cluster provisioned OS : Debian
Python version : 3.9.2
raydp-nightly : 2023.12.5.dev0
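A possible diagnostic, not confirmed for this cluster: the `failed to map segment from shared object` message is often seen on Linux when the directory holding the `.so` file is mounted with the `noexec` option. The sketch below assumes a Linux node that exposes `/proc/mounts` and reports the mount options that apply to `/tmp/ray`. The helper names `mount_options_for` and `is_noexec` are hypothetical and are not part of Ray or RayDP.

```python
import os


def mount_options_for(path, mounts_text):
    """Return (mount_point, options) for the longest mount point containing path.

    mounts_text is the content of /proc/mounts (one mount per line:
    device, mount point, fs type, comma-separated options, ...).
    Returns None if no mount point matches.
    """
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed lines
        mount_point, options = fields[1], fields[3].split(",")
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the most specific mount; on ties, the later entry wins.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, options)
    return best


def is_noexec(path, mounts_text):
    """True if the mount that contains path carries the noexec option."""
    found = mount_options_for(path, mounts_text)
    return found is not None and "noexec" in found[1]


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        mounts = f.read()
    # /tmp/ray is the directory named in the UnsatisfiedLinkError above.
    print(mount_options_for("/tmp/ray", mounts))
    print("noexec:", is_noexec("/tmp/ray", mounts))
```

If this prints `noexec: True` on the failing node and `noexec: False` on a working machine, the mount option would be worth investigating further; if it prints `False` on both, the cause is likely elsewhere.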
Could anyone kindly help me with this? TIA.