(0524alpa)
# zhangyuchang @ SH-IDC1-10-140-0-32 in /mnt/cache/zhangyuchang/alpa-project-new/alpa on git:main x [10:51:01]
$ TF_CPP_MIN_LOG_LEVEL=0 XLA_FLAGS="--xla_gpu_cuda_data_dir=/mnt/cache/share/platform/dep/cuda11.2-cudnn8.1.1" srun -p caif_dev --gres=gpu:1 --ntasks-per-node=1 -n1 bash test_install.sh
phoenix-srun: Job 2047274 scheduled successfully!
Usage stats collection will be enabled by default in the next release. See https://github.com/ray-project/ray/issues/20857 for more details.
2022-05-26 10:51:51,826 INFO services.py:1462 -- View the Ray dashboard at http://127.0.0.1:8265
2022-05-26 10:51:28,020 INFO scripts.py:697 -- Local node IP: 10.140.1.35
2022-05-26 10:51:52,889 SUCC scripts.py:739 -- --------------------
2022-05-26 10:51:52,895 SUCC scripts.py:740 -- Ray runtime started.
2022-05-26 10:51:52,895 SUCC scripts.py:741 -- --------------------
2022-05-26 10:51:52,895 INFO scripts.py:743 -- Next steps
2022-05-26 10:51:52,895 INFO scripts.py:744 -- To connect to this Ray runtime from another node, run
2022-05-26 10:51:52,895 INFO scripts.py:749 -- ray start --address='10.140.1.35:6379'
2022-05-26 10:51:52,895 INFO scripts.py:752 -- Alternatively, use the following Python code:
2022-05-26 10:51:52,895 INFO scripts.py:754 -- import ray
2022-05-26 10:51:52,895 INFO scripts.py:767 -- ray.init(address='auto')
2022-05-26 10:51:52,895 INFO scripts.py:771 -- To connect to this Ray runtime from outside of the cluster, for example to
2022-05-26 10:51:52,895 INFO scripts.py:775 -- connect to a remote cluster from your laptop directly, use the following
2022-05-26 10:51:52,895 INFO scripts.py:778 -- Python code:
2022-05-26 10:51:52,895 INFO scripts.py:780 -- import ray
2022-05-26 10:51:52,896 INFO scripts.py:786 -- ray.init(address='ray://<head_node_ip_address>:10001')
2022-05-26 10:51:52,896 INFO scripts.py:792 -- If connection fails, check your firewall settings and network configuration.
2022-05-26 10:51:52,896 INFO scripts.py:798 -- To terminate the Ray runtime, run
2022-05-26 10:51:52,896 INFO scripts.py:799 -- ray stop
succeed===========
======== Autoscaler status: 2022-05-26 10:52:00.703027 ========
Node status
---------------------------------------------------------------
Healthy:
1 node_0ca17cc9a4cbd848e7533201da8225eaf45b5e75aa1e695568e6338e
Pending:
(no pending nodes)
Recent failures:
(no failures)
Resources
---------------------------------------------------------------
Usage:
0.0/128.0 CPU
0.0/1.0 GPU
0.00/383.261 GiB memory
0.00/168.246 GiB object_store_memory
Demands:
(no resource demands)
now running python script
2022-05-26 10:52:13.094558: I external/org_tensorflow/tensorflow/core/util/util.cc:168] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-05-26 10:52:22.084142: I external/org_tensorflow/tensorflow/core/util/util.cc:168] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-05-26 10:53:37.499963: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:174] XLA service 0x555df8aeb380 initialized for platform Interpreter (this does not guarantee that XLA will be used). Devices:
2022-05-26 10:53:37.500012: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:182] StreamExecutor device (0): Interpreter, <undefined>
2022-05-26 10:53:37.571849: I external/org_tensorflow/tensorflow/compiler/xla/pjrt/tfrt_cpu_pjrt_client.cc:176] TfrtCpuClient created.
2022-05-26 10:53:38.277168: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:174] XLA service 0x555df907f1c0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2022-05-26 10:53:38.277227: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:182] StreamExecutor device (0): A100-SXM-80GB, Compute Capability 8.0
2022-05-26 10:53:38.278839: I external/org_tensorflow/tensorflow/compiler/xla/pjrt/gpu_device.cc:341] Using platform allocator.
2022-05-26 10:53:38.297082: I external/org_tensorflow/tensorflow/stream_executor/tpu/tpu_platform_interface.cc:74] No TPU platform found.
.
2022-05-26 10:56:08.136490: I external/org_tensorflow/tensorflow/compiler/xla/pjrt/distributed/service.cc:369] Jax service listening on 10.140.1.35:20020
2022-05-26 11:01:14.346240: I external/org_tensorflow/tensorflow/compiler/xla/pjrt/distributed/service.cc:381] Jax service shutting down
2022-05-26 11:01:14.354351: I external/org_tensorflow/tensorflow/compiler/xla/pjrt/distributed/service.cc:381] Jax service shutting down
(pid=107432) 2022-05-26 10:54:44.976064: I external/org_tensorflow/tensorflow/core/util/util.cc:168] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
(pid=107432) 2022-05-26 10:54:51.285440: I external/org_tensorflow/tensorflow/core/util/util.cc:168] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
(raylet) [2022-05-26 10:56:40,707 E 104827 104827] (raylet) worker_pool.cc:518: Some workers of the worker process(109411) have not registered within the timeout. The process is still alive, probably it's hanging during start.
(raylet) [2022-05-26 10:57:11,579 E 104827 104827] (raylet) worker_pool.cc:518: Some workers of the worker process(109736) have not registered within the timeout. The process is still alive, probably it's hanging during start.
(raylet) [2022-05-26 10:57:43,272 E 104827 104827] (raylet) worker_pool.cc:518: Some workers of the worker process(110164) have not registered within the timeout. The process is still alive, probably it's hanging during start.
(pid=110494) 2022-05-26 10:57:53.987877: I external/org_tensorflow/tensorflow/core/util/util.cc:168] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
(pid=110494) 2022-05-26 10:57:57.747614: I external/org_tensorflow/tensorflow/core/util/util.cc:168] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
(MeshHostWorker pid=110494) 2022-05-26 11:01:07.060563: I external/org_tensorflow/tensorflow/compiler/xla/pjrt/distributed/client.cc:166] Connect failed() with status: DEADLINE_EXCEEDED: Deadline Exceeded
(MeshHostWorker pid=110494) 2022-05-26 11:01:07.094402: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/distributed/client.cc:177] Connect() failed after 1 retries in 0; most recent failure status: DEADLINE_EXCEEDED: Deadline Exceeded
(MeshHostWorker pid=110494) 2022-05-26 11:01:09,339 ERROR worker.py:449 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::MeshHostWorker.__init__() (pid=110494, ip=10.140.1.35, repr=<alpa.device_mesh.MeshHostWorker object at 0x7f88b8089890>)
(MeshHostWorker pid=110494) File "/mnt/cache/zhangyuchang/alpa-project-new/alpa/alpa/device_mesh.py", line 96, in __init__
(MeshHostWorker pid=110494) self.distributed_client.connect()
(MeshHostWorker pid=110494) RuntimeError: DEADLINE_EXCEEDED: Connect() timed out after 0 with 1 attempts. Most recent failure was: DEADLINE_EXCEEDED: Deadline Exceeded
(MeshHostWorker pid=110494) E0526 11:01:12.421620596 110604 chttp2_transport.cc:1103] Received a GOAWAY with error code ENHANCE_YOUR_CALM and debug data equal to "too_many_pings"
E
======================================================================
FAILED (errors=1)
2022-05-26 11:01:40,110 VINFO scripts.py:988 -- Send termination request to `/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/core/src/ray/raylet/raylet --raylet_socket_name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/raylet --store_socket_name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/plasma_store --object_manager_port=0 --min_worker_port=10002 --max_worker_port=19999 --node_manager_port=0 --node_ip_address=10.140.1.35 --maximum_startup_concurrency=128 --static_resource_list=node:10.140.1.35,1.0,CPU,128,GPU,1,memory,411523096781,object_store_memory,180652755763 "--python_worker_command=/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/bin/python /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/workers/setup_worker.py /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/workers/default_worker.py --node-ip-address=10.140.1.35 --node-manager-port=RAY_NODE_MANAGER_PORT_PLACEHOLDER --object-store-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=62967 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=10.140.1.35:6379 RAY_WORKER_DYNAMIC_OPTION_PLACEHOLDER --redis-password=5241590000000000" "--java_worker_command=/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/bin/python /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/workers/setup_worker.py java -Dray.address=10.140.1.35:6379 -Dray.raylet.node-manager-port=RAY_NODE_MANAGER_PORT_PLACEHOLDER -Dray.object-store.socket-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/plasma_store -Dray.raylet.socket-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/raylet -Dray.redis.password=5241590000000000 -Dray.node-ip=10.140.1.35 
-Dray.home=/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/../.. -Dray.logging.dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/logs -Dray.session-dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361 -cp /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/jars/* RAY_WORKER_DYNAMIC_OPTION_PLACEHOLDER io.ray.runtime.runner.worker.DefaultWorker" --cpp_worker_command= --native_library_path=/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/cpp/lib --redis_password=5241590000000000 --temp_dir=/tmp/ray --session_dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361 --log_dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/logs --resource_dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/runtime_resources --metrics-agent-port=62967 --metrics_export_port=64128 --object_store_memory=180652755763 --plasma_directory=/dev/shm --ray-debugger-external=0 --gcs-address=10.140.1.35:6379 "--agent_command=/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/bin/python -u /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/dashboard/agent.py --node-ip-address=10.140.1.35 --metrics-export-port=64128 --dashboard-agent-port=62967 --listen-port=0 --node-manager-port=RAY_NODE_MANAGER_PORT_PLACEHOLDER --object-store-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/raylet --temp-dir=/tmp/ray --session-dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361 --runtime-env-dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/runtime_resources --log-dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/logs --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=10.140.1.35:6379"` (via SIGTERM)
2022-05-26 11:01:40,111 VINFO scripts.py:988 -- Send termination request to `/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/core/src/ray/gcs/gcs_server --log_dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/logs --config_list=eyJvYmplY3Rfc3BpbGxpbmdfY29uZmlnIjogIntcInR5cGVcIjogXCJmaWxlc3lzdGVtXCIsIFwicGFyYW1zXCI6IHtcImRpcmVjdG9yeV9wYXRoXCI6IFwiL3RtcC9yYXkvc2Vzc2lvbl8yMDIyLTA1LTI2XzEwLTUxLTI4XzA3ODk1M18xMDQzNjFcIn19IiwgImlzX2V4dGVybmFsX3N0b3JhZ2VfdHlwZV9mcyI6IHRydWV9 --gcs_server_port=6379 --metrics-agent-port=62967 --node-ip-address=10.140.1.35 --redis_password=5241590000000000` (via SIGTERM)
2022-05-26 11:01:40,115 VINFO scripts.py:988 -- Send termination request to `/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/bin/python -u /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/autoscaler/_private/monitor.py --logs-dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/logs --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=10.140.1.35:6379 --redis-password=5241590000000000 --monitor-ip=10.140.1.35` (via SIGTERM)
2022-05-26 11:01:40,116 VINFO scripts.py:988 -- Send termination request to `/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/bin/python -u /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/_private/log_monitor.py --logs-dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/logs --gcs-address=10.140.1.35:6379 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5` (via SIGTERM)
2022-05-26 11:01:40,120 VINFO scripts.py:988 -- Send termination request to `/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/bin/python -m ray.util.client.server --address=10.140.1.35:6379 --host=0.0.0.0 --port=10001 --mode=proxy --redis-password=5241590000000000 --metrics-agent-port=62967` (via SIGTERM)
2022-05-26 11:01:40,129 VINFO scripts.py:988 -- Send termination request to `/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/core/src/ray/raylet/raylet --raylet_socket_name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/raylet --store_socket_name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/plasma_store --object_manager_port=0 --min_worker_port=10002 --max_worker_port=19999 --node_manager_port=0 --node_ip_address=10.140.1.35 --maximum_startup_concurrency=128 --static_resource_list=node:10.140.1.35,1.0,CPU,128,GPU,1,memory,411523096781,object_store_memory,180652755763 "--python_worker_command=/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/bin/python /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/workers/setup_worker.py /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/workers/default_worker.py --node-ip-address=10.140.1.35 --node-manager-port=RAY_NODE_MANAGER_PORT_PLACEHOLDER --object-store-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=62967 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=10.140.1.35:6379 RAY_WORKER_DYNAMIC_OPTION_PLACEHOLDER --redis-password=5241590000000000" "--java_worker_command=/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/bin/python /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/workers/setup_worker.py java -Dray.address=10.140.1.35:6379 -Dray.raylet.node-manager-port=RAY_NODE_MANAGER_PORT_PLACEHOLDER -Dray.object-store.socket-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/plasma_store -Dray.raylet.socket-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/raylet -Dray.redis.password=5241590000000000 -Dray.node-ip=10.140.1.35 
-Dray.home=/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/../.. -Dray.logging.dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/logs -Dray.session-dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361 -cp /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/jars/* RAY_WORKER_DYNAMIC_OPTION_PLACEHOLDER io.ray.runtime.runner.worker.DefaultWorker" --cpp_worker_command= --native_library_path=/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/cpp/lib --redis_password=5241590000000000 --temp_dir=/tmp/ray --session_dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361 --log_dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/logs --resource_dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/runtime_resources --metrics-agent-port=62967 --metrics_export_port=64128 --object_store_memory=180652755763 --plasma_directory=/dev/shm --ray-debugger-external=0 --gcs-address=10.140.1.35:6379 "--agent_command=/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/bin/python -u /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/dashboard/agent.py --node-ip-address=10.140.1.35 --metrics-export-port=64128 --dashboard-agent-port=62967 --listen-port=0 --node-manager-port=RAY_NODE_MANAGER_PORT_PLACEHOLDER --object-store-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/raylet --temp-dir=/tmp/ray --session-dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361 --runtime-env-dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/runtime_resources --log-dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/logs --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=10.140.1.35:6379"` (via SIGTERM)
2022-05-26 11:01:40,134 VINFO scripts.py:988 -- Send termination request to `/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/core/src/ray/raylet/raylet --raylet_socket_name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/raylet --store_socket_name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/plasma_store --object_manager_port=0 --min_worker_port=10002 --max_worker_port=19999 --node_manager_port=0 --node_ip_address=10.140.1.35 --maximum_startup_concurrency=128 --static_resource_list=node:10.140.1.35,1.0,CPU,128,GPU,1,memory,411523096781,object_store_memory,180652755763 "--python_worker_command=/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/bin/python /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/workers/setup_worker.py /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/workers/default_worker.py --node-ip-address=10.140.1.35 --node-manager-port=RAY_NODE_MANAGER_PORT_PLACEHOLDER --object-store-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=62967 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=10.140.1.35:6379 RAY_WORKER_DYNAMIC_OPTION_PLACEHOLDER --redis-password=5241590000000000" "--java_worker_command=/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/bin/python /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/workers/setup_worker.py java -Dray.address=10.140.1.35:6379 -Dray.raylet.node-manager-port=RAY_NODE_MANAGER_PORT_PLACEHOLDER -Dray.object-store.socket-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/plasma_store -Dray.raylet.socket-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/raylet -Dray.redis.password=5241590000000000 -Dray.node-ip=10.140.1.35 
-Dray.home=/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/../.. -Dray.logging.dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/logs -Dray.session-dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361 -cp /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/jars/* RAY_WORKER_DYNAMIC_OPTION_PLACEHOLDER io.ray.runtime.runner.worker.DefaultWorker" --cpp_worker_command= --native_library_path=/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/cpp/lib --redis_password=5241590000000000 --temp_dir=/tmp/ray --session_dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361 --log_dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/logs --resource_dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/runtime_resources --metrics-agent-port=62967 --metrics_export_port=64128 --object_store_memory=180652755763 --plasma_directory=/dev/shm --ray-debugger-external=0 --gcs-address=10.140.1.35:6379 "--agent_command=/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/bin/python -u /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/dashboard/agent.py --node-ip-address=10.140.1.35 --metrics-export-port=64128 --dashboard-agent-port=62967 --listen-port=0 --node-manager-port=RAY_NODE_MANAGER_PORT_PLACEHOLDER --object-store-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/raylet --temp-dir=/tmp/ray --session-dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361 --runtime-env-dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/runtime_resources --log-dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/logs --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=10.140.1.35:6379"` (via SIGTERM)
2022-05-26 11:01:40,139 VINFO scripts.py:988 -- Send termination request to `/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/core/src/ray/raylet/raylet --raylet_socket_name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/raylet --store_socket_name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/plasma_store --object_manager_port=0 --min_worker_port=10002 --max_worker_port=19999 --node_manager_port=0 --node_ip_address=10.140.1.35 --maximum_startup_concurrency=128 --static_resource_list=node:10.140.1.35,1.0,CPU,128,GPU,1,memory,411523096781,object_store_memory,180652755763 "--python_worker_command=/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/bin/python /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/workers/setup_worker.py /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/workers/default_worker.py --node-ip-address=10.140.1.35 --node-manager-port=RAY_NODE_MANAGER_PORT_PLACEHOLDER --object-store-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=62967 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=10.140.1.35:6379 RAY_WORKER_DYNAMIC_OPTION_PLACEHOLDER --redis-password=5241590000000000" "--java_worker_command=/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/bin/python /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/workers/setup_worker.py java -Dray.address=10.140.1.35:6379 -Dray.raylet.node-manager-port=RAY_NODE_MANAGER_PORT_PLACEHOLDER -Dray.object-store.socket-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/plasma_store -Dray.raylet.socket-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/raylet -Dray.redis.password=5241590000000000 -Dray.node-ip=10.140.1.35 
-Dray.home=/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/../.. -Dray.logging.dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/logs -Dray.session-dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361 -cp /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/jars/* RAY_WORKER_DYNAMIC_OPTION_PLACEHOLDER io.ray.runtime.runner.worker.DefaultWorker" --cpp_worker_command= --native_library_path=/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/cpp/lib --redis_password=5241590000000000 --temp_dir=/tmp/ray --session_dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361 --log_dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/logs --resource_dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/runtime_resources --metrics-agent-port=62967 --metrics_export_port=64128 --object_store_memory=180652755763 --plasma_directory=/dev/shm --ray-debugger-external=0 --gcs-address=10.140.1.35:6379 "--agent_command=/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/bin/python -u /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/dashboard/agent.py --node-ip-address=10.140.1.35 --metrics-export-port=64128 --dashboard-agent-port=62967 --listen-port=0 --node-manager-port=RAY_NODE_MANAGER_PORT_PLACEHOLDER --object-store-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/raylet --temp-dir=/tmp/ray --session-dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361 --runtime-env-dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/runtime_resources --log-dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/logs --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=10.140.1.35:6379"` (via SIGTERM)
2022-05-26 11:01:40,145 VINFO scripts.py:988 -- Send termination request to `/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/bin/python -u /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/_private/log_monitor.py --logs-dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/logs --gcs-address=10.140.1.35:6379 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5` (via SIGTERM)
2022-05-26 11:01:40,153 VINFO scripts.py:988 -- Send termination request to `/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/bin/python -u /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/dashboard/dashboard.py --host=localhost --port=8265 --port-retries=0 --temp-dir=/tmp/ray --log-dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/logs --session-dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=10.140.1.35:6379` (via SIGTERM)
2022-05-26 11:01:40,158 VINFO scripts.py:988 -- Send termination request to `/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/core/src/ray/raylet/raylet --raylet_socket_name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/raylet --store_socket_name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/plasma_store --object_manager_port=0 --min_worker_port=10002 --max_worker_port=19999 --node_manager_port=0 --node_ip_address=10.140.1.35 --maximum_startup_concurrency=128 --static_resource_list=node:10.140.1.35,1.0,CPU,128,GPU,1,memory,411523096781,object_store_memory,180652755763 "--python_worker_command=/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/bin/python /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/workers/setup_worker.py /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/workers/default_worker.py --node-ip-address=10.140.1.35 --node-manager-port=RAY_NODE_MANAGER_PORT_PLACEHOLDER --object-store-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=62967 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=10.140.1.35:6379 RAY_WORKER_DYNAMIC_OPTION_PLACEHOLDER --redis-password=5241590000000000" "--java_worker_command=/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/bin/python /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/workers/setup_worker.py java -Dray.address=10.140.1.35:6379 -Dray.raylet.node-manager-port=RAY_NODE_MANAGER_PORT_PLACEHOLDER -Dray.object-store.socket-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/plasma_store -Dray.raylet.socket-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/raylet -Dray.redis.password=5241590000000000 -Dray.node-ip=10.140.1.35 
-Dray.home=/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/../.. -Dray.logging.dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/logs -Dray.session-dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361 -cp /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/jars/* RAY_WORKER_DYNAMIC_OPTION_PLACEHOLDER io.ray.runtime.runner.worker.DefaultWorker" --cpp_worker_command= --native_library_path=/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/cpp/lib --redis_password=5241590000000000 --temp_dir=/tmp/ray --session_dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361 --log_dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/logs --resource_dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/runtime_resources --metrics-agent-port=62967 --metrics_export_port=64128 --object_store_memory=180652755763 --plasma_directory=/dev/shm --ray-debugger-external=0 --gcs-address=10.140.1.35:6379 "--agent_command=/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/bin/python -u /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/dashboard/agent.py --node-ip-address=10.140.1.35 --metrics-export-port=64128 --dashboard-agent-port=62967 --listen-port=0 --node-manager-port=RAY_NODE_MANAGER_PORT_PLACEHOLDER --object-store-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/raylet --temp-dir=/tmp/ray --session-dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361 --runtime-env-dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/runtime_resources --log-dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/logs --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=10.140.1.35:6379"` (via SIGTERM)
2022-05-26 11:01:40,158 VINFO scripts.py:988 -- Send termination request to `/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/bin/python -u /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/dashboard/agent.py --node-ip-address=10.140.1.35 --metrics-export-port=64128 --dashboard-agent-port=62967 --listen-port=0 --node-manager-port=39616 --object-store-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/sockets/raylet --temp-dir=/tmp/ray --session-dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361 --runtime-env-dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/runtime_resources --log-dir=/tmp/ray/session_2022-05-26_10-51-28_078953_104361/logs --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=10.140.1.35:6379` (via SIGTERM)
2022-05-26 11:01:46,143 SUCC scripts.py:1033 -- Stopped all 7 Ray processes.