I managed to load the data by manually iterating over the individual .parquet files inside the .parquet folder and concatenating the resulting sub-dataframes, similar to my initial load.
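For anyone wanting to reproduce the workaround, here is a minimal sketch of that manual load. It assumes plain pandas with a parquet engine such as pyarrow installed; the helper name `load_parquet_dir` and its `reader` parameter are hypothetical, not from my actual script, and the same pattern should work with `modin.pandas` swapped in.

```python
from pathlib import Path

import pandas as pd


def load_parquet_dir(folder, reader=pd.read_parquet):
    """Read every *.parquet part file in `folder` and concatenate them.

    `reader` is injectable so a different loader can be substituted
    (hypothetical parameter, added for illustration only).
    """
    # Sort so the row order is deterministic across runs.
    part_files = sorted(Path(folder).glob("*.parquet"))
    if not part_files:
        raise FileNotFoundError(f"no .parquet files found in {folder}")

    # Load one part at a time, then stitch the sub-dataframes together.
    frames = [reader(path) for path in part_files]
    return pd.concat(frames, ignore_index=True)
```

Reading the parts one by one keeps each individual read small, but the final `concat` still needs the whole dataset in memory at once.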
Tail of raylet.out:
[2024-08-06 17:53:40,919 I 18208 17048] (raylet.exe) node_manager.cc:525: [state-dump] NodeManager:
[state-dump] Node ID: 11c5bc1e90bcd260a4d9cd7a677060b61cbbe24898aa765e8491f8f8
[state-dump] Node name:
[state-dump] InitialConfigResources: {memory: 2684354560000000, object_store_memory: 130524131320000, GPU: 10000, accelerator_type:G: 10000, node:internal_head: 10000, node: 10000, CPU: 200000}
[state-dump] ClusterTaskManager:
[state-dump] ========== Node: 11c5bc1e90bcd260a4d9cd7a677060b61cbbe24898aa765e8491f8f8 =================
[state-dump] Infeasible queue length: 0
[state-dump] Schedule queue length: 0
[state-dump] Dispatch queue length: 6
[state-dump] num_waiting_for_resource: 0
[state-dump] num_waiting_for_plasma_memory: 5
[state-dump] num_waiting_for_remote_node_resources: 0
[state-dump] num_worker_not_started_by_job_config_not_exist: 0
[state-dump] num_worker_not_started_by_registration_timeout: 0
[state-dump] num_tasks_waiting_for_workers: 1
[state-dump] num_cancelled_tasks: 0
[state-dump] cluster_resource_scheduler state:
[state-dump] Local id: 6699133724476589756 Local resources: {“total”:{memory: [2684354560000000], node:internal_head: [10000], object_store_memory: [130524131320000], GPU: [10000], node: [10000], accelerator_type:G: [10000], CPU: [200000]}}, “available”: {CPU: [160000], node:internal_head: [10000], GPU: [10000], object_store_memory: [41676085680000], node: [10000], accelerator_type:G: [10000], memory: [2684354560000000]}}, “labels”:{“ray.io/node_id":"11c5bc1e90bcd260a4d9cd7a677060b61cbbe24898aa765e8491f8f8”,} is_draining: 0 is_idle: 0 Cluster resources: node id: 6699133724476589756{“total”:{accelerator_type:G: 10000, GPU: 10000, object_store_memory: 130524131320000, node: 10000, memory: 2684354560000000, node:internal_head: 10000, CPU: 200000}}, “available”: {GPU: 10000, accelerator_type:G: 10000, memory: 2684354560000000, object_store_memory: 41676085680000, node: 10000, CPU: 160000, node:internal_head: 10000}}, “labels”:{“ray.io/node_id":"11c5bc1e90bcd260a4d9cd7a677060b61cbbe24898aa765e8491f8f8”,}, “is_draining”: 0, “draining_deadline_timestamp_ms”: -1} { “placment group locations”: , “node to bundles”: }
[state-dump] Waiting tasks size: 11
[state-dump] Number of executing tasks: 5
[state-dump] Number of pinned task arguments: 81
[state-dump] Number of total spilled tasks: 0
[state-dump] Number of spilled waiting tasks: 0
[state-dump] Number of spilled unschedulable tasks: 0
[state-dump] Resource usage {
[state-dump] - (language=PYTHON actor_or_task=_deploy_ray_func pid=34212 worker_id=c335baa0647805237ea166ae5612bf3fb4972e18f08183fbe50b482b): {CPU: 10000}
[state-dump] - (language=PYTHON actor_or_task=_deploy_ray_func pid=25720 worker_id=4d6a02e2097c2759b8a6afef63d8c0d2f29595df401467c692389b7d): {CPU: 10000}
[state-dump] - (language=PYTHON actor_or_task=_deploy_ray_func pid=32744 worker_id=ca71e19b7f9b29f14a488c8b669b33d2408e2806169147e07c971863): {CPU: 10000}
[state-dump] - (language=PYTHON actor_or_task=_deploy_ray_func pid=34960 worker_id=f1c1bc3b54e5cef6d9812d0460bc9a12d4bec70805678f7ee0184884): {CPU: 10000}
[state-dump] }
[state-dump] Running tasks by scheduling class:
[state-dump] - {depth=1 function_descriptor={type=PythonFunctionDescriptor, module_name=ray.data._internal.stats, class_name=_StatsActor, function_name=init, function_hash=d0b7803d915a49409aff6b327f9190ef} scheduling_strategy=node_affinity_scheduling_strategy {
[state-dump] node_id: “\021\305\274\036\220\274\322\244\331\315zgp
[state-dump] }
[state-dump] resource_set={}}: 1/18446744073709551615
[state-dump] - {depth=1 function_descriptor={type=PythonFunctionDescriptor, module_name=modin.core.execution.ray.implementations.pandas_on_ray.partitioning.virtual_partition, class_name=, function_name=_deploy_ray_func, function_hash=bcc424c2f49e4803b4dbb629bc272513} scheduling_strategy=default_scheduling_strategy {
[state-dump] }
[state-dump] resource_set={CPU : 1, }}: 4/20
[state-dump] ==================================================
[state-dump] ClusterResources:
[state-dump] LocalObjectManager:
[state-dump] - num pinned objects: 0
[state-dump] - pinned objects size: 0
[state-dump] - num objects pending restore: 0
[state-dump] - num objects pending spill: 4
[state-dump] - num bytes pending spill: 8884804564
[state-dump] - num bytes currently spilled: 71035560890
[state-dump] - cumulative spill requests: 816
[state-dump] - cumulative restore requests: 633
[state-dump] - spilled objects pending delete: 0
[state-dump] ObjectManager:
[state-dump] - num local objects: 88
[state-dump] - num unfulfilled push requests: 0
[state-dump] - num object pull requests: 18
[state-dump] - num chunks received total: 0
[state-dump] - num chunks received failed (all): 0
[state-dump] - num chunks received failed / cancelled: 0
[state-dump] - num chunks received failed / plasma error: 0
[state-dump] Event stats:
[state-dump] Global stats: 66 total (0 active)
[state-dump] Queueing time: mean = 1.943 ms, max = 25.783 ms, min = 8.232 us, total = 128.212 ms
[state-dump] Execution time: mean = 9.728 us, total = 642.072 us
[state-dump] Event stats:
[state-dump] ObjectManager.FreeObjects - 66 total (0 active), Execution time: mean = 9.728 us, total = 642.072 us, Queueing time: mean = 1.943 ms, max = 25.783 ms, min = 8.232 us, total = 128.212 ms
[state-dump] PushManager:
[state-dump] - num pushes in flight: 0
[state-dump] - num chunks in flight: 0
[state-dump] - num chunks remaining: 0
[state-dump] - max chunks allowed: 409
[state-dump] OwnershipBasedObjectDirectory:
[state-dump] - num listeners: 18
[state-dump] - cumulative location updates: 15788333336426
[state-dump] - num location updates per second: 0.000
[state-dump] - num location lookups per second: 0.000
[state-dump] - num locations added per second: 0.000
[state-dump] - num locations removed per second: 0.000
[state-dump] BufferPool:
[state-dump] - create buffer state map size: 0
[state-dump] PullManager:
[state-dump] - num bytes available for pulled objects: 0
[state-dump] - num bytes being pulled (all): 2221208186
[state-dump] - num bytes being pulled / pinned: 2221208186
[state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - task request bundles: BundlePullRequestQueue{16 total, 1 active, 15 inactive, 0 unpullable}
[state-dump] - first get request bundle: N/A
[state-dump] - first wait request bundle: N/A
[state-dump] - first task request bundle: 3 objects, 2221208186 bytes (inactive, waiting for capacity)
[state-dump] - num objects queued: 18
[state-dump] - num objects actively pulled (all): 3
[state-dump] - num objects actively pulled / pinned: 3
[state-dump] - num bundles being pulled: 1
[state-dump] - num pull retries: 0
[state-dump] - max timeout seconds: 10
[state-dump] - max timeout request is already processed. No entry.
[state-dump] WorkerPool:
[state-dump] - registered jobs: 2
[state-dump] - process_failed_job_config_missing: 0
[state-dump] - process_failed_rate_limited: 0
[state-dump] - process_failed_pending_registration: 0
[state-dump] - process_failed_runtime_env_setup_failed: 0
[state-dump] - num PYTHON workers: 28
[state-dump] - num PYTHON drivers: 2
[state-dump] - num object spill callbacks queued: 0
[state-dump] - num object restore queued: 0
[state-dump] - num util functions queued: 0
[state-dump] - num idle workers: 16
[state-dump] TaskDependencyManager:
[state-dump] - task deps map size: 16
[state-dump] - get req map size: 0
[state-dump] - wait req map size: 0
[state-dump] - local objects map size: 88
[state-dump] WaitManager:
[state-dump] - num active wait requests: 0
[state-dump] Subscriber:
[state-dump] Channel WORKER_OBJECT_EVICTION
[state-dump] - cumulative subscribe requests: 860
[state-dump] - cumulative unsubscribe requests: 440
[state-dump] - active subscribed publishers: 1
[state-dump] - cumulative published messages: 440
[state-dump] - cumulative processed messages: 440
[state-dump] - cumulative subscribe requests: 2866
[state-dump] - cumulative unsubscribe requests: 2848
[state-dump] - active subscribed publishers: 1
[state-dump] - cumulative published messages: 3144
[state-dump] - cumulative processed messages: 1965
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] num async plasma notifications: 0
[state-dump] Remote node managers:
[state-dump] Event stats:
[state-dump] Global stats: 81938 total (69 active)
[state-dump] Queueing time: mean = 25.725 ms, max = 40.485 s, min = -0.001 s, total = 2107.849 s
[state-dump] Execution time: mean = 138.049 ms, total = 11311.475 s
[state-dump] Event stats:
[state-dump] NodeManager.SpillObjects - 8539 total (1 active), Execution time: mean = 13.282 us, total = 113.412 ms, Queueing time: mean = 285.694 us, max = 54.399 ms, min = 1.879 us, total = 2.440 s
[state-dump] NodeManager.GlobalGC - 8539 total (1 active), Execution time: mean = 685.499 ns, total = 5.853 ms, Queueing time: mean = 286.598 us, max = 54.395 ms, min = 1.566 us, total = 2.447 s
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 6544 total (1 active), Execution time: mean = 40.975 us, total = 268.137 ms, Queueing time: mean = 1.122 ms, max = 280.977 ms, min = 2.629 us, total = 7.346 s
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 6544 total (1 active), Execution time: mean = 1.436 ms, total = 9.395 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] CoreWorkerService.grpc_client.PubsubCommandBatch.OnReplyReceived - 3058 total (0 active), Execution time: mean = 180.878 us, total = 553.125 ms, Queueing time: mean = 333.780 us, max = 314.878 ms, min = 3.847 us, total = 1.021 s
[state-dump] CoreWorkerService.grpc_client.PubsubCommandBatch - 3058 total (0 active), Execution time: mean = 1.292 ms, total = 3.950 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ClientConnection.async_read.ProcessMessageHeader - 2632 total (30 active), Execution time: mean = 8.745 us, total = 23.017 ms, Queueing time: mean = 761.395 ms, max = 40.485 s, min = 2.673 us, total = 2003.992 s
[state-dump] ClientConnection.async_read.ProcessMessage - 2602 total (0 active), Execution time: mean = 227.944 us, total = 593.110 ms, Queueing time: mean = 220.327 us, max = 305.922 ms, min = 1.923 us, total = 573.291 ms
[state-dump] RaySyncer.OnDemandBroadcasting - 2157 total (1 active), Execution time: mean = 120.681 us, total = 260.309 ms, Queueing time: mean = 11.201 ms, max = 327.811 ms, min = -0.001 s, total = 24.160 s
[state-dump] NodeManager.CheckGC - 2157 total (1 active), Execution time: mean = 181.869 us, total = 392.291 ms, Queueing time: mean = 11.140 ms, max = 327.810 ms, min = -0.001 s, total = 24.030 s
[state-dump] ObjectManager.UpdateAvailableMemory - 2157 total (0 active), Execution time: mean = 143.997 us, total = 310.602 ms, Queueing time: mean = 802.706 us, max = 95.055 ms, min = 1.531 us, total = 1.731 s
[state-dump] CoreWorkerService.grpc_client.PubsubLongPolling - 2144 total (1 active), Execution time: mean = 105.431 ms, total = 226.044 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] CoreWorkerService.grpc_client.PubsubLongPolling.OnReplyReceived - 2143 total (0 active), Execution time: mean = 303.204 us, total = 649.765 ms, Queueing time: mean = 403.038 us, max = 294.286 ms, min = 3.293 us, total = 863.711 ms
[state-dump] Subscriber.HandlePublishedMessage_WORKER_OBJECT_LOCATIONS_CHANNEL - 1965 total (0 active), Execution time: mean = 28.264 us, total = 55.538 ms, Queueing time: mean = 398.147 us, max = 3.422 ms, min = 26.231 us, total = 782.359 ms
[state-dump] RaySyncer.BroadcastMessage - 1933 total (0 active), Execution time: mean = 188.863 us, total = 365.072 ms, Queueing time: mean = 1.919 us, max = 728.936 us, min = 164.000 ns, total = 3.710 ms
[state-dump] - 1933 total (0 active), Execution time: mean = 27.158 us, total = 52.497 ms, Queueing time: mean = 313.880 us, max = 47.176 ms, min = 2.263 us, total = 606.731 ms
[state-dump] CoreWorkerService.grpc_client.UpdateObjectLocationBatch.OnReplyReceived - 1881 total (0 active), Execution time: mean = 68.430 us, total = 128.717 ms, Queueing time: mean = 905.268 us, max = 315.011 ms, min = 2.957 us, total = 1.703 s
[state-dump] CoreWorkerService.grpc_client.UpdateObjectLocationBatch - 1881 total (0 active), Execution time: mean = 1.702 ms, total = 3.202 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ObjectManager.ObjectAdded - 1515 total (0 active), Execution time: mean = 599.431 us, total = 908.138 ms, Queueing time: mean = 1.234 ms, max = 288.828 ms, min = 5.604 us, total = 1.870 s
[state-dump] ObjectManager.ObjectDeleted - 1427 total (0 active), Execution time: mean = 86.146 us, total = 122.930 ms, Queueing time: mean = 1.090 ms, max = 40.857 ms, min = 4.481 us, total = 1.556 s
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 1277 total (0 active), Execution time: mean = 181.013 us, total = 231.153 ms, Queueing time: mean = 704.484 us, max = 28.314 ms, min = 3.489 us, total = 899.626 ms
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 1277 total (17 active), Execution time: mean = 7.261 s, total = 9272.839 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] WorkerPool.PopWorkerCallback - 1260 total (0 active), Execution time: mean = 443.819 us, total = 559.212 ms, Queueing time: mean = 249.888 us, max = 21.548 ms, min = 8.531 us, total = 314.859 ms
[state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 1256 total (0 active), Execution time: mean = 200.006 us, total = 251.207 ms, Queueing time: mean = 912.102 us, max = 294.327 ms, min = 4.262 us, total = 1.146 s
[state-dump] NodeManagerService.grpc_server.ReturnWorker - 1256 total (0 active), Execution time: mean = 1.374 ms, total = 1.726 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 1159 total (1 active), Execution time: mean = 21.299 us, total = 24.686 ms, Queueing time: mean = 7.262 ms, max = 228.515 ms, min = -0.000 s, total = 8.417 s
[state-dump] CoreWorkerService.grpc_client.GetCoreWorkerStats - 1051 total (1 active), Execution time: mean = 353.962 ms, total = 372.014 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] CoreWorkerService.grpc_client.GetCoreWorkerStats.OnReplyReceived - 1050 total (0 active), Execution time: mean = 21.387 us, total = 22.456 ms, Queueing time: mean = 2.497 ms, max = 68.706 ms, min = 2.531 us, total = 2.622 s
[state-dump] NodeManagerService.grpc_server.PinObjectIDs - 860 total (0 active), Execution time: mean = 2.732 ms, total = 2.349 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.PinObjectIDs.HandleRequestImpl - 860 total (0 active), Execution time: mean = 1.187 ms, total = 1.021 s, Queueing time: mean = 1.276 ms, max = 253.244 ms, min = 3.190 us, total = 1.097 s
[state-dump] CoreWorkerService.grpc_client.LocalGC - 678 total (1 active), Execution time: mean = 546.409 ms, total = 370.465 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 677 total (0 active), Execution time: mean = 38.906 us, total = 26.340 ms, Queueing time: mean = 6.700 ms, max = 118.298 ms, min = 7.753 us, total = 4.536 s
[state-dump] CoreWorkerService.grpc_client.RestoreSpilledObjects.OnReplyReceived - 633 total (0 active), Execution time: mean = 217.989 us, total = 137.987 ms, Queueing time: mean = 1.953 ms, max = 288.106 ms, min = 5.128 us, total = 1.236 s
[state-dump] CoreWorkerService.grpc_client.RestoreSpilledObjects - 633 total (0 active), Execution time: mean = 411.531 ms, total = 260.499 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] CoreWorkerService.grpc_client.SpillObjects - 495 total (1 active), Execution time: mean = 896.493 ms, total = 443.764 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] CoreWorkerService.grpc_client.SpillObjects.OnReplyReceived - 494 total (0 active), Execution time: mean = 1.022 ms, total = 505.089 ms, Queueing time: mean = 1.549 ms, max = 253.483 ms, min = 6.983 us, total = 765.321 ms
[state-dump] Subscriber.HandlePublishedMessage_WORKER_OBJECT_EVICTION - 440 total (0 active), Execution time: mean = 105.952 us, total = 46.619 ms, Queueing time: mean = 361.962 us, max = 6.662 ms, min = 63.738 us, total = 159.263 ms
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 238 total (1 active), Execution time: mean = 45.317 us, total = 10.785 ms, Queueing time: mean = 12.556 ms, max = 69.352 ms, min = -0.000 s, total = 2.988 s
[state-dump] NodeManager.ScheduleAndDispatchTasks - 238 total (1 active), Execution time: mean = 67.709 us, total = 16.115 ms, Queueing time: mean = 12.557 ms, max = 68.726 ms, min = -0.000 s, total = 2.988 s
[state-dump] NodeManager.deadline_timer.flush_free_objects - 237 total (1 active), Execution time: mean = 532.508 us, total = 126.204 ms, Queueing time: mean = 12.544 ms, max = 68.907 ms, min = -0.000 s, total = 2.973 s
[state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 236 total (0 active), Execution time: mean = 125.813 us, total = 29.692 ms, Queueing time: mean = 762.928 us, max = 26.054 ms, min = 3.744 us, total = 180.051 ms
[state-dump] NodeManagerService.grpc_server.GetResourceLoad - 236 total (0 active), Execution time: mean = 1.210 ms, total = 285.581 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 80 total (1 active), Execution time: mean = 10.205 us, total = 816.398 us, Queueing time: mean = 9.133 ms, max = 50.038 ms, min = -0.000 s, total = 730.670 ms
[state-dump] NodeManager.GcsCheckAlive - 48 total (1 active), Execution time: mean = 305.942 us, total = 14.685 ms, Queueing time: mean = 8.769 ms, max = 38.017 ms, min = 1.066 ms, total = 420.932 ms
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 48 total (0 active), Execution time: mean = 1.943 ms, total = 93.286 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.deadline_timer.record_metrics - 48 total (1 active), Execution time: mean = 431.747 us, total = 20.724 ms, Queueing time: mean = 8.663 ms, max = 37.872 ms, min = 518.322 us, total = 415.831 ms
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 48 total (0 active), Execution time: mean = 37.073 us, total = 1.780 ms, Queueing time: mean = 599.873 us, max = 15.899 ms, min = 7.093 us, total = 28.794 ms
[state-dump] CoreWorkerService.grpc_client.DeleteSpilledObjects.OnReplyReceived - 42 total (0 active), Execution time: mean = 276.005 us, total = 11.592 ms, Queueing time: mean = 1.795 ms, max = 73.470 ms, min = 8.881 us, total = 75.380 ms
[state-dump] CoreWorkerService.grpc_client.DeleteSpilledObjects - 42 total (0 active), Execution time: mean = 196.171 ms, total = 8.239 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.GetNodeStats - 37 total (1 active), Execution time: mean = 8.765 s, total = 324.321 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.GetNodeStats.HandleRequestImpl - 37 total (0 active), Execution time: mean = 5.530 ms, total = 204.627 ms, Queueing time: mean = 1.003 ms, max = 14.421 ms, min = 7.057 us, total = 37.095 ms
[state-dump] ClientConnection.async_write.DoAsyncWrites - 32 total (0 active), Execution time: mean = 1.688 us, total = 54.003 us, Queueing time: mean = 111.398 us, max = 278.758 us, min = 55.195 us, total = 3.565 ms
[state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 30 total (0 active), Execution time: mean = 45.369 us, total = 1.361 ms, Queueing time: mean = 108.800 us, max = 1.117 ms, min = 10.800 us, total = 3.264 ms
[state-dump] NodeManagerService.grpc_server.GetSystemConfig - 30 total (0 active), Execution time: mean = 421.800 us, total = 12.654 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.deadline_timer.debug_state_dump - 24 total (1 active), Execution time: mean = 4.164 ms, total = 99.941 ms, Queueing time: mean = 9.379 ms, max = 15.528 ms, min = 517.894 us, total = 225.103 ms
[state-dump] PeriodicalRunner.RunFnPeriodically - 12 total (0 active), Execution time: mean = 433.758 us, total = 5.205 ms, Queueing time: mean = 35.882 ms, max = 186.548 ms, min = 95.400 us, total = 430.589 ms
[state-dump] NodeManager.deadline_timer.print_event_loop_stats - 4 total (1 active, 1 running), Execution time: mean = 2.143 ms, total = 8.570 ms, Queueing time: mean = 5.806 ms, max = 14.542 ms, min = 742.547 us, total = 23.224 ms
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 1.290 s, total = 3.871 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 2 total (0 active), Execution time: mean = 107.071 us, total = 214.142 us, Queueing time: mean = 271.801 us, max = 322.534 us, min = 221.068 us, total = 543.602 us
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 283.927 us, total = 567.854 us, Queueing time: mean = 18.974 us, max = 22.825 us, min = 15.122 us, total = 37.947 us
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 637.200 us, total = 1.274 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 4.050 us, total = 8.100 us, Queueing time: mean = 1.200 us, max = 2.100 us, min = 300.000 ns, total = 2.400 us
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 198.750 us, total = 397.500 us, Queueing time: mean = 2.561 ms, max = 5.013 ms, min = 108.600 us, total = 5.122 ms
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 2 total (0 active), Execution time: mean = 2.054 ms, total = 4.107 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 2 total (0 active), Execution time: mean = 72.809 us, total = 145.619 us, Queueing time: mean = 322.529 us, max = 428.056 us, min = 217.003 us, total = 645.059 us
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.270 ms, total = 1.270 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 842.400 us, total = 842.400 us, Queueing time: mean = 53.300 us, max = 53.300 us, min = 53.300 us, total = 53.300 us
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.125 ms, total = 1.125 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 213.763 ms, total = 213.763 ms, Queueing time: mean = 35.800 us, max = 35.800 us, min = 35.800 us, total = 35.800 us
[state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 464.100 us, total = 464.100 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 445.000 us, total = 445.000 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 233.200 us, total = 233.200 us, Queueing time: mean = 17.800 us, max = 17.800 us, min = 17.800 us, total = 17.800 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 25.400 us, total = 25.400 us, Queueing time: mean = 16.300 us, max = 16.300 us, min = 16.300 us, total = 16.300 us
[state-dump] DebugString() time ms: 1
[2024-08-06 17:53:48,475 I 18208 17048] (raylet.exe) node_manager.cc:656: Sending Python GC request to 30 local workers to clean up Python cyclic references.
[2024-08-06 17:53:54,561 I 18208 17048] (raylet.exe) local_object_manager.cc:245: :info_message:Spilled 118563 MiB, 820 objects, write throughput 622 MiB/s.
[2024-08-06 17:53:54,565 I 18208 35500] (raylet.exe) dlmalloc.cc:288: fake_munmap(0000023FA4A90000, 4294967304)
[2024-08-06 17:53:54,612 I 18208 17048] (raylet.exe) local_resource_manager.cc:287: Object store memory is not idle.
[2024-08-06 17:53:54,761 I 18208 35500] (raylet.exe) dlmalloc.cc:288: fake_munmap(0000023EE4A80000, 3221225480)
[2024-08-06 17:53:54,932 I 18208 35500] (raylet.exe) dlmalloc.cc:288: fake_munmap(00000240A4AA0000, 4294967304)
[2024-08-06 17:53:55,124 I 18208 35500] (raylet.exe) dlmalloc.cc:288: fake_munmap(00000241A4AB0000, 8589934600)
[2024-08-06 17:53:57,310 I 18208 35500] (raylet.exe) object_lifecycle_manager.cc:206: Shared memory store full, falling back to allocating from filesystem: 2221201141
[2024-08-06 17:53:57,310 I 18208 35500] (raylet.exe) object_lifecycle_manager.cc:206: Shared memory store full, falling back to allocating from filesystem: 2221201141
[2024-08-06 17:53:57,678 C 18208 35500] (raylet.exe) dlmalloc.cc:129: Check failed: *handle != nullptr CreateFileMapping() failed. GetLastError() = 1450
*** StackTrace Information ***
Again, the same error. It tried to fall back to allocating from the filesystem. I checked the dashboard for memory, and there was more than enough space (I set _memory to 250 GB).
The dashboard said 0/250.
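To compare the dashboard figure with the state dump, here is a rough sketch that converts the raw resource values from the log into GiB. It assumes the raylet prints resources as fixed-point integers scaled by 10000; that is consistent with `CPU: 200000` next to the `4/20` task slots in the dump, but I have not confirmed it against the Ray source. The constant and helper names are hypothetical.

```python
# Assumption: the raylet logs resource quantities multiplied by 10000.
RESOURCE_UNIT_SCALING = 10_000
GIB = 1024 ** 3


def raw_to_gib(raw_value):
    """Convert a raw state-dump byte resource value to GiB."""
    return raw_value / RESOURCE_UNIT_SCALING / GIB


# Values copied from the InitialConfigResources / "available" lines above.
memory_total_gib = raw_to_gib(2684354560000000)
object_store_total_gib = raw_to_gib(130524131320000)
object_store_available_gib = raw_to_gib(41676085680000)
cpu_total = 200000 / RESOURCE_UNIT_SCALING

print(f"memory (total):          {memory_total_gib:.2f} GiB")
print(f"object store (total):    {object_store_total_gib:.2f} GiB")
print(f"object store (available): {object_store_available_gib:.2f} GiB")
print(f"CPUs:                    {cpu_total:.0f}")
```

Under that assumption, `memory` comes out at 250 GiB, matching what I set, while `object_store_memory` is only about 12 GiB in total. So the 0/250 on the dashboard probably refers to the `memory` resource, not to the object store that the log reports as full. Separately, as far as I know, Windows error 1450 is ERROR_NO_SYSTEM_RESOURCES, which would point to an OS-level limit on the file mapping and not to a lack of free disk space.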