Set up a Ray cluster with one CPU head node and one GPU worker node.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ray/data/_internal/execution/operators/actor_pool_map_operator.py", line 135, in start
    ray.get(refs, timeout=timeout)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py", line 2659, in get
2024-08-14 01:10:03 INFO [BOLTRAY] app_pid: 792 pid: 792 tid:140112741586752 f:_monitor_cluster:122 m:Job v5_distillation_pipeline exited in state FAILED
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py", line 848, in get_objects
    data_metadata_pairs = self.core_worker.get_objects(
  File "python/ray/_raylet.pyx", line 3510, in ray._raylet.CoreWorker.get_objects
  File "python/ray/_raylet.pyx", line 576, in ray._raylet.check_status
ray.exceptions.GetTimeoutError: Get timed out: some object(s) not ready.
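
The traceback shows the timeout being raised from the `start` method of the actor pool operator, while it waits on `ray.get(refs, timeout=timeout)`. One possible cause, which is only an assumption here and not confirmed by the log, is that the pool's actors request a resource the cluster does not currently offer, for example a GPU when the GPU worker node has not joined. A minimal sketch for checking that is below. The helper `missing_resources` and the example numbers are hypothetical; in a live session the `available` dict would come from `ray.available_resources()`.

```python
def missing_resources(requested, available):
    """Return {resource: shortfall} for each request the cluster cannot meet."""
    shortfall = {}
    for name, amount in requested.items():
        have = available.get(name, 0.0)
        if have < amount:
            shortfall[name] = amount - have
    return shortfall


# Hypothetical values for illustration only.
# In a live session: available = ray.available_resources()
available = {"CPU": 8.0}               # no "GPU" key: GPU worker not registered
requested = {"CPU": 1.0, "GPU": 1.0}   # assumed per-actor resource request

print(missing_resources(requested, available))
```

A non-empty result would suggest the actors can never be scheduled, so the wait in `start` eventually times out. An empty result points elsewhere, for example slow actor initialization.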