Ray Train failing when running checkpointing demo

I am experimenting with Ray Train on my Mac and am trying to run the checkpointing sample found here, but I repeatedly get the error below. What is odd is that if I restart my Mac, I get one clean run; after that, every run fails with the same error until I restart the machine again.

I am using Python 3.11.3 and Ray 2.8.1.
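For reference, here is a minimal sketch of the kind of training script I am running. It is paraphrased from the Ray Train checkpointing example linked above, so names and details may differ slightly from the exact sample:

```python
import os
import tempfile

import torch
from torch import nn

import ray.train.torch
from ray import train
from ray.train import Checkpoint, ScalingConfig
from ray.train.torch import TorchTrainer


def train_func(config):
    # Toy model and data, wrapped for distributed training.
    model = ray.train.torch.prepare_model(nn.Linear(10, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    criterion = nn.MSELoss()

    for epoch in range(config["num_epochs"]):
        inputs = torch.randn(32, 10)
        labels = torch.randn(32, 1)
        loss = criterion(model(inputs), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Report metrics and save a checkpoint each epoch.
        with tempfile.TemporaryDirectory() as tmpdir:
            torch.save(model.state_dict(), os.path.join(tmpdir, "model.pt"))
            train.report(
                {"loss": loss.item()},
                checkpoint=Checkpoint.from_directory(tmpdir),
            )


trainer = TorchTrainer(
    train_func,
    train_loop_config={"num_epochs": 5},
    scaling_config=ScalingConfig(num_workers=2),
)
result = trainer.fit()
```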

(TorchTrainer pid=5026) Starting distributed worker processes: ['5053 (127.0.0.1)', '5054 (127.0.0.1)']
(RayTrainWorker pid=5053) Setting up process group for: env:// [rank=0, world_size=2]
(RayTrainWorker pid=5053) [/Users/runner/work/pytorch/pytorch/pytorch/third_party/gloo/gloo/transport/uv/libuv.h:596] uv_accept: invalid argument
(RayTrainWorker pid=5053) *** SIGABRT received at time=1703086775 ***
(RayTrainWorker pid=5053) PC: @ 0x18390e0dc (unknown) __pthread_kill
(RayTrainWorker pid=5053) @ 0x1038722a8 (unknown) absl::lts_20220623::WriteFailureInfo()
(RayTrainWorker pid=5053) @ 0x103871ff4 (unknown) absl::lts_20220623::AbslFailureSignalHandler()
(RayTrainWorker pid=5053) @ 0x183975a24 (unknown) _sigtramp
(RayTrainWorker pid=5053) @ 0x183945cc0 (unknown) pthread_kill
(RayTrainWorker pid=5053) @ 0x183851a40 (unknown) abort
(RayTrainWorker pid=5053) @ 0x166a5d4c8 (unknown) gloo::transport::uv::Device::listenCallback()
(RayTrainWorker pid=5053) @ 0x166a6e190 (unknown) gloo::transport::uv::libuv::Emitter<>::Handler<>::publish()
(RayTrainWorker pid=5053) @ 0x166ad5884 (unknown) uv__server_io
(RayTrainWorker pid=5053) @ 0x166ad9f18 (unknown) uv__io_poll
(RayTrainWorker pid=5053) @ 0x166acb5d8 (unknown) uv_run
(RayTrainWorker pid=5053) @ 0x166a5ff20 (unknown) std::__1::__thread_proxy<>()
(RayTrainWorker pid=5053) @ 0x183946034 (unknown) _pthread_start
(RayTrainWorker pid=5053) @ 0x183940e3c (unknown) thread_start
(RayTrainWorker pid=5053) [2023-12-20 10:39:35,092 E 5053 97289] logging.cc:361: *** SIGABRT received at time=1703086775 ***
(RayTrainWorker pid=5053) [2023-12-20 10:39:35,093 E 5053 97289] logging.cc:361: PC: @ 0x18390e0dc (unknown) __pthread_kill
(RayTrainWorker pid=5053) [2023-12-20 10:39:35,093 E 5053 97289] logging.cc:361: @ 0x1038722a8 (unknown) absl::lts_20220623::WriteFailureInfo()
(RayTrainWorker pid=5053) [2023-12-20 10:39:35,093 E 5053 97289] logging.cc:361: @ 0x10387200c (unknown) absl::lts_20220623::AbslFailureSignalHandler()
(RayTrainWorker pid=5053) [2023-12-20 10:39:35,093 E 5053 97289] logging.cc:361: @ 0x183975a24 (unknown) _sigtramp
(RayTrainWorker pid=5053) [2023-12-20 10:39:35,093 E 5053 97289] logging.cc:361: @ 0x183945cc0 (unknown) pthread_kill
(RayTrainWorker pid=5053) [2023-12-20 10:39:35,093 E 5053 97289] logging.cc:361: @ 0x183851a40 (unknown) abort

RuntimeError: Connection reset by peer
2023-12-20 10:39:35,470 WARNING experiment_state.py:327 -- Experiment checkpoint syncing has been triggered multiple times in the last 30.0 seconds. A sync will be triggered whenever a trial has checkpointed more than num_to_keep times since last sync or if 300 seconds have passed since last sync. If you have set num_to_keep in your CheckpointConfig, consider increasing the checkpoint frequency or keeping more checkpoints. You can supress this warning by changing the TUNE_WARN_EXCESSIVE_EXPERIMENT_CHECKPOINT_SYNC_THRESHOLD_S environment variable.
2023-12-20 10:39:35,473 ERROR tune.py:1043 -- Trials did not complete: [TorchTrainer_f750f_00000]
2023-12-20 10:39:35,474 INFO tune.py:1047 -- Total run time: 4.91 seconds (4.78 seconds for the tuning loop).