I am using the daily Ray build on a Mac with a manually started cluster (1 head node and 1 worker node). I tried to load a dataset of 40M rows (ray.data.dataset). The data loaded, but the following errors were logged:
(DataLoadWorker pid=15003) [2021-11-24 17:20:05,789 E 15003 626393] core_worker.h:1110: Mismatched WorkerID: ignoring RPC for previous worker 01000000ffffffffffffffffffffffffffffffffffffffffffffffff, current worker ID: f56704bbf468f00ffacadf47bf7254d5fbd8d25a019e6d7c2a2331e5
(DataLoadWorker pid=15003) [2021-11-24 17:20:05,818 E 15003 626393] core_worker.h:1110: Mismatched WorkerID: ignoring RPC for previous worker 01000000ffffffffffffffffffffffffffffffffffffffffffffffff, current worker ID: f56704bbf468f00ffacadf47bf7254d5fbd8d25a019e6d7c2a2331e5
When I tried to access the data, it threw the following exception:
ERROR: ray::DataLoadWorker.get_pa_table() (pid=15004, ip=192.168.1.69, repr=<data_load_worker.DataLoadWorker object at 0x119848ed0>)
At least one of the input arguments for this task could not be computed:
ray.exceptions.OwnerDiedError: Failed to retrieve object ffffffffffffffffffffffffffffffffffffffff01000000b4000000. To see information about where this ObjectRef was created in Python, set the environment variable RAY_record_ref_creation_sites=1 during `ray start` and `ray.init()`.
It works if I run it in local mode (single node), and it also works for smaller workloads in cluster mode.
Can someone please explain what RAY_record_ref_creation_sites=1 means? How do I fix this issue?
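
If I read the error message correctly, it is asking for something like the sketch below: set the variable in the environment of every `ray start` command and of the driver before `ray.init()` runs. The function name `connect_to_cluster` is just a placeholder of mine, and I am assuming the variable has to be in place before Ray is imported. As far as I understand, this only adds debugging information about where the ObjectRef was created and would not fix the OwnerDiedError by itself.

```python
import os

# On each node, the variable would be set when starting Ray, e.g.:
#   RAY_record_ref_creation_sites=1 ray start --head
#   RAY_record_ref_creation_sites=1 ray start --address=<head-ip>:<port>
# (<head-ip> and <port> are placeholders for the real head node address.)

# In the driver, set it before Ray is imported and initialized
# (assumption: it must be in the environment by then).
os.environ["RAY_record_ref_creation_sites"] = "1"


def connect_to_cluster():
    """Hypothetical helper: attach the driver to the already running cluster."""
    import ray  # imported here so the variable above is already set

    ray.init(address="auto")
    return ray
```

Calling `connect_to_cluster()` from the driver script, after restarting both nodes with the variable set, should then make the error message include the creation site of the failing ObjectRef.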