I started a cluster using cluster.yaml with 2 worker nodes. It had been running for the last 10 days when, suddenly today, I got the following error while starting a new job. I am passing all the required arguments when calling the function, and as I said, the same code had been running fine for all those days. Another interesting thing: jobs started after the failing job ran perfectly. Can anyone suggest what the issue could be here?
[Main] Error in start job:
Traceback (most recent call last):
File "main.py", line 323, in <module>
proc_manager.start_counting_job()
File "main.py", line 170, in start_counting_job
self.subscribe_job(det_actor, job_id, det_holder=detection_holder, frame_holder=frame_holder)
File "main.py", line 43, in subscribe_job
tmp = actor.subscribe_job.remote(job)
File "/usr/local/lib/python3.8/dist-packages/ray/actor.py", line 138, in remote
return self._remote(args, kwargs)
File "/usr/local/lib/python3.8/dist-packages/ray/util/tracing/tracing_helper.py", line 425, in _start_span
return method(self, args, kwargs, *_args, **_kwargs)
File "/usr/local/lib/python3.8/dist-packages/ray/actor.py", line 184, in _remote
return invocation(args, kwargs)
File "/usr/local/lib/python3.8/dist-packages/ray/actor.py", line 171, in invocation
return actor._actor_method_call(
File "/usr/local/lib/python3.8/dist-packages/ray/actor.py", line 1158, in _actor_method_call
list_args = signature.flatten_args(function_signature, args, kwargs)
File "/usr/local/lib/python3.8/dist-packages/ray/_private/signature.py", line 114, in flatten_args
raise TypeError(str(exc)) from None
TypeError: too many positional arguments
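For context, the final message looks like the one Python's standard `inspect.Signature.bind` produces when a call supplies more positional arguments than the target signature accepts, and `flatten_args` in the traceback above re-raises it as a plain `TypeError`. Below is a minimal sketch that reproduces the same message without Ray, assuming the check is equivalent to a `bind` call. The names `subscribe_job_one_arg`, `subscribe_job_no_args`, and `bind_error` are hypothetical and are not taken from main.py or from Ray.

```python
import inspect


def subscribe_job_one_arg(job):
    """Hypothetical stand-in for a method that accepts one positional argument."""
    return job


def subscribe_job_no_args():
    """Hypothetical stand-in for a method that accepts no arguments."""
    return None


def bind_error(func, *args, **kwargs):
    """Return the TypeError message raised by binding the arguments, or None."""
    try:
        # Signature.bind validates the arguments without calling the function.
        inspect.signature(func).bind(*args, **kwargs)
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # One positional argument matches the one-parameter signature.
    print(bind_error(subscribe_job_one_arg, "job-1"))
    # The same call against a zero-parameter signature fails to bind.
    print(bind_error(subscribe_job_no_args, "job-1"))
```

If that assumption holds, the signature being checked for `subscribe_job` at the time of the failing call accepted fewer positional parameters than the single `job` argument passed in main.py line 43. Logging `inspect.signature` of the actor class's method, together with the type of the `actor` handle, just before the `.remote(job)` call might help confirm which object the call was actually made against.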