cartpole_lstm.py example fails

Hello, I have a Docker container in which I installed the nightly build (Ray 2.0.0.dev0).
Simply running cartpole_lstm.py with the torch framework fails with the following errors:

(pid=2263) 2021-10-15 16:23:24,342	ERROR worker.py:425 -- SystemExit was raised from the worker
(pid=2263) Traceback (most recent call last):
(pid=2263)   File "python/ray/_raylet.pyx", line 558, in ray._raylet.execute_task
(pid=2263)   File "python/ray/_raylet.pyx", line 565, in ray._raylet.execute_task
(pid=2263)   File "python/ray/_raylet.pyx", line 569, in ray._raylet.execute_task
(pid=2263)   File "python/ray/_raylet.pyx", line 519, in ray._raylet.execute_task.function_executor
(pid=2263)   File "/opt/conda/lib/python3.9/site-packages/ray/_private/function_manager.py", line 576, in actor_method_executor
(pid=2263)     return method(__ray_actor, *args, **kwargs)
(pid=2263)   File "/opt/conda/lib/python3.9/site-packages/ray/util/tracing/tracing_helper.py", line 451, in _resume_span
(pid=2263)     return method(self, *_args, **_kwargs)
(pid=2263)   File "/opt/conda/lib/python3.9/site-packages/ray/actor.py", line 1047, in __ray_terminate__
(pid=2263)     ray.actor.exit_actor()
(pid=2263)   File "/opt/conda/lib/python3.9/site-packages/ray/actor.py", line 1123, in exit_actor
(pid=2263)     raise exit
(pid=2263) SystemExit: 0
(pid=2263) 
(pid=2263) During handling of the above exception, another exception occurred:
(pid=2263) 
(pid=2263) Traceback (most recent call last):
(pid=2263)   File "/opt/conda/lib/python3.9/linecache.py", line 93, in updatecache
(pid=2263)     stat = os.stat(fullname)
(pid=2263) FileNotFoundError: [Errno 2] No such file or directory: 'python/ray/_raylet.pyx'
(pid=2263) 
(pid=2263) During handling of the above exception, another exception occurred:
(pid=2263) 
(pid=2263) Traceback (most recent call last):
(pid=2263)   File "python/ray/_raylet.pyx", line 692, in ray._raylet.task_execution_handler
(pid=2263)   File "python/ray/_raylet.pyx", line 521, in ray._raylet.execute_task
(pid=2263)   File "python/ray/includes/libcoreworker.pxi", line 33, in ray._raylet.ProfileEvent.__exit__
(pid=2263)   File "/opt/conda/lib/python3.9/traceback.py", line 167, in format_exc
(pid=2263)     return "".join(format_exception(*sys.exc_info(), limit=limit, chain=chain))
(pid=2263)   File "/opt/conda/lib/python3.9/traceback.py", line 120, in format_exception
(pid=2263)     return list(TracebackException(
(pid=2263)   File "/opt/conda/lib/python3.9/traceback.py", line 508, in __init__
(pid=2263)     self.stack = StackSummary.extract(
(pid=2263)   File "/opt/conda/lib/python3.9/traceback.py", line 366, in extract
(pid=2263)     f.line
(pid=2263)   File "/opt/conda/lib/python3.9/traceback.py", line 288, in line
(pid=2263)     self._line = linecache.getline(self.filename, self.lineno).strip()
(pid=2263)   File "/opt/conda/lib/python3.9/linecache.py", line 30, in getline
(pid=2263)     lines = getlines(filename, module_globals)
(pid=2263)   File "/opt/conda/lib/python3.9/linecache.py", line 46, in getlines
(pid=2263)     return updatecache(filename, module_globals)
(pid=2263)   File "/opt/conda/lib/python3.9/linecache.py", line 99, in updatecache
(pid=2263)     if lazycache(filename, module_globals):
(pid=2263)   File "/opt/conda/lib/python3.9/linecache.py", line 161, in lazycache
(pid=2263)     if len(cache[filename]) == 1:
(pid=2263)   File "/opt/conda/lib/python3.9/site-packages/ray/worker.py", line 422, in sigterm_handler
(pid=2263)     sys.exit(1)
(pid=2263) SystemExit: 1
(pid=2263) [2021-10-15 16:23:24,345 E 2263 2503] raylet_client.cc:159: IOError: Broken pipe [RayletClient] Failed to disconnect from raylet.
(pid=2262) 2021-10-15 16:23:24,342	ERROR worker.py:425 -- SystemExit was raised from the worker
(pid=2262) Traceback (most recent call last):
(pid=2262)   File "python/ray/_raylet.pyx", line 558, in ray._raylet.execute_task
(pid=2262)   File "python/ray/_raylet.pyx", line 565, in ray._raylet.execute_task
(pid=2262)   File "python/ray/_raylet.pyx", line 569, in ray._raylet.execute_task
(pid=2262)   File "python/ray/_raylet.pyx", line 519, in ray._raylet.execute_task.function_executor
(pid=2262)   File "/opt/conda/lib/python3.9/site-packages/ray/_private/function_manager.py", line 576, in actor_method_executor
(pid=2262)     return method(__ray_actor, *args, **kwargs)
(pid=2262)   File "/opt/conda/lib/python3.9/site-packages/ray/util/tracing/tracing_helper.py", line 451, in _resume_span
(pid=2262)     return method(self, *_args, **_kwargs)
(pid=2262)   File "/opt/conda/lib/python3.9/site-packages/ray/actor.py", line 1047, in __ray_terminate__
(pid=2262)     ray.actor.exit_actor()
(pid=2262)   File "/opt/conda/lib/python3.9/site-packages/ray/actor.py", line 1123, in exit_actor
(pid=2262)     raise exit
(pid=2262) SystemExit: 0
(pid=2262) 
(pid=2262) During handling of the above exception, another exception occurred:
(pid=2262) 
(pid=2262) Traceback (most recent call last):
(pid=2262)   File "python/ray/_raylet.pyx", line 692, in ray._raylet.task_execution_handler
(pid=2262)   File "python/ray/_raylet.pyx", line 521, in ray._raylet.execute_task
(pid=2262)   File "python/ray/includes/libcoreworker.pxi", line 33, in ray._raylet.ProfileEvent.__exit__
(pid=2262)   File "/opt/conda/lib/python3.9/traceback.py", line 167, in format_exc
(pid=2262)     return "".join(format_exception(*sys.exc_info(), limit=limit, chain=chain))
(pid=2262)   File "/opt/conda/lib/python3.9/traceback.py", line 120, in format_exception
(pid=2262)     return list(TracebackException(
(pid=2262)   File "/opt/conda/lib/python3.9/traceback.py", line 508, in __init__
(pid=2262)     self.stack = StackSummary.extract(
(pid=2262)   File "/opt/conda/lib/python3.9/traceback.py", line 321, in extract
(pid=2262)     @classmethod
(pid=2262)   File "/opt/conda/lib/python3.9/site-packages/ray/worker.py", line 422, in sigterm_handler
(pid=2262)     sys.exit(1)
(pid=2262) SystemExit: 1
(pid=2262) [2021-10-15 16:23:24,347 E 2262 2495] raylet_client.cc:159: IOError: Broken pipe [RayletClient] Failed to disconnect from raylet.
2021-10-15 16:23:24,449	INFO tune.py:630 -- Total run time: 99.07 seconds (98.47 seconds for the tuning loop).

Does anyone know the reason?

Hi @mg64ve,

The training actually finished successfully; the error appears to come from shutting Ray down afterwards. You could post your question in the “core” category, or try uninstalling and reinstalling Ray.

OK, thanks @mannyv, I will post there.
However, if I train manually instead of using ray.tune (by uncommenting the part of the script that is commented out), I get the following error:

2021-10-16 15:40:49,343	WARNING services.py:1756 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 67108864 bytes available. This will harm performance! You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you can increase /dev/shm size by passing '--shm-size=10.24gb' to 'docker run' (or add it to the run_options list in a Ray cluster config). Make sure to set this to more than 30% of available RAM.
2021-10-16 15:40:50,457	INFO logger.py:180 -- pip install 'ray[tune]' to see TensorBoard files.
2021-10-16 15:40:50,458	WARNING logger.py:325 -- Could not instantiate TBXLogger: No module named 'tensorboardX'.
2021-10-16 15:40:50,460	INFO ppo.py:166 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting simple_optimizer=True if this doesn't work for you.
2021-10-16 15:40:50,461	INFO trainer.py:770 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
2021-10-16 15:40:51,967	WARNING deprecation.py:38 -- DeprecationWarning: `SampleBatch['is_training']` has been deprecated. Use `SampleBatch.is_training` instead. This will raise an error in the future!
2021-10-16 15:40:51,979	WARNING trainer_template.py:185 -- `execution_plan` functions should accept `trainer`, `workers`, and `config` as args!
2021-10-16 15:40:51,980	WARNING util.py:57 -- Install gputil for GPU system monitoring.
Traceback (most recent call last):
  File "/srv/local_projects/docker/ray/examples/lstm/cartpole_lstm.py", line 138, in <module>
    a, state_out, _ = trainer.compute_single_action(obs, state, prev_a, prev_r)
TypeError: compute_single_action() takes from 1 to 3 positional arguments but 5 were given
(pid=835) 2021-10-16 15:40:51,930	WARNING deprecation.py:38 -- DeprecationWarning: `SampleBatch['is_training']` has been deprecated. Use `SampleBatch.is_training` instead. This will raise an error in the future!
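The first warning in this log is separate from the TypeError: 67108864 bytes is 64 MiB (Docker's default /dev/shm size), while the warning asks for more than 30% of available RAM. A minimal sketch of that arithmetic follows; `recommended_shm_bytes` and `shm_is_large_enough` are hypothetical helpers (not part of Ray), the 30% threshold is taken from the warning text, and total physical memory is used as a rough stand-in for "available RAM".

```python
import os
import shutil

MIB = 1024 ** 2
GIB = 1024 ** 3


def recommended_shm_bytes(total_ram_bytes, fraction=0.30):
    """Lower bound for /dev/shm following the warning's '> 30% of RAM' advice."""
    return int(total_ram_bytes * fraction)


def shm_is_large_enough(shm_bytes, total_ram_bytes, fraction=0.30):
    """True if shm_bytes exceeds the recommended fraction of RAM."""
    return shm_bytes > recommended_shm_bytes(total_ram_bytes, fraction)


# The size reported in the warning, converted to MiB.
print(67108864 / MIB)  # 64.0

# Optional check of the current machine (Linux only). Total physical memory
# is an assumption standing in for "available RAM".
if os.path.isdir("/dev/shm") and hasattr(os, "sysconf"):
    shm_total = shutil.disk_usage("/dev/shm").total
    ram_total = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    print(f"/dev/shm: {shm_total / GIB:.2f} GiB; "
          f"suggested: more than {recommended_shm_bytes(ram_total) / GIB:.2f} GiB; "
          f"ok: {shm_is_large_enough(shm_total, ram_total)}")
```

If the check reports too little space inside a container, the warning's own suggestion applies: pass a larger `--shm-size` to `docker run`.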

I now have two Docker containers, one with Ray 1.7.0 and one with Ray 2.0.0.dev0, and both give me the same error.
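The error message itself points at the cause: counting `self`, `compute_single_action` accepts at most three positional arguments (`self`, the observation and the state), but the example passes five (`self`, `obs`, `state`, `prev_a`, `prev_r`). In Ray 1.7, `Trainer.compute_single_action` appears to declare `prev_action` and `prev_reward` as keyword-only, so a likely fix is to pass them by keyword. The sketch below does not use Ray; `FakeTrainer` is a hypothetical stand-in that only mimics the argument shape implied by the error, to reproduce the message and show the keyword form.

```python
class FakeTrainer:
    """Hypothetical stand-in for an RLlib trainer; this is NOT Ray's class.

    It only mimics the argument shape implied by the error message:
    `observation` and `state` may be passed positionally, while every
    parameter after the bare `*` must be passed by keyword.
    """

    def compute_single_action(self, observation=None, state=None, *,
                              prev_action=None, prev_reward=None):
        # Echo the inputs so the call can be inspected. (A real trainer
        # would return the action, the new RNN state and extra fetches.)
        return observation, state, {"prev_action": prev_action,
                                    "prev_reward": prev_reward}


def try_positional_call(trainer, obs, state, prev_a, prev_r):
    """Repeat the failing call from the example: everything positional."""
    try:
        trainer.compute_single_action(obs, state, prev_a, prev_r)
    except TypeError as err:
        return str(err)  # the "takes from 1 to 3 positional arguments" message
    return None


trainer = FakeTrainer()
obs = [0.0, 0.0]          # placeholder observation
state = [[0.0], [0.0]]    # placeholder LSTM h- and c-states
prev_a, prev_r = 0, 0.0

# Fails: 4 explicit arguments plus the implicit `self` = 5 positional ones.
print(try_positional_call(trainer, obs, state, prev_a, prev_r))

# Works: previous action and reward passed by keyword.
a, state_out, extra = trainer.compute_single_action(
    obs, state, prev_action=prev_a, prev_reward=prev_r)
print(extra)  # {'prev_action': 0, 'prev_reward': 0.0}
```

Applied to line 138 of cartpole_lstm.py, this would presumably become `a, state_out, _ = trainer.compute_single_action(obs, state, prev_action=prev_a, prev_reward=prev_r)`. That is an assumption based on the error message; `help(trainer.compute_single_action)` shows the exact signature of the installed Ray version.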