Hi @xwjiang2010, thanks for the response. I’d like to attach the full console output, but it exceeds this forum’s character limit. Is there a workaround? I tried to upload it as a file, but I think only images are allowed. Here is the portion of the output that shows the primary error I’m facing:
== Status ==
Current time: 2022-03-18 07:54:44 (running for 00:00:12.76)
Memory usage on this node: 5.5/124.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 1.0/58 CPUs, 1.0/2 GPUs, 0.0/148.4 GiB heap, 0.0/63.49 GiB objects
Result logdir: /home/ray/ray_results/tune_deeplearn
Number of trials: 1/1 (1 RUNNING)
+--------------------------------+----------+--------------------+--------------+-----------------+-------------+--------------+
| Trial name | status | loc | batch_size | initial_depth | lr | max_levels |
|--------------------------------+----------+--------------------+--------------+-----------------+-------------+--------------|
| train_deeplearn_tune_5105c_00000 | RUNNING | 100.125.228.77:471 | 32 | 40 | 0.000353205 | 5 |
+--------------------------------+----------+--------------------+--------------+-----------------+-------------+--------------+
(pid=runtime_env) 2022-03-18 07:54:44,443 INFO conda_utils.py:198 -- Installing collected packages: certifi, affine, zipp, typing-extensions, setuptools, pyparsing, numpy, attrs, snuggs, importlib-metadata, click, cligj, click-plugins, rasterio
(train_deeplearn_tune pid=471, ip=100.125.228.77) Using native 16bit precision.
(train_deeplearn_tune pid=471, ip=100.125.228.77) GPU available: True, used: True
(train_deeplearn_tune pid=471, ip=100.125.228.77) TPU available: False, using: 0 TPU cores
(train_deeplearn_tune pid=471, ip=100.125.228.77) IPU available: False, using: 0 IPUs
(train_deeplearn_tune pid=471, ip=100.125.228.77) LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
== Status ==
Current time: 2022-03-18 07:54:48 (running for 00:00:16.76)
Memory usage on this node: 5.2/124.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 1.0/58 CPUs, 1.0/2 GPUs, 0.0/148.4 GiB heap, 0.0/63.49 GiB objects
Result logdir: /home/ray/ray_results/tune_deeplearn
Number of trials: 1/1 (1 RUNNING)
+--------------------------------+----------+--------------------+--------------+-----------------+-------------+--------------+
| Trial name | status | loc | batch_size | initial_depth | lr | max_levels |
|--------------------------------+----------+--------------------+--------------+-----------------+-------------+--------------|
| train_deeplearn_tune_5105c_00000 | RUNNING | 100.125.228.77:471 | 32 | 40 | 0.000353205 | 5 |
+--------------------------------+----------+--------------------+--------------+-----------------+-------------+--------------+
(pid=runtime_env) 2022-03-18 07:54:48,350 INFO conda_utils.py:198 -- ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
(pid=runtime_env) 2022-03-18 07:54:48,350 INFO conda_utils.py:198 -- raydp-nightly 2022.3.9.dev0 requires typing, which is not installed.
(pid=runtime_env) 2022-03-18 07:54:48,350 INFO conda_utils.py:198 -- xgboost-ray 0.1.4 requires numpy<1.20,>=1.16, but you have numpy 1.21.5 which is incompatible.
(pid=runtime_env) 2022-03-18 07:54:48,350 INFO conda_utils.py:198 -- tensorflow 2.6.0 requires numpy~=1.19.2, but you have numpy 1.21.5 which is incompatible.
(pid=runtime_env) 2022-03-18 07:54:48,350 INFO conda_utils.py:198 -- tensorflow 2.6.0 requires typing-extensions~=3.7.4, but you have typing-extensions 4.1.1 which is incompatible.
(pid=runtime_env) 2022-03-18 07:54:48,351 INFO conda_utils.py:198 -- fastapi 0.75.0 requires starlette==0.17.1, but you have starlette 0.16.0 which is incompatible.
(pid=runtime_env) 2022-03-18 07:54:48,351 INFO conda_utils.py:198 -- autogluon-core 0.1.0 requires numpy==1.19.5, but you have numpy 1.21.5 which is incompatible.
(pid=runtime_env) 2022-03-18 07:54:48,351 INFO conda_utils.py:198 -- aiobotocore 1.2.2 requires botocore<1.19.53,>=1.19.52, but you have botocore 1.24.16 which is incompatible.
(pid=runtime_env) 2022-03-18 07:54:48,351 INFO conda_utils.py:198 -- Successfully installed affine-2.3.0 attrs-21.4.0 certifi-2021.10.8 click-8.0.4 click-plugins-1.1.1 cligj-0.7.2 importlib-metadata-4.11.3 numpy-1.21.5 pyparsing-3.0.7 rasterio-1.2.10 setuptools-59.5.0 snuggs-1.4.7 typing-extensions-4.1.1 zipp-3.7.0
(pid=runtime_env) 2022-03-18 07:54:48,351 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/zipp.py already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,352 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/rasterio.libs already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,352 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/snuggs-1.4.7.dist-info already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,352 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/certifi-2021.10.8.dist-info already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,353 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/_distutils_hack already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,353 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/attrs already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,353 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/importlib_metadata already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,354 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/pyparsing-3.0.7.dist-info already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,354 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/certifi already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,354 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/snuggs already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,355 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/pyparsing already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,355 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/setuptools already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,355 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/zipp-3.7.0.dist-info already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,355 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/distutils-precedence.pth already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,356 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/cligj already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,356 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/pkg_resources already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,356 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/click already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,357 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/rasterio-1.2.10.dist-info already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,357 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/cligj-0.7.2.dist-info already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,357 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/typing_extensions.py already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,358 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/numpy already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,358 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/rasterio already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,358 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/setuptools-59.5.0.dist-info already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,359 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/click_plugins already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,359 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/click-8.0.4.dist-info already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,359 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/typing_extensions-4.1.1.dist-info already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,360 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/numpy.libs already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,360 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/click_plugins-1.1.1.dist-info already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,360 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/affine-2.3.0.dist-info already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,360 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/attrs-21.4.0.dist-info already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,361 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/numpy-1.21.5.dist-info already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,361 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/__pycache__ already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,361 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/attr already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,362 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/importlib_metadata-4.11.3.dist-info already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,362 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/affine already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,362 INFO conda_utils.py:198 -- WARNING: Target directory /tmp/ray/session_2022-03-18_07-01-59_616525_420/runtime_resources/pip/0e34da22171326a78f99e618ffb1059fcebc59d2/bin already exists. Specify --upgrade to force replacement.
(pid=runtime_env) 2022-03-18 07:54:48,658 INFO working_dir.py:98 -- Setup working dir for gcs://_ray_pkg_b83624756c441550.zip
(train_deeplearn_tune pid=471, ip=100.125.228.77)
(train_deeplearn_tune pid=471, ip=100.125.228.77) | Name | Type | Params
(train_deeplearn_tune pid=471, ip=100.125.228.77) ----------------------------------------
(train_deeplearn_tune pid=471, ip=100.125.228.77) 0 | train_acc | Accuracy | 0
(train_deeplearn_tune pid=471, ip=100.125.228.77) 1 | valid_acc | Accuracy | 0
(train_deeplearn_tune pid=471, ip=100.125.228.77) 2 | model | NewModel | 12.1 M
(train_deeplearn_tune pid=471, ip=100.125.228.77) ----------------------------------------
(train_deeplearn_tune pid=471, ip=100.125.228.77) 12.1 M Trainable params
(train_deeplearn_tune pid=471, ip=100.125.228.77) 0 Non-trainable params
(train_deeplearn_tune pid=471, ip=100.125.228.77) 12.1 M Total params
(train_deeplearn_tune pid=471, ip=100.125.228.77) 48.551 Total estimated model params size (MB)
== Status ==
Current time: 2022-03-18 07:54:53 (running for 00:00:21.77)
Memory usage on this node: 5.3/124.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 1.0/58 CPUs, 1.0/2 GPUs, 0.0/148.4 GiB heap, 0.0/63.49 GiB objects
Result logdir: /home/ray/ray_results/tune_deeplearn
Number of trials: 1/1 (1 RUNNING)
+--------------------------------+----------+--------------------+--------------+-----------------+-------------+--------------+
| Trial name | status | loc | batch_size | initial_depth | lr | max_levels |
|--------------------------------+----------+--------------------+--------------+-----------------+-------------+--------------|
| train_deeplearn_tune_5105c_00000 | RUNNING | 100.125.228.77:471 | 32 | 40 | 0.000353205 | 5 |
+--------------------------------+----------+--------------------+--------------+-----------------+-------------+--------------+
== Status ==
Current time: 2022-03-18 07:54:58 (running for 00:00:26.77)
Memory usage on this node: 5.4/124.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 1.0/58 CPUs, 1.0/2 GPUs, 0.0/148.4 GiB heap, 0.0/63.49 GiB objects
Result logdir: /home/ray/ray_results/tune_deeplearn
Number of trials: 1/1 (1 RUNNING)
+--------------------------------+----------+--------------------+--------------+-----------------+-------------+--------------+
| Trial name | status | loc | batch_size | initial_depth | lr | max_levels |
|--------------------------------+----------+--------------------+--------------+-----------------+-------------+--------------|
| train_deeplearn_tune_5105c_00000 | RUNNING | 100.125.228.77:471 | 32 | 40 | 0.000353205 | 5 |
+--------------------------------+----------+--------------------+--------------+-----------------+-------------+--------------+
2022-03-18 07:55:00,904 ERROR trial_runner.py:920 -- Trial train_deeplearn_tune_5105c_00000: Error processing event.
Traceback (most recent call last):
File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 886, in _process_trial
results = self.trial_executor.fetch_result(trial)
File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 675, in fetch_result
result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/worker.py", line 1763, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TuneError): ray::ImplicitFunc.train() (pid=471, ip=100.125.228.77, repr=train_deeplearn_tune)
File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/tune/trainable.py", line 319, in train
result = self.step()
File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/tune/function_runner.py", line 381, in step
self._report_thread_runner_error(block=True)
File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/tune/function_runner.py", line 532, in _report_thread_runner_error
("Trial raised an exception. Traceback:\n{}".format(err_tb_str)
ray.tune.error.TuneError: Trial raised an exception. Traceback:
ray::ImplicitFunc.train() (pid=471, ip=100.125.228.77, repr=train_deeplearn_tune)
File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/tune/function_runner.py", line 262, in run
self._entrypoint()
File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/tune/function_runner.py", line 331, in entrypoint
self._status_reporter.get_checkpoint())
File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/tune/function_runner.py", line 600, in _trainable_func
output = fn()
File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/tune/utils/trainable.py", line 371, in inner
trainable(config, **fn_kwargs)
File "deeplearn/scripts/ray/tune_deeplearn.py", line 63, in train_deeplearn_tune
trainer.fit(deeplearn_model, datamodule=deeplearn_datamodule)
File "/home/ray/anaconda3/lib/python3.7/site-packages/mlflow/utils/autologging_utils/safety.py", line 532, in safe_patch_function
patch_function(call_original, *args, **kwargs)
File "/home/ray/anaconda3/lib/python3.7/site-packages/mlflow/utils/autologging_utils/safety.py", line 242, in patch_with_managed_run
result = patch_function(original, *args, **kwargs)
File "/home/ray/anaconda3/lib/python3.7/site-packages/mlflow/pytorch/_pytorch_autolog.py", line 293, in patched_fit
result = original(self, *args, **kwargs)
File "/home/ray/anaconda3/lib/python3.7/site-packages/mlflow/utils/autologging_utils/safety.py", line 513, in call_original
return call_original_fn_with_event_logging(_original_fn, og_args, og_kwargs)
File "/home/ray/anaconda3/lib/python3.7/site-packages/mlflow/utils/autologging_utils/safety.py", line 456, in call_original_fn_with_event_logging
original_fn_result = original_fn(*og_args, **og_kwargs)
File "/home/ray/anaconda3/lib/python3.7/site-packages/mlflow/utils/autologging_utils/safety.py", line 510, in _original_fn
original_result = original(*_og_args, **_og_kwargs)
File "/home/ray/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 552, in fit
self._run(model)
File "/home/ray/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 922, in _run
self._dispatch()
File "/home/ray/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 990, in _dispatch
self.accelerator.start_training(self)
File "/home/ray/anaconda3/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 92, in start_training
self.training_type_plugin.start_training(trainer)
File "/home/ray/anaconda3/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 161, in start_training
self._results = trainer.run_stage()
File "/home/ray/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1000, in run_stage
return self._run_train()
File "/home/ray/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1049, in _run_train
self.fit_loop.run()
File "/home/ray/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 111, in run
self.advance(*args, **kwargs)
File "/home/ray/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loops/fit_loop.py", line 200, in advance
epoch_output = self.epoch_loop.run(train_dataloader)
File "/home/ray/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 111, in run
self.advance(*args, **kwargs)
File "/home/ray/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 118, in advance
_, (batch, is_last) = next(dataloader_iter)
File "/home/ray/anaconda3/lib/python3.7/site-packages/pytorch_lightning/profiler/base.py", line 104, in profile_iterable
value = next(iterator)
File "/home/ray/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/supporters.py", line 672, in prefetch_iterator
for val in it:
File "/home/ray/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/supporters.py", line 589, in __next__
return self.request_next_batch(self.loader_iters)
File "/home/ray/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/supporters.py", line 617, in request_next_batch
return apply_to_collection(loader_iters, Iterator, next_fn)
File "/home/ray/anaconda3/lib/python3.7/site-packages/pytorch_lightning/utilities/apply_func.py", line 96, in apply_to_collection
return function(data, *args, **kwargs)
File "/home/ray/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/supporters.py", line 604, in next_fn
batch = next(iterator)
File "/home/ray/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
data = self._next_data()
File "/home/ray/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1183, in _next_data
return self._process_data(data)
File "/home/ray/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
data.reraise()
File "/home/ray/anaconda3/lib/python3.7/site-packages/torch/_utils.py", line 425, in reraise
raise self.exc_type(msg)
TypeError: __init__() takes 1 positional argument but 2 were given
I think something is going wrong either at the end of loading the first batch of data or at the start of loading the second batch. I’m using a PyTorch Lightning LightningDataModule to load data. It streams data from S3 using io.BytesIO.
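One possible reading of the final TypeError, which is an assumption on my part and not something the log confirms: the last frame in the traceback is `raise self.exc_type(msg)` in `torch/_utils.py`, which rebuilds the exception caught in a DataLoader worker from its type plus a single message string. If the exception actually raised in the worker has a constructor that accepts only keyword arguments, that rebuild step itself fails and hides the real error. Here is a minimal, pure-Python sketch of that mechanism. `KeywordOnlyError` and `reraise_like_worker_wrapper` are hypothetical names made up for illustration; they are not part of PyTorch, Ray, or my training code.

```python
class KeywordOnlyError(Exception):
    """Hypothetical stand-in for an exception whose constructor
    accepts only keyword arguments (no positional message)."""

    def __init__(self, **kwargs):
        super().__init__("failed with details: {}".format(kwargs))


def reraise_like_worker_wrapper(exc):
    """Rebuild an exception from its type and one message string,
    mimicking the `raise self.exc_type(msg)` frame in the traceback."""
    msg = "Caught {} in worker process.\nOriginal message: {}".format(
        type(exc).__name__, exc
    )
    # Constructing the exception with one positional argument fails
    # if its __init__ only accepts keyword arguments.
    raise type(exc)(msg)


def demo():
    """Return the message of the TypeError that masks the original error."""
    try:
        raise KeywordOnlyError(bucket="example-bucket")  # hypothetical detail
    except KeywordOnlyError as original:
        try:
            reraise_like_worker_wrapper(original)
        except TypeError as masked:
            return str(masked)
    return None


masked_message = demo()
print(masked_message)
```

If this is what is happening in my case, the TypeError is only a symptom, and the underlying exception is whatever the worker hit while reading from S3. Running the DataLoader with `num_workers=0` would likely surface the original exception, since the batch would then be loaded in the main process instead of being re-raised from a worker.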