Ray crashed jupyter notebook

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity.
  • Low: It annoys or frustrates me for a moment.
  • Medium: It contributes significant difficulty to completing my task, but I can work around it.
  • High: It blocks me from completing my task.

Hi

It’s my first experience with Ray, and the following example code crashed my Jupyter notebook:

from ray import train, tune

def objective(config):  # ①
    score = config["a"] ** 2 + config["b"]
    return {"score": score}

search_space = {  # ②
    "a": tune.grid_search([0.001, 0.01, 0.1, 1.0]),
    "b": tune.choice([1, 2, 3]),
}

tuner = tune.Tuner(objective, param_space=search_space)  # ③

results = tuner.fit()
print(results.get_best_result(metric="score", mode="min").config)

2023-09-26 12:19:06,985 ERROR services.py:1169 -- Failed to start the dashboard, return code 1
2023-09-26 12:19:06,987 ERROR services.py:1194 -- Error should be written to 'dashboard.log' or 'dashboard.err'. We are printing the last 20 lines for you. See 'https://docs.ray.io/en/master/ray-observability/ray-logging.html#logging-directory-structure' to find where the log file is.
2023-09-26 12:19:07,005 ERROR services.py:1238 --
The last 20 lines of C:\Users\somas\AppData\Local\Temp\ray\session_2023-09-26_12-19-04_973268_35060\logs\dashboard.log (it contains the error message from the dashboard):
File "C:\Users\somas\anaconda3\envs\nixtla\lib\site-packages\ray\dashboard\head.py", line 204, in load_modules
head_cls_list = dashboard_utils.get_all_modules(DashboardHeadModule)
File "C:\Users\somas\anaconda3\envs\nixtla\lib\site-packages\ray\dashboard\utils.py", line 121, in get_all_modules
importlib.import_module(name)
File "C:\Users\somas\anaconda3\envs\nixtla\lib\importlib\__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 850, in exec_module
File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
File "C:\Users\somas\anaconda3\envs\nixtla\lib\site-packages\ray\dashboard\modules\job\cli.py", line 14, in <module>
from ray.job_submission import JobStatus, JobSubmissionClient
File "C:\Users\somas\anaconda3\envs\nixtla\lib\site-packages\ray\job_submission\__init__.py", line 2, in <module>
from ray.dashboard.modules.job.pydantic_models import DriverInfo, JobDetails, JobType
File "C:\Users\somas\anaconda3\envs\nixtla\lib\site-packages\ray\dashboard\modules\job\pydantic_models.py", line 4, in <module>
from pydantic import BaseModel, Field
ImportError: cannot import name 'Field' from 'pydantic' (C:\Users\somas\anaconda3\envs\nixtla\lib\site-packages\pydantic\__init__.py)
2023-09-26 12:19:07,299 INFO worker.py:1553 -- Started a local Ray instance.

The notebook shows the kernel as busy, but it won’t respond to any input, and the only way forward is to kill the session. I checked the docs.ray.io page suggested by the error message, and it returned a 404.

Environment:
Conda on Win11
Python 3.9.18
Ray 2.3 (core, default, tune)
jupyter notebook: 6.5.4
VPN disabled

If I run the same piece of code as a .py file, I get the same error, plus the following line.
[2023-09-26 13:03:39,252 E 39748 30984] core_worker.cc:191: Failed to register worker 01000000ffffffffffffffffffffffffffffffffffffffffffffffff to Raylet. IOError: [RayletClient] Unable to register worker with raylet. Unknown error.

I haven’t seen any case/solution similar to this one after searching through the archive. I am at a loss on where to look next. Any suggestion is much appreciated!

Can you install the latest Ray version, 2.7, and try it again?

pip install -U "ray[train,tune,default]"
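
To confirm the upgrade actually landed in the environment the notebook uses, a quick check (a minimal sketch; ray.__version__ is the standard version attribute):

import ray
print(ray.__version__)  # should report 2.7.x after the upgrade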

btw, I fixed the doc link issue

Hi Huaiwei

Thank you for your quick reply. I reinstalled Ray as suggested, and judging by the fact that the simple code just hung and did nothing other than show the kernel as busy, I believe version 2.7 doesn’t quite solve the issue…

Best,
Stefan

Can you share the full logs?
Your terminal output + /tmp/ray/session_*/logs (the path on Windows might be different)
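
On Windows the logs usually end up under the user temp directory instead. A minimal sketch to locate the most recent session’s log folder, assuming the default <TEMP>\ray location:

import glob, os, tempfile

# Ray writes logs to <temp dir>\ray\session_<timestamp>\logs by default
log_root = os.path.join(tempfile.gettempdir(), "ray")
sessions = sorted(glob.glob(os.path.join(log_root, "session_2*")), key=os.path.getmtime)
if sessions:
    print(os.path.join(sessions[-1], "logs"))  # log folder of the latest session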

Hi Huaiwei

I found the directory at

~/AppData/Local/Temp/ray/session_2023-09-21_14-00-51_750035_2896/logs

There are quite a few log files; which would be of most interest to you, debug_state.txt or debug_state_gcs.txt? The agent_15724.{err,out}, log_monitor.err, raylet.err, monitor.{err,out}, and gcs_server.err files are all 0 bytes, by the way.
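
For reference, a quick way to list only the non-empty files in that session’s logs folder (a minimal sketch using the path above):

import os

logs = os.path.expanduser(
    r"~\AppData\Local\Temp\ray\session_2023-09-21_14-00-51_750035_2896\logs"
)
for name in sorted(os.listdir(logs)):
    size = os.path.getsize(os.path.join(logs, name))
    if size > 0:
        print(f"{name}: {size} bytes")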

Let me know what to present you.

Thx!
S

Can you just zip the entire logs folder and share?

Here you go.

Please let me know once you’ve downloaded it, so I can remove it.

Thanks Huaiwei!

I already downloaded it.

  1. So the current issue is that the tuning script hangs?
  2. The version is not 2.7?

2023-09-21 14:00:53,848 INFO monitor.py:699 -- Starting monitor using ray installation: C:\Users\somas\anaconda3\envs\ts\lib\site-packages\ray\__init__.py
2023-09-21 14:00:53,848 INFO monitor.py:700 -- Ray version: 2.6.3

Hmm, that’s weird. I did an "ls -lt" and just grabbed and sent you the directory with the latest timestamp without reading the full directory name. As it turned out, I have other logs from "09-26". But when I grep -i "ray version" in the 09-26 directory, I got "Ray version: 2.3.0"… So a newer date but an older Ray version. And I thought I had specifically pip installed 2.7 as you instructed.

Ok, it looks like I should just delete this conda virtual environment and start anew with Ray 2.7… I’ll report my new findings.

Thx,
S

Hi Huaiwei

I was able to run the simple example code to completion with both 2.6.3 and 2.7, although there is a minor syncing-related warning message in the 2.6.3 output (see below). I’m using Ray with NeuralForecast, and I am encountering additional problems related to Ray there. But since I now know Ray functions in isolation, the downstream problem is most likely an issue with NeuralForecast.

Thank you for your support. I think we can close the case.

Best,
S

(neuralforecast) PS C:\Users\somas\MLDS\Kaggle\Rossmann\notebooks> python .\ray_test.py
2023-09-29 12:24:09,788 INFO worker.py:1612 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
2023-09-29 12:24:13,066 INFO tune.py:226 -- Initializing Ray automatically. For cluster usage or custom Ray initialization, call ray.init(...) before Tuner(...).
2023-09-29 12:24:13,068 INFO tune.py:666 -- [output] This will use the new output engine with verbosity 1. To disable the new output and use the legacy output engine, set the environment variable RAY_AIR_NEW_OUTPUT=0. For more information, please see https://github.com/ray-project/ray/issues/36949
╭──────────────────────────────────────────────────────────────────╮
│ Configuration for experiment objective_2023-09-29_12-24-05 │
├──────────────────────────────────────────────────────────────────┤
│ Search algorithm BasicVariantGenerator │
│ Scheduler FIFOScheduler │
│ Number of trials 4 │
╰──────────────────────────────────────────────────────────────────╯

View detailed results here: c://\Users\somas\ray_results\objective_2023-09-29_12-24-05
To visualize your results with TensorBoard, run: tensorboard --logdir C:\Users\somas\ray_results\objective_2023-09-29_12-24-05

Trial status: 4 PENDING
Current time: 2023-09-29 12:24:15. Total running time: 0s
Logical resource usage: 4.0/16 CPUs, 0/1 GPUs
╭────────────────────────────────────────────────╮
│ Trial name status b a │
├────────────────────────────────────────────────┤
│ objective_c5a4d_00000 PENDING 1 0.001 │
│ objective_c5a4d_00001 PENDING 3 0.01 │
│ objective_c5a4d_00002 PENDING 2 0.1 │
│ objective_c5a4d_00003 PENDING 2 1 │
╰────────────────────────────────────────────────╯

Trial objective_c5a4d_00000 started with configuration:
╭──────────────────────────────────────────────╮
│ Trial objective_c5a4d_00000 config │
├──────────────────────────────────────────────┤
│ a 0.001 │
│ b 1 │
╰──────────────────────────────────────────────╯

Trial objective_c5a4d_00000 completed after 1 iterations at 2023-09-29 12:24:18. Total running time: 3s
╭───────────────────────────────────────────╮
│ Trial objective_c5a4d_00000 result │
├───────────────────────────────────────────┤
│ time_this_iter_s 0 │
│ time_total_s 0 │
│ training_iteration 1 │
│ score 1 │
╰───────────────────────────────────────────╯

Trial objective_c5a4d_00002 started with configuration:
╭────────────────────────────────────────────╮
│ Trial objective_c5a4d_00002 config │
├────────────────────────────────────────────┤
│ a 0.1 │
│ b 2 │
╰────────────────────────────────────────────╯

Trial objective_c5a4d_00001 started with configuration:
╭─────────────────────────────────────────────╮
│ Trial objective_c5a4d_00001 config │
├─────────────────────────────────────────────┤
│ a 0.01 │
│ b 3 │
╰─────────────────────────────────────────────╯

Trial objective_c5a4d_00002 completed after 1 iterations at 2023-09-29 12:24:18. Total running time: 3s
╭─────────────────────────────────────────────╮
│ Trial objective_c5a4d_00002 result │
├─────────────────────────────────────────────┤
│ time_this_iter_s 0 │
│ time_total_s 0 │
│ training_iteration 1 │
│ score 2.01 │
╰─────────────────────────────────────────────╯

Trial objective_c5a4d_00001 completed after 1 iterations at 2023-09-29 12:24:18. Total running time: 3s
╭───────────────────────────────────────────────╮
│ Trial objective_c5a4d_00001 result │
├───────────────────────────────────────────────┤
│ time_this_iter_s 0 │
│ time_total_s 0 │
│ training_iteration 1 │
│ score 3.0001 │
╰───────────────────────────────────────────────╯

Trial objective_c5a4d_00003 started with configuration:
╭───────────────────────────────────────────╮
│ Trial objective_c5a4d_00003 config │
├───────────────────────────────────────────┤
│ a 1 │
│ b 2 │
╰───────────────────────────────────────────╯

Trial objective_c5a4d_00003 completed after 1 iterations at 2023-09-29 12:24:18. Total running time: 3s
╭───────────────────────────────────────────╮
│ Trial objective_c5a4d_00003 result │
├───────────────────────────────────────────┤
│ time_this_iter_s 0 │
│ time_total_s 0 │
│ training_iteration 1 │
│ score 3 │
╰───────────────────────────────────────────╯

2023-09-29 12:24:18,830 WARNING tune.py:1122 -- Trial Runner checkpointing failed: Sync process failed: GetFileInfo() yielded path 'C:/Users/somas/ray_results/objective_2023-09-29_12-24-05', which is outside base dir 'C:\Users\somas\ray_results\objective_2023-09-29_12-24-05'
Trial status: 4 TERMINATED
Current time: 2023-09-29 12:24:18. Total running time: 3s
Logical resource usage: 3.0/16 CPUs, 0/1 GPUs
╭────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status b a iter total time (s) score │
├────────────────────────────────────────────────────────────────────────────────────────┤
│ objective_c5a4d_00000 TERMINATED 1 0.001 1 0 1 │
│ objective_c5a4d_00001 TERMINATED 3 0.01 1 0 3.0001 │
│ objective_c5a4d_00002 TERMINATED 2 0.1 1 0 2.01 │
│ objective_c5a4d_00003 TERMINATED 2 1 1 0 3 │
╰────────────────────────────────────────────────────────────────────────────────────────╯

{'a': 0.001, 'b': 1}
(neuralforecast) PS C:\Users\somas\MLDS\Kaggle\Rossmann\notebooks>