Getting Started with Ray does not work on any computer I try it on

I am once again trying to get Ray working on a cluster I use. I want to test basic functionality, but I can’t even get this “getting started” tutorial to run.

I’ve dumped a copy of the script I am running at the bottom of this post. The cluster uses Slurm to provision resources, and I want to walk before I run, so I’m doing this on my MacBook.

First, I make a fresh conda environment:

conda create --name=raytune python=3.11
/path/to/conda/env/bin/pip install -U "ray[air]"
/path/to/conda/env/bin/pip install torch torchvision torchaudio

All of this goes off without a hitch. Python is version 3.11.5, Ray is version 2.6.3, and torch is version 2.0.1.
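(Those version numbers can be confirmed inside the environment with a one-liner like the following; this is just how I would check, not a step from the tutorial:)

/path/to/conda/env/bin/python -c "import ray, torch; print(ray.__version__, torch.__version__)"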

The first time I ran it (python getting-started.py), I saw this:

CUDA is available in pytorch: False

<messages about downloading the MNIST data>

2023-09-12 14:59:11,572	INFO worker.py:1612 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 
[2023-09-12 14:59:42,795 E 21473 4607747] core_worker.cc:201: Failed to register worker 01000000ffffffffffffffffffffffffffffffffffffffffffffffff to Raylet. IOError: [RayletClient] Unable to register worker with raylet. No such file or directory

The next time I ran it, I saw the following:

CUDA is available in pytorch: False
2023-09-12 15:16:12,792	ERROR node.py:605 -- Failed to connect to GCS. Please check `gcs_server.out` for more details.
2023-09-12 15:16:19,767	ERROR node.py:605 -- Failed to connect to GCS. Please check `gcs_server.out` for more details.
^CTraceback (most recent call last):
  File "python/ray/_raylet.pyx", line 2120, in ray._raylet._auto_reconnect.wrapper
  File "python/ray/_raylet.pyx", line 2185, in ray._raylet.GcsClient.internal_kv_get
  File "python/ray/_raylet.pyx", line 410, in ray._raylet.check_status
ray.exceptions.RpcError: failed to connect to all addresses

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/foshea/Documents/Projects/raytune/getting-started.py", line 130, in <module>
    results = tuner.fit()
              ^^^^^^^^^^^
  File "/opt/anaconda3/envs/raytune/lib/python3.11/site-packages/ray/tune/tuner.py", line 347, in fit
    return self._local_tuner.fit()
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/raytune/lib/python3.11/site-packages/ray/tune/impl/tuner_internal.py", line 588, in fit
    analysis = self._fit_internal(trainable, param_space)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/raytune/lib/python3.11/site-packages/ray/tune/impl/tuner_internal.py", line 703, in _fit_internal
    analysis = run(
               ^^^^
  File "/opt/anaconda3/envs/raytune/lib/python3.11/site-packages/ray/tune/tune.py", line 573, in run
    _ray_auto_init(entrypoint=error_message_map["entrypoint"])
  File "/opt/anaconda3/envs/raytune/lib/python3.11/site-packages/ray/tune/tune.py", line 225, in _ray_auto_init
    ray.init()
  File "/opt/anaconda3/envs/raytune/lib/python3.11/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/raytune/lib/python3.11/site-packages/ray/_private/worker.py", line 1514, in init
    _global_node = ray._private.node.Node(
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/raytune/lib/python3.11/site-packages/ray/_private/node.py", line 287, in __init__
    self.start_head_processes()
  File "/opt/anaconda3/envs/raytune/lib/python3.11/site-packages/ray/_private/node.py", line 1160, in start_head_processes
    self.start_gcs_server()
  File "/opt/anaconda3/envs/raytune/lib/python3.11/site-packages/ray/_private/node.py", line 992, in start_gcs_server
    self._init_gcs_client()
  File "/opt/anaconda3/envs/raytune/lib/python3.11/site-packages/ray/_private/node.py", line 605, in _init_gcs_client
    client.internal_kv_get(b"dummy", None)
  File "python/ray/_raylet.pyx", line 2140, in ray._raylet._auto_reconnect.wrapper
KeyboardInterrupt
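The error messages point at gcs_server.out. If it helps anyone reproduce this: I believe the logs for the most recent local Ray session end up under /tmp/ray/session_latest/logs (session_latest being a symlink Ray maintains), so the relevant files should be reachable with something like:

ls /tmp/ray/session_latest/logs/
tail -n 50 /tmp/ray/session_latest/logs/gcs_server.out
tail -n 50 /tmp/ray/session_latest/logs/gcs_server.err
tail -n 50 /tmp/ray/session_latest/logs/raylet.out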

What happens if I run this on the cluster (with a GPU)? This was the third run; I got slightly different errors each time:

CUDA is available in pytorch: True
2023-09-12 14:34:40,342	INFO worker.py:1612 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 
2023-09-12 14:34:49,564	INFO tune.py:226 -- Initializing Ray automatically. For cluster usage or custom Ray initialization, call `ray.init(...)` before `Tuner(...)`.
2023-09-12 14:34:49,579	INFO tune.py:666 -- [output] This will use the new output engine with verbosity 1. To disable the new output and use the legacy output engine, set the environment variable RAY_AIR_NEW_OUTPUT=0. For more information, please see https://github.com/ray-project/ray/issues/36949
(pid=44135) [2023-09-12 14:34:50,301 E 44135 44390] logging.cc:97: Unhandled exception: N5boost10wrapexceptINS_6system12system_errorEEE. what(): thread: Resource temporarily unavailable [system:11]
[2023-09-12 14:34:50,447 E 42180 44126] logging.cc:97: Unhandled exception: N5boost10wrapexceptINS_6system12system_errorEEE. what(): thread: Resource temporarily unavailable [system:11]
╭────────────────────────────────────────────────────────────────────╮
│ Configuration for experiment     train_mnist_2023-09-12_14-34-28   │
├────────────────────────────────────────────────────────────────────┤
│ Search algorithm                 BasicVariantGenerator             │
│ Scheduler                        FIFOScheduler                     │
│ Number of trials                 1                                 │
╰────────────────────────────────────────────────────────────────────╯

It seems like the tutorial is missing some critical step for getting Ray to work. Any suggestions?
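One thing I plan to try myself (my own guess, not a step from the tutorial): the "Resource temporarily unavailable" messages look like thread-creation failures, so it may be worth capping how much of the node Ray grabs by initializing it explicitly before building the Tuner. Reusing the names from the script below, that would look roughly like:

import ray
from ray import tune
from ray.air import RunConfig

# Hypothetical workaround: limit Ray to a few CPUs and skip the dashboard so it
# spawns far fewer worker processes/threads on the shared node.
ray.init(num_cpus=4, include_dashboard=False)

tuner = tune.Tuner(
    train_mnist,
    param_space=search_space,
    run_config=RunConfig(storage_path=STORAGE_DIR),
)
results = tuner.fit()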


The script:

# https://docs.ray.io/en/latest/tune/getting-started.html

import numpy as np
import torch
import torch.optim as optim
import torch.nn as nn
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import torch.nn.functional as F

from ray import air, tune
from ray.air import session, RunConfig
from ray.tune.search import ConcurrencyLimiter
from ray.tune.schedulers import ASHAScheduler


DATA_DIR = '/Users/foshea/Documents/Projects/raytune/data'
STORAGE_DIR = '/Users/foshea/Documents/Projects/raytune/ray_results'

class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        # In this example, we don't change the model architecture
        # due to simplicity.
        self.conv1 = nn.Conv2d(1, 3, kernel_size=3)
        self.fc = nn.Linear(192, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 3))
        x = x.view(-1, 192)
        x = self.fc(x)
        return F.log_softmax(x, dim=1)



# Change these values if you want the training to run quicker or slower.
EPOCH_SIZE = 512
TEST_SIZE = 256

def train(model, optimizer, train_loader):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        # We set this just for the example to run quickly.
        if batch_idx * len(data) > EPOCH_SIZE:
            return
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()


def test(model, data_loader):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for batch_idx, (data, target) in enumerate(data_loader):
            # We set this just for the example to run quickly.
            if batch_idx * len(data) > TEST_SIZE:
                break
            data, target = data.to(device), target.to(device)
            outputs = model(data)
            _, predicted = torch.max(outputs.data, 1)
            total += target.size(0)
            correct += (predicted == target).sum().item()

    return correct / total


def train_mnist(config):
    # Data Setup
    mnist_transforms = transforms.Compose(
        [transforms.ToTensor(),
         transforms.Normalize((0.1307, ), (0.3081, ))])

    train_loader = DataLoader(
        datasets.MNIST(DATA_DIR, train=True, download=True, transform=mnist_transforms),
        batch_size=64,
        shuffle=True)
    test_loader = DataLoader(
        datasets.MNIST(DATA_DIR, train=False, transform=mnist_transforms),
        batch_size=64,
        shuffle=True)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model = ConvNet()
    model.to(device)

    optimizer = optim.SGD(
        model.parameters(), lr=config["lr"], momentum=config["momentum"])
    for i in range(10):
        train(model, optimizer, train_loader)
        acc = test(model, test_loader)

        # Send the current training result back to Tune
        session.report({"mean_accuracy": acc})

        if i % 5 == 0:
            # This saves the model to the trial directory
            torch.save(model.state_dict(), "./model.pth")


if __name__ == "__main__":

    search_space = {
        "lr": tune.sample_from(lambda spec: 10 ** (-10 * np.random.rand())),
        "momentum": tune.uniform(0.1, 0.9),
    }

    print('CUDA is available in pytorch:', torch.cuda.is_available())

    # Uncomment this to enable distributed execution
    # `ray.init(address="auto")`

    # Download the dataset first
    datasets.MNIST(DATA_DIR, train=True, download=True)

    tuner = tune.Tuner(
        train_mnist,
        param_space=search_space,
        run_config=RunConfig(storage_path=STORAGE_DIR),
        # tune_config=tune.TuneConfig(max_concurrent_trials=1)
    )

    results = tuner.fit()

    dfs = {result.log_dir: result.metrics_dataframe for result in results}
    [d.mean_accuracy.plot() for d in dfs.values()]

I have also tried doing this with Python 3.10, because support for Python 3.11 is apparently still experimental. No change.

In addition, I have tried a different quick start guide (linked at the top of the full script below).

Full script:

# https://docs.ray.io/en/latest/tune/tutorials/tune-run.html

from ray import tune
import ray
import os

NUM_MODELS = 100

def train_model(config):
    score = config["model_id"]

    # Import model libraries, etc...
    # Load data and train model code here...

    # Return final stats. You can also return intermediate progress
    # using ray.air.session.report() if needed.
    # To return your model, you could write it to storage and return its
    # URI in this dict, or return it as a Tune Checkpoint:
    # https://docs.ray.io/en/latest/tune/tutorials/tune-checkpoints.html
    return {"score": score}

# Define trial parameters as a single grid sweep.
trial_space = {
    # This is an example parameter. You could replace it with filesystem paths,
    # model types, or even full nested Python dicts of model configurations, etc.,
    # that enumerate the set of trials to run.
    "model_id": tune.grid_search([
        "model_{}".format(i)
        for i in range(NUM_MODELS)
    ])
}

# Can customize resources per trial, here we set 1 CPU each.
train_model = tune.with_resources(train_model, {"cpu": 1})

# Start a Tune run and print the best result.
tuner = tune.Tuner(train_model, param_space=trial_space)
results = tuner.fit()

# Access individual results.
print(results[0])
print(results[1])
print(results[2])

This script returns a similar error:

2023-09-12 16:42:10,868	ERROR node.py:605 -- Failed to connect to GCS. Please check `gcs_server.out` for more details.
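One variation I have been wondering about (my own idea, not from the guide) is to separate Ray startup from the script: bring up a local head node from the command line first,

ray start --head --num-cpus=4

and then have the script attach to it instead of auto-initializing, i.e. call this before creating the Tuner:

import ray
ray.init(address="auto")

At least that way, a GCS startup failure should show up at `ray start` time rather than inside `tuner.fit()`.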

A reply in the thread asked:

Can you see any more errors here?

I think one thing to try is a simple Ray Core script, which might help isolate the problem. Could you try the examples in the Ray Core Quickstart section and see if the error is reproducible?

I’ve already tried the smallest example I can find:

# https://docs.ray.io/en/latest/ray-overview/getting-started.html
# Tune: Hyperparameter Tuning at Scale

from ray import tune


def objective(config):  # ①
    score = config["a"] ** 2 + config["b"]
    return {"score": score}


search_space = {  # ②
    "a": tune.grid_search([0.001, 0.01, 0.1, 1.0]),
    "b": tune.choice([1, 2, 3]),
}

tuner = tune.Tuner(objective, param_space=search_space)  # ③

results = tuner.fit()
print(results.get_best_result(metric="score", mode="min").config)

Same results as above.
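If it is useful, the next thing I can try is a genuinely Ray Core example rather than a Tune one; a minimal sketch along the lines of the Ray Core quickstart (remote tasks via @ray.remote) would be:

import ray

ray.init()  # the same local startup path that is failing above

@ray.remote
def square(x):
    return x * x

# Run four remote tasks and fetch their results.
futures = [square.remote(i) for i in range(4)]
print(ray.get(futures))  # expect [0, 1, 4, 9]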

Edit: I forgot to answer this part:

Can you see any more errors here?

No, the error you quote repeats every 20-30 seconds until I ctrl-C to stop the thing from running. I usually only wait a minute.

I tried a cache-free install on the cluster I use:

conda create --name=raytune2 python=3.10
/path/to/conda/env/bin/pip install -U "ray[air]"==2.5.1 --no-cache-dir

I then ran the example in my last post and saw this:

2023-09-13 12:56:50,516	ERROR services.py:1207 -- Failed to start the dashboard , return code 1
2023-09-13 12:56:50,516	ERROR services.py:1232 -- Error should be written to 'dashboard.log' or 'dashboard.err'. We are printing the last 20 lines for you. See 'https://docs.ray.io/en/master/ray-observability/ray-logging.html#logging-directory-structure' to find where the log file is.
2023-09-13 12:56:50,516	ERROR services.py:1276 -- 
The last 20 lines of /tmp/ray/session_2023-09-13_12-56-47_130420_81132/logs/dashboard.log (it contains the error message from the dashboard): 
  File "/sdf/group/ml/bes_anomalies/conda/envs/raytune2/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/sdf/group/ml/bes_anomalies/conda/envs/raytune2/lib/python3.10/site-packages/ray/dashboard/modules/log/log_manager.py", line 8, in <module>
    from ray.util.state.common import (
  File "/sdf/group/ml/bes_anomalies/conda/envs/raytune2/lib/python3.10/site-packages/ray/util/state/__init__.py", line 1, in <module>
    from ray.util.state.api import (
  File "/sdf/group/ml/bes_anomalies/conda/envs/raytune2/lib/python3.10/site-packages/ray/util/state/api.py", line 17, in <module>
    from ray.util.state.common import (
  File "/sdf/group/ml/bes_anomalies/conda/envs/raytune2/lib/python3.10/site-packages/ray/util/state/common.py", line 120, in <module>
    @dataclass(init=True)
  File "/sdf/group/ml/bes_anomalies/conda/envs/raytune2/lib/python3.10/site-packages/pydantic/dataclasses.py", line 141, in dataclass
    assert init is False, 'pydantic.dataclasses.dataclass only supports init=False'
AssertionError: pydantic.dataclasses.dataclass only supports init=False
2023-09-13 12:56:50,741	INFO worker.py:1636 -- Started a local Ray instance.
[2023-09-13 12:56:58,709 E 81132 81132] core_worker.cc:193: Failed to register worker 01000000ffffffffffffffffffffffffffffffffffffffffffffffff to Raylet. IOError: [RayletClient] Unable to register worker with raylet. No such file or directory
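That dashboard traceback dies inside pydantic.dataclasses, so I suspect the dashboard part of this failure is a pydantic version clash in the environment rather than Ray itself. Checking the installed version is easy enough (my own debugging step, not from any guide):

/path/to/conda/env/bin/python -c "import pydantic; print(pydantic.VERSION)"
/path/to/conda/env/bin/pip show pydantic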

When I install the latest version (2.6.3) the same way as above and run the small example, I get the following:

2023-09-13 13:04:18,198	INFO worker.py:1612 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 
2023-09-13 13:04:32,319	INFO tune.py:226 -- Initializing Ray automatically. For cluster usage or custom Ray initialization, call `ray.init(...)` before `Tuner(...)`.
2023-09-13 13:04:32,322	INFO tune.py:666 -- [output] This will use the new output engine with verbosity 1. To disable the new output and use the legacy output engine, set the environment variable RAY_AIR_NEW_OUTPUT=0. For more information, please see https://github.com/ray-project/ray/issues/36949
╭──────────────────────────────────────────────────────────────────╮
│ Configuration for experiment     objective_2023-09-13_13-04-13   │
├──────────────────────────────────────────────────────────────────┤
│ Search algorithm                 BasicVariantGenerator           │
│ Scheduler                        FIFOScheduler                   │
│ Number of trials                 4                               │
╰──────────────────────────────────────────────────────────────────╯

View detailed results here: /sdf/home/f/foshea/ray_results/objective_2023-09-13_13-04-13
To visualize your results with TensorBoard, run: `tensorboard --logdir /sdf/home/f/foshea/ray_results/objective_2023-09-13_13-04-13`

Trial status: 4 PENDING
Current time: 2023-09-13 13:04:32. Total running time: 0s
Logical resource usage: 0/128 CPUs, 0/4 GPUs (0.0/1.0 accelerator_type:A100)
╭────────────────────────────────────────────────╮
│ Trial name              status       b       a │
├────────────────────────────────────────────────┤
│ objective_c1063_00000   PENDING      3   0.001 │
│ objective_c1063_00001   PENDING      2   0.01  │
│ objective_c1063_00002   PENDING      1   0.1   │
│ objective_c1063_00003   PENDING      3   1     │
╰────────────────────────────────────────────────╯

[2023-09-13 13:04:33,036 E 83474 83879] logging.cc:97: Unhandled exception: N5boost10wrapexceptINS_6system12system_errorEEE. what(): thread: Resource temporarily unavailable [system:11]
(bundle_reservation_check_func pid=84000) <jemalloc>: arena 0 background thread creation failed (11)
(bundle_reservation_check_func pid=84000) [2023-09-13 13:04:33,040 E 84000 86404] logging.cc:97: Unhandled exception: N5boost10wrapexceptINS_6system12system_errorEEE. what(): thread: Resource temporarily unavailable [system:11]
(pid=83887) [2023-09-13 13:04:33,219 E 83887 84178] logging.cc:104: Stack trace: 
(pid=83887)  /sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(+0xe4bc3a) [0x7f05f1fb8c3a] ray::operator<<()
(pid=83887) /sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(+0xe4e3f8) [0x7f05f1fbb3f8] ray::TerminateHandler()
(pid=83887) /sdf/group/ml/bes_anomalies/conda/envs/raytune3/bin/../lib/libstdc++.so.6(+0xb135a) [0x7f05f0de735a] __cxxabiv1::__terminate()
(pid=83887) /sdf/group/ml/bes_anomalies/conda/envs/raytune3/bin/../lib/libstdc++.so.6(+0xb13c5) [0x7f05f0de73c5]
(pid=83887) /sdf/group/ml/bes_anomalies/conda/envs/raytune3/bin/../lib/libstdc++.so.6(+0xb1658) [0x7f05f0de7658]
(pid=83887) /sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(+0x4eec12) [0x7f05f165bc12] boost::throw_exception<>()
(pid=83887) /sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(+0xf3ac4b) [0x7f05f20a7c4b] boost::asio::detail::do_throw_error()
(pid=83887) /sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(+0xf3b66b) [0x7f05f20a866b] boost::asio::detail::posix_thread::start_thread()
(pid=83887) /sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(+0xf3bacc) [0x7f05f20a8acc] boost::asio::thread_pool::thread_pool()
(pid=83887) /sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(+0x946844) [0x7f05f1ab3844] ray::rpc::(anonymous namespace)::_GetServerCallExecutor()
(pid=83887) /sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray3rpc21GetServerCallExecutorEv+0x9) [0x7f05f1ab38d9] ray::rpc::GetServerCallExecutor()
(pid=83887) /sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(_ZNSt17_Function_handlerIFvN3ray6StatusESt8functionIFvvEES4_EZNS0_3rpc14ServerCallImplINS6_24CoreWorkerServiceHandlerENS6_25GetCoreWorkerStatsRequestENS6_23GetCoreWorkerStatsReplyEE17HandleRequestImplEvEUlS1_S4_S4_E_E9_M_invokeERKSt9_Any_dataOS1_OS4_SI_+0x128) [0x7f05f1816bf8] std::_Function_handler<>::_M_invoke()
(pid=83887) /sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core10CoreWorker24HandleGetCoreWorkerStatsENS_3rpc25GetCoreWorkerStatsRequestEPNS2_23GetCoreWorkerStatsReplyESt8functionIFvNS_6StatusES6_IFvvEES9_EE+0x8f1) [0x7f05f1851051] ray::core::CoreWorker::HandleGetCoreWorkerStats()
(pid=83887) /sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray3rpc14ServerCallImplINS0_24CoreWorkerServiceHandlerENS0_25GetCoreWorkerStatsRequestENS0_23GetCoreWorkerStatsReplyEE17HandleRequestImplEv+0x112) [0x7f05f1847dd2] ray::rpc::ServerCallImpl<>::HandleRequestImpl()
(pid=83887) /sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(+0x9e9706) [0x7f05f1b56706] EventTracker::RecordExecution()
(pid=83887) /sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(+0x98661e) [0x7f05f1af361e] std::_Function_handler<>::_M_invoke()
(pid=83887) /sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(+0x986b76) [0x7f05f1af3b76] boost::asio::detail::completion_handler<>::do_complete()
(pid=83887) /sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(+0xf383db) [0x7f05f20a53db] boost::asio::detail::scheduler::do_run_one()
(pid=83887) /sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(+0xf39ea9) [0x7f05f20a6ea9] boost::asio::detail::scheduler::run()
(pid=83887) /sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(+0xf3a362) [0x7f05f20a7362] boost::asio::io_context::run()
(pid=83887) /sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core10CoreWorker12RunIOServiceEv+0xcd) [0x7f05f185e0ed] ray::core::CoreWorker::RunIOService()
(pid=83887) /sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(+0xf6af80) [0x7f05f20d7f80] execute_native_thread_routine
(pid=83887) /lib64/libpthread.so.0(+0x7ea5) [0x7f05f9979ea5] start_thread
(pid=83887) /lib64/libc.so.6(clone+0x6d) [0x7f05f8f99b0d] clone
(pid=83887) 
(pid=83887) *** SIGABRT received at time=1694635473 on cpu 66 ***
(pid=83887) PC: @     0x7f05f8ed1387  (unknown)  raise
(pid=83887)     @     0x7f05f9981630       1920  (unknown)
(pid=83887)     @     0x7f05f0de735a  (unknown)  __cxxabiv1::__terminate()
(pid=83887)     @     0x7f05f0de7580  (unknown)  (unknown)
(pid=83887) [2023-09-13 13:04:33,220 E 83887 84178] logging.cc:361: *** SIGABRT received at time=1694635473 on cpu 66 ***
(pid=83887) [2023-09-13 13:04:33,220 E 83887 84178] logging.cc:361: PC: @     0x7f05f8ed1387  (unknown)  raise
(pid=83887) [2023-09-13 13:04:33,220 E 83887 84178] logging.cc:361:     @     0x7f05f9981630       1920  (unknown)
(pid=83887) [2023-09-13 13:04:33,220 E 83887 84178] logging.cc:361:     @     0x7f05f0de735a  (unknown)  __cxxabiv1::__terminate()
(pid=83887) [2023-09-13 13:04:33,220 E 83887 84178] logging.cc:361:     @     0x7f05f0de7580  (unknown)  (unknown)
(pid=83887) Fatal Python error: Aborted
(pid=83887) 
(pid=83887) 
(pid=83887) Extension modules: msgpack._cmsgpack, google._upb._message, psutil._psutil_linux, psutil._psutil_posix, setproctitle, yaml._yaml, ray._raylet, charset_normalizer.md (total: 8)
[2023-09-13 13:04:33,306 E 83474 83879] logging.cc:104: Stack trace: 
 /sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(+0xe4bc3a) [0x7f684ed65c3a] ray::operator<<()
/sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(+0xe4e3f8) [0x7f684ed683f8] ray::TerminateHandler()
/sdf/group/ml/bes_anomalies/conda/envs/raytune3/bin/../lib/libstdc++.so.6(+0xb135a) [0x7f684dbaf35a] __cxxabiv1::__terminate()
/sdf/group/ml/bes_anomalies/conda/envs/raytune3/bin/../lib/libstdc++.so.6(+0xb13c5) [0x7f684dbaf3c5]
/sdf/group/ml/bes_anomalies/conda/envs/raytune3/bin/../lib/libstdc++.so.6(+0xb1658) [0x7f684dbaf658]
/sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(+0x4eec12) [0x7f684e408c12] boost::throw_exception<>()
/sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(+0xf3ac4b) [0x7f684ee54c4b] boost::asio::detail::do_throw_error()
/sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(+0xf3b66b) [0x7f684ee5566b] boost::asio::detail::posix_thread::start_thread()
/sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(+0xf3bacc) [0x7f684ee55acc] boost::asio::thread_pool::thread_pool()
/sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(+0x946844) [0x7f684e860844] ray::rpc::(anonymous namespace)::_GetServerCallExecutor()
/sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray3rpc21GetServerCallExecutorEv+0x9) [0x7f684e8608d9] ray::rpc::GetServerCallExecutor()
/sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(_ZNSt17_Function_handlerIFvN3ray6StatusESt8functionIFvvEES4_EZNS0_3rpc14ServerCallImplINS6_24CoreWorkerServiceHandlerENS6_25GetCoreWorkerStatsRequestENS6_23GetCoreWorkerStatsReplyEE17HandleRequestImplEvEUlS1_S4_S4_E_E9_M_invokeERKSt9_Any_dataOS1_OS4_SI_+0x128) [0x7f684e5c3bf8] std::_Function_handler<>::_M_invoke()
/sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core10CoreWorker24HandleGetCoreWorkerStatsENS_3rpc25GetCoreWorkerStatsRequestEPNS2_23GetCoreWorkerStatsReplyESt8functionIFvNS_6StatusES6_IFvvEES9_EE+0x8f1) [0x7f684e5fe051] ray::core::CoreWorker::HandleGetCoreWorkerStats()
/sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray3rpc14ServerCallImplINS0_24CoreWorkerServiceHandlerENS0_25GetCoreWorkerStatsRequestENS0_23GetCoreWorkerStatsReplyEE17HandleRequestImplEv+0x112) [0x7f684e5f4dd2] ray::rpc::ServerCallImpl<>::HandleRequestImpl()
/sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(+0x9e9706) [0x7f684e903706] EventTracker::RecordExecution()
/sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(+0x98661e) [0x7f684e8a061e] std::_Function_handler<>::_M_invoke()
/sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(+0x986b76) [0x7f684e8a0b76] boost::asio::detail::completion_handler<>::do_complete()
/sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(+0xf383db) [0x7f684ee523db] boost::asio::detail::scheduler::do_run_one()
/sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(+0xf39ea9) [0x7f684ee53ea9] boost::asio::detail::scheduler::run()
/sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(+0xf3a362) [0x7f684ee54362] boost::asio::io_context::run()
/sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core10CoreWorker12RunIOServiceEv+0xcd) [0x7f684e60b0ed] ray::core::CoreWorker::RunIOService()
/sdf/group/ml/bes_anomalies/conda/envs/raytune3/lib/python3.10/site-packages/ray/_raylet.so(+0xf6af80) [0x7f684ee84f80] execute_native_thread_routine
/lib64/libpthread.so.0(+0x7ea5) [0x7f6856706ea5] start_thread
/lib64/libc.so.6(clone+0x6d) [0x7f6855d26b0d] clone

*** SIGABRT received at time=1694635473 on cpu 11 ***
PC: @     0x7f6855c5e387  (unknown)  raise
    @     0x7f685670e630       1920  (unknown)
    @     0x7f684dbaf35a  (unknown)  __cxxabiv1::__terminate()
    @     0x7f684dbaf580  (unknown)  (unknown)
[2023-09-13 13:04:33,307 E 83474 83879] logging.cc:361: *** SIGABRT received at time=1694635473 on cpu 11 ***
[2023-09-13 13:04:33,307 E 83474 83879] logging.cc:361: PC: @     0x7f6855c5e387  (unknown)  raise
[2023-09-13 13:04:33,307 E 83474 83879] logging.cc:361:     @     0x7f685670e630       1920  (unknown)
[2023-09-13 13:04:33,307 E 83474 83879] logging.cc:361:     @     0x7f684dbaf35a  (unknown)  __cxxabiv1::__terminate()
[2023-09-13 13:04:33,307 E 83474 83879] logging.cc:361:     @     0x7f684dbaf580  (unknown)  (unknown)
Fatal Python error: Aborted


Extension modules: msgpack._cmsgpack, google._upb._message, psutil._psutil_linux, psutil._psutil_posix, setproctitle, yaml._yaml, ray._raylet, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, pyarrow.lib, pyarrow._hdfsio, pyarrow._fs, pyarrow._hdfs, pyarrow._gcsfs, pyarrow._s3fs, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pandas._libs.ops, pyarrow._compute, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.indexing, pandas._libs.index, pandas._libs.internals, pandas._libs.join, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, charset_normalizer.md, grpc._cython.cygrpc, multidict._multidict, yarl._quoting_c, aiohttp._helpers, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket, frozenlist._frozenlist, pydantic.typing, pydantic.errors, pydantic.version, pydantic.utils, pydantic.class_validators, pydantic.config, pydantic.color, pydantic.datetime_parse, pydantic.validators, pydantic.networks, pydantic.types, pydantic.json, pydantic.error_wrappers, pydantic.fields, pydantic.parse, pydantic.schema, pydantic.main, pydantic.dataclasses, pydantic.annotated_types, pydantic.decorator, pydantic.env_settings, pydantic.tools, pydantic, pyarrow._json (total: 99)
Aborted
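Since several of the failures above are thread-creation errors (the "Resource temporarily unavailable" messages, the boost::asio::detail::posix_thread::start_thread frames, and the jemalloc background-thread complaint), I also want to check the per-user process/thread limits on this node, which can be quite low on shared clusters (again, my own guess at a cause):

ulimit -u                          # max user processes/threads for this shell
ulimit -a                          # all limits
cat /proc/sys/kernel/threads-max   # system-wide thread cap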

When I try to install Ray from conda-forge, I can’t even import ray.