Running Tune within a remote function

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.


We are in the process of upgrading Ray to version 2.6. We are using Ray Tune in a K8s cluster.

We are facing issues when trying to run Ray Tune within a remote function. When running the code below (a minimal example taken from the tutorials), the head node of the cluster systematically crashes.

If we use a standard function instead of a remote one, everything works fine (autoscaling, etc.); see the sketch after the code below.

This used to work with Ray 2.2.

import time

import ray
from ray import tune
from ray.air import session

@ray.remote
def run():
    def evaluation_fn(step, width, height):
        return (0.1 + width * step / 100) ** (-1) + height * 0.1

    def easy_objective(config):
        width, height = config["width"], config["height"]

        for step in range(config["steps"]):
            intermediate_score = evaluation_fn(step, width, height)
                {"iterations": step, "mean_loss": intermediate_score}

    tuner = tune.Tuner(
        easy_objective,
        param_space={
            "steps": 50,
            "width": tune.uniform(0, 20),
            "height": tune.uniform(-100, 100),
            "activation": tune.grid_search(["relu", "tanh"]),
        },
    )
    results = tuner.fit()
    return results.get_dataframe()

if __name__ == "__main__":
    res = ray.get(run.remote())
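
For reference, this is the variant that works for us, a minimal sketch: the same run() body, but defined as a plain function (without the @ray.remote decorator) and called directly in the driver.

if __name__ == "__main__":
    # Working variant: run() is a plain function here, not a Ray task,
    # so Tune is driven from the driver process directly.
    res = run()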

Hi @Cedric, do you have any logs from the failure? You’re seeing that the head node crashes and shuts down?

Have you also tried submitting the training task as a Ray Job?
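
In case it helps, this is roughly what that would look like with the Jobs API. It is only a sketch; the dashboard address, script name, and working directory below are placeholders for your setup.

from ray.job_submission import JobSubmissionClient

# Submit the Tune script as a Ray Job instead of calling run.remote()
# from an interactive driver. Address and paths are placeholders.
client = JobSubmissionClient("http://<head-node>:8265")  # Ray dashboard address
job_id = client.submit_job(
    entrypoint="python tune_script.py",   # hypothetical script containing run()
    runtime_env={"working_dir": "."},     # ship local files to the cluster
)
print(client.get_job_logs(job_id))        # fetch the job's logs afterwards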