Running Tune within a remote function

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.


We are in the process of upgrading Ray to version 2.6. We are using Ray Tune in a K8s cluster.

We are facing issues when trying to run Ray Tune within a remote function. When running the code below (a minimal example taken from the tutorials), the head node of the cluster systematically crashes.

If we use a standard function instead of a remote one, everything works fine (autoscaling, etc.); see the sketch after the code below.

This used to work with Ray 2.2.

import time

import ray
from ray import tune
from ray.air import session

@ray.remote
def run():
    def evaluation_fn(step, width, height):
        return (0.1 + width * step / 100) ** (-1) + height * 0.1

    def easy_objective(config):
        width, height = config["width"], config["height"]

        for step in range(config["steps"]):
            intermediate_score = evaluation_fn(step, width, height)
                {"iterations": step, "mean_loss": intermediate_score}

    tuner = tune.Tuner(
        easy_objective,
        param_space={
            "steps": 50,
            "width": tune.uniform(0, 20),
            "height": tune.uniform(-100, 100),
            "activation": tune.grid_search(["relu", "tanh"]),
        },
    )
    results = tuner.fit()
    return results.get_dataframe()

if __name__ == "__main__":
    res = ray.get(run.remote())
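
For reference, this is the variant that works for us, a minimal sketch: the same run() body, but defined as a plain function (without the @ray.remote decorator) and called directly in the driver.

if __name__ == "__main__":
    # Working variant: run() is a plain function here, not a Ray task,
    # so Tune is driven from the driver process directly.
    res = run()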

Hi @Cedric, do you have any logs from the failure? You’re seeing that the head node crashes and shuts down?

Have you also tried submitting the training task as a Ray Job?
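
In case it helps, this is roughly what that would look like with the Jobs API. It is only a sketch; the dashboard address, script name, and working directory below are placeholders for your setup.

from ray.job_submission import JobSubmissionClient

# Submit the Tune script as a Ray Job instead of calling run.remote()
# from an interactive driver. Address and paths are placeholders.
client = JobSubmissionClient("http://<head-node>:8265")  # Ray dashboard address
job_id = client.submit_job(
    entrypoint="python tune_script.py",   # hypothetical script containing run()
    runtime_env={"working_dir": "."},     # ship local files to the cluster
)
print(client.get_job_logs(job_id))        # fetch the job's logs afterwards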