Hi @kai ,
Thanks a lot for the quick reply. I created a minimal reproducible example, but it turns out the problem depends on the machine the script runs on.
With Ray:

```python
import os
import time

from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

from ray import tune
from ray.air import RunConfig
from ray.tune import CLIReporter


def trainable(parameter_space):
    # Ask OpenMP-backed libraries to use all 56 CPUs of this machine
    os.environ["OMP_NUM_THREADS"] = '56'
    print(os.environ["OMP_NUM_THREADS"])

    k_neighbors = parameter_space["k_neighbors"]

    # Synthetic binary classification data
    X, y = make_classification(
        n_classes=2,
        weights=[0.4, 0.6],
        n_features=100,
        n_samples=100_000,
        random_state=10)

    sm = SMOTE(k_neighbors=k_neighbors)

    # Time only the oversampling step
    start = time.time()
    _, _ = sm.fit_resample(X, y)
    print('------ Time ------:', (time.time() - start))


# Give the single trial all CPUs and GPUs of the node
trainable = tune.with_resources(trainable, {"cpu": 56, "gpu": 6})

reporter = CLIReporter(max_report_frequency=300)

tuner = tune.Tuner(
    trainable,
    param_space={"k_neighbors": 10},
    run_config=RunConfig(progress_reporter=reporter),
)

results = tuner.fit()
```
Without Ray:

```python
import os
import time

from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

k_neighbors = 10

# Same synthetic data as in the Ray version
X, y = make_classification(
    n_classes=2,
    weights=[0.4, 0.6],
    n_features=100,
    n_samples=100_000,
    random_state=10)

sm = SMOTE(k_neighbors=k_neighbors)

# Time only the oversampling step
start = time.time()
_, _ = sm.fit_resample(X, y)
print('Time:', (time.time() - start))
```
This is a CentOS server with 56 CPUs and 6 GPUs.
Running the script without Ray returns:

```
Time: 3.4180803298950195
```
Running the script with Ray returns:

```
2023-01-27 22:15:28,785 INFO worker.py:1538 -- Started a local Ray instance.
== Status ==
Current time: 2023-01-27 22:15:34 (running for 00:00:02.43)
Memory usage on this node: 22.9/503.5 GiB
Using FIFO scheduling algorithm.
Resources requested: 56.0/56 CPUs, 6.0/6 GPUs, 0.0/208.61 GiB heap, 0.0/93.39 GiB objects (0.0/1.0 accelerator_type:P100)
Result logdir: ~/ray_results/trainable_2023-01-27_22-15-26
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+--------------------+
| Trial name | status | loc |
|-----------------------+----------+--------------------|
| trainable_0e5d3_00000 | RUNNING | 10.206.42.11:52335 |
+-----------------------+----------+--------------------+
(trainable pid=52335) 56
Trial trainable_0e5d3_00000 completed. Last result:
== Status ==
Current time: 2023-01-27 22:15:53 (running for 00:00:22.32)
Memory usage on this node: 22.9/503.5 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/56 CPUs, 0/6 GPUs, 0.0/208.61 GiB heap, 0.0/93.39 GiB objects (0.0/1.0 accelerator_type:P100)
Result logdir: ~/ray_results/trainable_2023-01-27_22-15-26
Number of trials: 1/1 (1 TERMINATED)
+-----------------------+------------+--------------------+
| Trial name | status | loc |
|-----------------------+------------+--------------------|
| trainable_0e5d3_00000 | TERMINATED | 10.206.42.11:52335 |
+-----------------------+------------+--------------------+
(trainable pid=52335) ------ Time ------: 18.864986181259155
2023-01-27 22:15:54,064 INFO tune.py:762 -- Total run time: 23.17 seconds (22.31 seconds for the tuning loop).
```
So it takes approximately 5 times longer with Ray. I tried on an Ubuntu EC2 machine with 8 CPUs, using `os.environ["OMP_NUM_THREADS"] = '8'` and `trainable = tune.with_resources(trainable, {"cpu": 8})`, and I got approximately the same time with and without Ray (~2.5 seconds).
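Just to make that explicit, these are the only two lines I changed for the 8-CPU run (the rest of the script is identical to the one above):

```python
# Inside the trainable: one OpenMP thread per available CPU on the EC2 machine
os.environ["OMP_NUM_THREADS"] = '8'

# Request all 8 CPUs for the single trial (no GPUs requested)
trainable = tune.with_resources(trainable, {"cpu": 8})
```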
I don’t know if you have any clue where the issue could come from on the first, bigger machine. I tried setting `gpu` to 0 in `tune.with_resources`, but that does not change anything.
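For reference, the variant with GPUs disabled looked like this:

```python
# Same script as above, but the trial requests no GPUs at all
trainable = tune.with_resources(trainable, {"cpu": 56, "gpu": 0})
```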
Thanks again.