Hi, I have a question about how resources are assigned to each trial, and to the remote tasks that a trial of a trainable schedules.
Here’s an example:
import ray
from ray import tune
import time

hparams_space = {
    "x": tune.uniform(0, 1),
    "y": tune.grid_search([0, 1, 2, 3, 4, 5, 6, 7]),
}

@ray.remote(num_returns=2)
def get_score(config):
    time.sleep(20)
    return config["x"] ** 2, config["y"] * -1

def run_trial(config):
    score, score2 = ray.get(get_score.remote(config))
    ret = {"score": score, "score2": score2}
    tune.report(**ret)

if __name__ == "__main__":
    analysis = tune.run(
        run_trial,
        num_samples=1,
        config=hparams_space,
        resources_per_trial=tune.PlacementGroupFactory([{}, {"CPU": 2}]),
    )
From what I’ve been observing, the resources reserved for each trial’s remote tasks are defined by the second dictionary in the bundle list, and the number of remote tasks that can run in parallel equals the number of CPUs in that bundle (2 in this case), because by default each remote task is assigned to one worker, which requires one CPU. A quick sketch of that mental model follows.
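To pin down what I mean, here is a minimal sketch of my own (not from my real code), with the default 1-CPU-per-task requirement written out explicitly; get_partial_score is a hypothetical helper I made up for illustration:

import ray
from ray import tune

# Hypothetical helper: num_cpus=1 just makes the implicit
# 1-CPU-per-task default explicit.
@ray.remote(num_cpus=1)
def get_partial_score(value):
    return value

def run_trial(config):
    # Launch two tasks at once; if my understanding is right, the
    # {"CPU": 2} bundle lets both run in parallel instead of serially.
    refs = [
        get_partial_score.remote(config["x"] ** 2),
        get_partial_score.remote(config["y"] * -1),
    ]
    score, score2 = ray.get(refs)
    tune.report(score=score, score2=score2)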
The part I was confused about is this sentence in the PlacementGroupFactory docs:

“This could be used e.g. if you had one learner running in the main trainable that schedules two remote workers that need access to 2 CPUs each.”

which accompanies this example:
from ray import tune

tuner = tune.Tuner(
    tune.with_resources(
        train,
        resources=tune.PlacementGroupFactory([
            {"CPU": 1, "GPU": 0.5, "custom_resource": 2},
            {"CPU": 2},
            {"CPU": 2},
        ], strategy="PACK")
    )
)
tuner.fit()
I’m not sure I understand what “schedules two remote workers” means here. Is that equivalent to “two remote tasks”? Please correct me if I’m wrong at any point.
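For concreteness, here is how I currently read that docs snippet, as a sketch of my interpretation rather than anything the docs state: the first bundle is reserved for the trainable itself (the “learner”), and each of the two {"CPU": 2} bundles is meant for one remote worker it spawns (Worker is a name I made up):

import ray
from ray import tune

@ray.remote(num_cpus=2)  # one of these per {"CPU": 2} bundle?
class Worker:
    def step(self):
        return 0.0

def train(config):
    # The "learner" part runs here, inside the first bundle
    # ({"CPU": 1, "GPU": 0.5, "custom_resource": 2}).
    workers = [Worker.remote() for _ in range(2)]  # the two {"CPU": 2} bundles?
    results = ray.get([w.step.remote() for w in workers])
    tune.report(score=sum(results))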
Thanks in advance.