I have a Python wrapper built on our custom C++ simulator, which is invoked from Python via subprocess (running the CLI). The C++ binary executable is already built.
My question is: how can I use this in Ray Tune? I tried running it with Ray Tune, but it said it can't find my binary (I can definitely run the binary locally).
@xwjiang2010 yep. We are only using one node (but with multiple GPUs)
And the script is simply:

```python
def trainable(config):
    # config (dict): A dict of hyperparameters.
    process = subprocess.Popen(args, stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE, env=my_env)
    print(t.tf_path)
    while not check_event_exists(t.tf_path):
        time.sleep(1)
    print("Found tf log...")
    summary = _summary_iterator(t.tf_path)
    while True:
        if process.poll() is not None:
            break  # got signal from process, break here
        for e in summary:  # read from tensorboard log
            for v in e.summary.value:
                logger.debug(v)
                if v.tag == 'ret_mean':
                    ret_mean = v.simple_value
                    logger.info(ret_mean)
                    # This sends the score to Tune.
                    tune.report(ret_mean=ret_mean)
        time.sleep(1)
```
subprocess.Popen will be the main entry point where we invoke our C++ binary.
Got you.
How did you provide the executable path?
Could this have something to do with Tune changing working directory inside of each trial?
You may want to take a look at the docstring here: ray.tune.trainable — Ray 1.12.0
Thanks @xwjiang2010
Abs path solved the problem.
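For anyone hitting the same issue: since Tune changes the working directory into each trial's logdir, a relative path to the binary breaks inside the trainable. A minimal sketch of the fix (here `./my_sim` is a placeholder for your actual binary path):

```python
import os
import subprocess

# Resolve the binary path once, at import time, before Tune changes
# the working directory into each trial's logdir.
# "./my_sim" is a hypothetical path; substitute your actual binary.
BINARY = os.path.abspath("./my_sim")

def launch(args, env=None):
    # An absolute path works regardless of the trial's current cwd.
    return subprocess.Popen([BINARY, *args],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            env=env)
```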
Another quick question on resource management.
If I allocate 0.5 GPU to each trial (and we have 8 GPUs), how can we know which GPU to use inside that Python trainable function?
E.g.
based on the resource management config, how can we retrieve, inside the trainable, which GPU (cuda:0 or cuda:7) was assigned?