Ray Tune Executing Binary Executable

  • High: It blocks me from completing my task.

Greetings,

I have a Python wrapper built around our custom C++ simulator, which is invoked from Python via subprocess (running its CLI). The C++ binary executable is already built.
My question is: how can I use this in Ray Tune? I tried running it with Ray Tune, but it says it can't find my binary (I can definitely run the binary locally).

Thanks

Could you share your script and setup? Are you running on one node?

@xwjiang2010 Yep. We are only using one node (but with multiple GPUs).
And the script is simply:

        # config (dict): a dict of hyperparameters.
        # Launch the C++ simulator binary as a subprocess.
        process = subprocess.Popen(args, stdout=subprocess.PIPE,
                                   stderr=subprocess.PIPE, env=my_env)
        print(t.tf_path)
        # Wait for the binary to create its TensorBoard event file.
        while not check_event_exists(t.tf_path):
            time.sleep(1)
        print("Found tf log...")
        while True:
            if process.poll() is not None:
                break  # the process has exited, stop polling
            # Re-create the iterator each pass; a summary iterator is
            # exhausted once consumed. (Note this re-reads the log from
            # the beginning.)
            summary = _summary_iterator(t.tf_path)
            for e in summary:  # read events from the TensorBoard log
                for v in e.summary.value:
                    logger.debug(v)
                    if v.tag == 'ret_mean':
                        ret_mean = v.simple_value
                        logger.info(ret_mean)
                        # This sends the score to Tune.
                        tune.report(ret_mean=ret_mean)
            time.sleep(1)

subprocess.Popen is the main entry point where we invoke our C++ binary.

Got you.
How did you provide the executable path?
Could this have something to do with Tune changing the working directory inside each trial?
You may want to take a look at the docstring here: ray.tune.trainable — Ray 1.12.0
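
A minimal sketch of that workaround (the binary location and flags here are hypothetical): resolve the binary to an absolute path before handing control to Tune, since each trial runs in its own working directory.

        import os
        import subprocess

        from ray import tune

        # Resolve the path while still in the original working directory;
        # Tune changes the cwd inside each trial.
        SIM_BINARY = os.path.abspath("./build/simulator")

        def trainable(config):
            # The absolute path stays valid regardless of the trial's cwd.
            subprocess.run([SIM_BINARY, "--lr", str(config["lr"])], check=True)
            tune.report(done=1)

        tune.run(trainable, config={"lr": tune.grid_search([1e-3, 1e-4])})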

Thanks @xwjiang2010
An absolute path solved the problem.
Another quick question, on resource management:
if I allocate 0.5 GPU to each trial (and we have 8 GPUs), how can we make sure we know which GPU to use inside that Python trainable function?
E.g.,
based on the resource management config, how can we retrieve, inside the trainable, which GPU (cuda:0 through cuda:7) was assigned?

Ah, I see.
Could you call the API of the framework you use inside the trainable?
For example, on PyTorch: torch.cuda.current_device()

Yes.
How could I dynamically get the GPU allocated by Ray this way, using torch.cuda.current_device()? @xwjiang2010

How about just adding this line to the trainable function that you have?
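
For illustration, a sketch of how that can fit together (the simulator launch and metric parsing are elided): Ray sets CUDA_VISIBLE_DEVICES for each trial to the GPU(s) it assigned, so PyTorch numbers the visible devices from 0, and ray.get_gpu_ids() returns the assigned physical IDs.

        import ray
        import torch

        from ray import tune

        def trainable(config):
            # Physical GPU IDs Ray assigned to this trial, e.g. [3].
            gpu_ids = ray.get_gpu_ids()
            # Ray has already set CUDA_VISIBLE_DEVICES for this trial, so
            # PyTorch sees only the assigned GPU(s), numbered from 0.
            device = torch.device("cuda", torch.cuda.current_device())
            print(gpu_ids, device)
            tune.report(ret_mean=0.0)  # placeholder metric

        # With 0.5 GPU per trial, two trials share each physical GPU.
        tune.run(trainable, resources_per_trial={"cpu": 1, "gpu": 0.5})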

Hello, I can run the executable in Ray now; however, it gets stuck somehow and is not utilizing the GPU properly.

Which part of the code is supposed to run on the GPU? If you take Ray Tune out of the picture and just run one trial, does it run on the GPU properly?
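
One thing worth checking here, as a hedged guess: if my_env in the snippet above is built from scratch rather than copied from os.environ, the C++ subprocess will not inherit the CUDA_VISIBLE_DEVICES value Ray set for the trial, and may pick the wrong GPU or none at all. A sketch (binary path and arguments are hypothetical):

        import os
        import subprocess

        # Start from the trial's environment so the child process inherits
        # CUDA_VISIBLE_DEVICES as set by Ray for this trial.
        my_env = os.environ.copy()

        args = ["/abs/path/to/simulator", "--episodes", "100"]
        process = subprocess.Popen(args, stdout=subprocess.PIPE,
                                   stderr=subprocess.PIPE, env=my_env)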