Ray Tune Executing Binary Executable

  • High: It blocks me from completing my task.

Greetings,

I have a Python wrapper built around our custom C++ simulator, which is invoked from Python via subprocess (running its CLI). The C++ binary executable is already built.
My question is: how can I use this in Ray Tune? I tried running it with Ray Tune, but it says it can't find my binary (I can definitely run the binary locally).

Thanks

Could you share your script and setup? Are you running on one node?

@xwjiang2010 Yep. We are only using one node (but with multiple GPUs).
And the script is simply:

        # config (dict): a dict of hyperparameters.
        # Launch the C++ simulator binary as a subprocess.
        process = subprocess.Popen(args, stdout=subprocess.PIPE,
                                   stderr=subprocess.PIPE, env=my_env)
        print(t.tf_path)
        # Wait for the binary to create its TensorBoard event file.
        while not check_event_exists(t.tf_path):
            time.sleep(1)
        print("Found tf log...")
        while True:
            if process.poll() is not None:
                break  # the process has exited, stop polling
            # Re-create the iterator each pass; a summary iterator is
            # exhausted once consumed. (Note this re-reads the log from
            # the beginning.)
            summary = _summary_iterator(t.tf_path)
            for e in summary:  # read events from the TensorBoard log
                for v in e.summary.value:
                    logger.debug(v)
                    if v.tag == 'ret_mean':
                        ret_mean = v.simple_value
                        logger.info(ret_mean)
                        # This sends the score to Tune.
                        tune.report(ret_mean=ret_mean)
            time.sleep(1)

subprocess.Popen is the main entry point where we invoke our C++ binary.

Got you.
How did you provide the executable path?
Could this have something to do with Tune changing the working directory inside each trial?
You may want to take a look at the docstring here: ray.tune.trainable — Ray 1.12.0
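
A minimal sketch of that workaround (the binary location and flags here are hypothetical): resolve the binary to an absolute path before handing control to Tune, since each trial runs in its own working directory.

        import os
        import subprocess

        from ray import tune

        # Resolve the path while still in the original working directory;
        # Tune changes the cwd inside each trial.
        SIM_BINARY = os.path.abspath("./build/simulator")

        def trainable(config):
            # The absolute path stays valid regardless of the trial's cwd.
            subprocess.run([SIM_BINARY, "--lr", str(config["lr"])], check=True)
            tune.report(done=1)

        tune.run(trainable, config={"lr": tune.grid_search([1e-3, 1e-4])})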

Thanks @xwjiang2010
An absolute path solved the problem.
Another quick question, on resource management:
if I allocate 0.5 GPU to each trial (and we have 8 GPUs), how can we make sure we know which GPU to use inside that Python trainable function?
E.g.,
based on the resource management config, how can we retrieve, inside the trainable, which GPU (cuda:0 through cuda:7) was assigned?

Ah, I see.
Could you call the API of the framework you use inside the trainable?
For example, on PyTorch: torch.cuda.current_device()

Yes.
How could I dynamically get the GPU allocated by Ray this way, using torch.cuda.current_device()? @xwjiang2010

How about just adding this line to the trainable function that you have?
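
For illustration, a sketch of how that can fit together (the simulator launch and metric parsing are elided): Ray sets CUDA_VISIBLE_DEVICES for each trial to the GPU(s) it assigned, so PyTorch numbers the visible devices from 0, and ray.get_gpu_ids() returns the assigned physical IDs.

        import ray
        import torch

        from ray import tune

        def trainable(config):
            # Physical GPU IDs Ray assigned to this trial, e.g. [3].
            gpu_ids = ray.get_gpu_ids()
            # Ray has already set CUDA_VISIBLE_DEVICES for this trial, so
            # PyTorch sees only the assigned GPU(s), numbered from 0.
            device = torch.device("cuda", torch.cuda.current_device())
            print(gpu_ids, device)
            tune.report(ret_mean=0.0)  # placeholder metric

        # With 0.5 GPU per trial, two trials share each physical GPU.
        tune.run(trainable, resources_per_trial={"cpu": 1, "gpu": 0.5})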

Hello, I can run the executable in Ray now; however, it gets stuck somehow and is not utilizing the GPU properly.

Which part of the code is supposed to run on the GPU? If you take Ray Tune out of the picture and just run one trial, does it run on the GPU properly?
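
One thing worth checking here, as a hedged guess: if my_env in the snippet above is built from scratch rather than copied from os.environ, the C++ subprocess will not inherit the CUDA_VISIBLE_DEVICES value Ray set for the trial, and may pick the wrong GPU or none at all. A sketch (binary path and arguments are hypothetical):

        import os
        import subprocess

        # Start from the trial's environment so the child process inherits
        # CUDA_VISIBLE_DEVICES as set by Ray for this trial.
        my_env = os.environ.copy()

        args = ["/abs/path/to/simulator", "--episodes", "100"]
        process = subprocess.Popen(args, stdout=subprocess.PIPE,
                                   stderr=subprocess.PIPE, env=my_env)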