Collecting metrics for different variations of the same experiment

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes significant difficulty to completing my task, but I can work around it.

I have a multi-agent reinforcement learning experiment that I’m running with RLlib. I would like to make a certain measurement, and I’m not sure which tool to use or how.

I wrote a multi-agent environment implementing a certain game for two agents. I train two PPO policies on that environment for 200 training iterations, while collecting metrics about the agents’ performances in each iteration. As the training iterations advance, the mean reward rises until it reaches a plateau. I measure other interesting metrics besides the reward, one of them being “Metric X”.

Now, when writing this environment, one of the decisions I make is what kind of data I expose to the agents in their observations, and how that data is expressed. I would like to compare the results I get for Metric X across different ways of representing the observation.

I came up with 4 different ways of setting up the observation. Let’s call these 4 “arrangements”. Right now I switch between the 4 arrangements by commenting out different sections of the code that computes the observation. This is clunky, of course.

I would like to do many runs of 200 training iterations on this environment, under each of the 4 arrangements. For each arrangement, I’d like to see what values it produces for Metric X on the 200th training iteration. I’m not just looking for which arrangement gives me the highest Metric X, nor even the mean Metric X per arrangement. I want a list of all the Metric X results for each arrangement, so I can compute the mean and standard deviation and plot histograms of them.

What tool, framework, or method should I use to collect this information? Is Ray Tune fit for this job? I need some way to programmatically tell Ray to sometimes include certain values in the observation and sometimes not. I need to run Ray many times and have these metrics automatically collected and aggregated by arrangement. I could write logic that does this myself, but I want to know how it’s usually done.

Thanks for your help,
Ram Rachum.

Hey @cool-RR,

One thing you can do is provide a control knob for these arrangements and sweep it via Tune. Tune will launch 4 experiments (one for each setting) and run the PPO policy for each trial for 200 training iterations. Then you can use TensorBoard or your favorite logging tool, or even post-process the produced CSV files, to get the statistics of the metric you want across all 4 trials.

Thanks @kourosh. When you say “control knob”, do you mean grid_search?

@cool-RR Yes. The code would look something like this:

from ray import air, tune
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    # environment() is the AlgorithmConfig method for env settings;
    # Tune expands the grid search into 4 trials, one per arrangement.
    .environment(
        env=NAME_OF_THE_ENV,
        env_config={"arrangement": tune.grid_search(["A", "B", "C", "D"])},
    )
)

tuner = tune.Tuner(
    "PPO",
    param_space=config.to_dict(),  # Tune resolves the grid search in the dict
    run_config=air.RunConfig(stop={"training_iteration": 200}),
)
tuner.fit()
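
On the environment side, you’d read that key from env_config. Here’s a minimal sketch; the class name and observation logic are hypothetical, just to show where the knob plugs in:

from ray.rllib.env.multi_agent_env import MultiAgentEnv

class MyGameEnv(MultiAgentEnv):  # hypothetical stand-in for your env
    def __init__(self, env_config=None):
        super().__init__()
        env_config = env_config or {}
        # Tune injects one of "A"/"B"/"C"/"D" here, one value per trial.
        self.arrangement = env_config.get("arrangement", "A")

    def _build_observation(self, agent_id):
        # Branch on the arrangement instead of commenting code in and out.
        if self.arrangement == "A":
            obs = ...  # e.g. expose the full state
        elif self.arrangement == "B":
            obs = ...  # e.g. expose a reduced state
        else:
            obs = ...
        return obs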

Interesting, I see that you use Tune in a way that’s different from what other people showed me. I prefer your way because it looks more similar to how I currently work.

However, I don’t want Tune to fit right now; I just want to see a bunch of results. If I change fit to run in your code, will that work?

Also, another question: Within each run of the experiment, I’d like to write arbitrary files to the folder of that specific run of the experiment. Is that possible? How?

Thanks for your help,
Ram Rachum.

However, I don’t want Tune to fit right now; I just want to see a bunch of results. If I change fit to run in your code, will that work?

When you say you want to see a bunch of results, what exactly do you mean? Are those results not coming from a trained policy on those envs with different configurations?

Also, another question: Within each run of the experiment, I’d like to write arbitrary files to the folder of that specific run of the experiment. Is that possible? How?

Yes, you can use Tune callbacks for this. See A Guide To Callbacks & Metrics in Tune — Ray 2.8.0. If you still have questions about this, I recommend posting a separate question with more details so that the community can help you better.
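
A minimal sketch of such a callback, assuming Ray 2.8 (where a trial’s folder is exposed as trial.local_path; on older versions it’s trial.logdir). The class and file names are just placeholders:

import os
from ray import tune

class SaveArtifactCallback(tune.Callback):  # hypothetical example callback
    def on_trial_result(self, iteration, trials, trial, result, **info):
        # trial.local_path is the folder of this specific trial.
        path = os.path.join(trial.local_path, "my_metric_log.txt")
        with open(path, "a") as f:
            f.write(f"iteration {result['training_iteration']}\n")

# Pass it in when launching the experiment, e.g.:
# tune.run("PPO", config=config.to_dict(), callbacks=[SaveArtifactCallback()])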

Regarding writing arbitrary files to the experiment’s folder, I just found ray.tune.get_trial_dir(), which does exactly what I want 🙂
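
In case it helps anyone else, the usage is roughly this (the file name is just an example):

import os
from ray import tune

# Inside the trainable, while a trial is running:
trial_dir = tune.get_trial_dir()  # folder of this specific run
with open(os.path.join(trial_dir, "my_file.txt"), "w") as f:
    f.write("arbitrary data for this run\n")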

Regarding the other part of my question: I tried using tune.run, and after some trial and error (especially around resource allocation…) I got it working, and now I see a full CSV table with all the metrics for each experiment.
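
In case it helps others, here’s roughly what my post-processing looks like. The results path and the Metric X column name are specific to my setup; "custom_metrics/metric_x_mean" is just a placeholder for however you report the metric:

import glob
import json
import os
import pandas as pd

rows = []
# One subfolder per trial under the experiment folder; adjust the path to yours.
for trial_dir in glob.glob(os.path.expanduser("~/ray_results/PPO/PPO_*")):
    with open(os.path.join(trial_dir, "params.json")) as f:
        params = json.load(f)
    progress = pd.read_csv(os.path.join(trial_dir, "progress.csv"))
    rows.append({
        "arrangement": params["env_config"]["arrangement"],
        # Take Metric X at the final (200th) training iteration.
        "metric_x": progress["custom_metrics/metric_x_mean"].iloc[-1],
    })

df = pd.DataFrame(rows)
print(df.groupby("arrangement")["metric_x"].agg(["mean", "std"]))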

Thank you very much Kourosh!


That’s awesome to hear. Just to be clear, Tuner.fit() is the AIR (AI Runtime) equivalent of tune.run().
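
For reference, a rough sketch of the correspondence, assuming the same config object as above (the stop criterion is just an example):

from ray import air, tune

# Legacy API:
tune.run(
    "PPO",
    config=config.to_dict(),
    stop={"training_iteration": 200},
)

# AIR API, equivalent:
tune.Tuner(
    "PPO",
    param_space=config.to_dict(),
    run_config=air.RunConfig(stop={"training_iteration": 200}),
).fit()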