What is the best practice to compare multiple algorithms?

deepgravity · May 31, 2023, 4:20pm

How severe does this issue affect your experience of using Ray?

High: It blocks me to complete my task.

Hello all,

I would like to compare the performance of multiple algorithms, let’s say ppo and dqn, for my custom environment.

I know I can manually execute my training pipeline multiple times for each algorithm in order to achieve a statistically significant comparison, and then compare their respective performances. However, I believe there might be a more efficient approach.

I wonder if anyone knows what is the best way to perform this comparison in RLlib?
I would like to run my training pipeline only once, training each algorithm, say, n times on the same env, and then compare the algorithms’ performance.

Thanks!

kourosh · May 31, 2023, 4:36pm

Hi, you can create a script that iterates over the algorithms. Or, if you want to run them in parallel, you can use Ray Core to launch the experiments concurrently (assuming you have enough resources to run them at the same time).
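A minimal sketch of the sequential version of such a script, assuming the actual training is wrapped in a function you supply. The names `compare_algorithms`, `train_once`, and `dummy_train_once` are hypothetical and not part of RLlib; the dummy function only stands in for real training so that the loop can be run as-is.

```python
from statistics import mean, stdev
from typing import Callable, Dict, List


def compare_algorithms(
    algo_names: List[str],
    n_trials: int,
    train_once: Callable[[str, int], float],
) -> Dict[str, Dict[str, float]]:
    """Run every algorithm n_trials times and summarize the returned scores."""
    summary = {}
    for algo_name in algo_names:
        # Reuse the same seeds for every algorithm so the trials are paired.
        scores = [train_once(algo_name, seed) for seed in range(n_trials)]
        summary[algo_name] = {
            "mean": mean(scores),
            "stdev": stdev(scores) if len(scores) > 1 else 0.0,
        }
    return summary


def dummy_train_once(algo_name: str, seed: int) -> float:
    # Placeholder only: a real version would build and train the RLlib
    # algorithm here and return, e.g., its final mean episode reward.
    return float(len(algo_name) + seed)


summary = compare_algorithms(["PPO", "DQN"], n_trials=5, train_once=dummy_train_once)
for algo_name, stats in summary.items():
    print(algo_name, stats)
```

Reporting the mean together with the spread over several seeds is what makes the comparison between algorithms meaningful; a single run per algorithm would not show how much of the difference is noise.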

deepgravity · May 31, 2023, 4:44pm

Hi @kourosh,
Thanks for your quick reply.

“Create a script that iterates over the algorithms”: yes, this is what I’m doing now, but I thought there might be a better way to do it, like hyperparameter tuning in Ray.

Do you know whether I can use wandb for this? wandb records multiple runs and provides a dashboard where comparing them is easy. However, I’m not sure whether WandbLoggerCallback supports this.

Thanks!

kourosh · May 31, 2023, 5:00pm

You can use Ray Tune out of the box to sweep hyperparameters within a single algorithm. If you want to sweep the algorithm itself, you need to create a function / trainable that runs the algorithm, and optionally use Tune to sweep the parameters of that function.
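The idea of treating the algorithm as one more swept parameter can be sketched in plain Python, without Tune, as below. This is only an illustration of the pattern: `SEARCH_SPACE`, `expand_grid`, and `describe` are hypothetical names, the learning-rate values are made up, and `describe` is a stand-in for a real trainable that would pick the RLlib algorithm from `config["algo"]` and train it.

```python
from itertools import product
from typing import Any, Dict, Iterator, List

# Hypothetical search space: the algorithm name is treated as one more
# hyperparameter, next to an illustrative learning rate.
SEARCH_SPACE: Dict[str, List[Any]] = {
    "algo": ["PPO", "DQN"],
    "lr": [1e-3, 1e-4],
}


def expand_grid(space: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one config dict per combination of values in the search space."""
    keys = list(space)
    for values in product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))


def describe(config: Dict[str, Any]) -> str:
    # Stand-in for the trainable: a real one would select the algorithm
    # from config["algo"], train it, and report metrics.
    return f"{config['algo']} with lr={config['lr']}"


configs = list(expand_grid(SEARCH_SPACE))
for config in configs:
    print(describe(config))
```

With Tune, the grid expansion would be handled by the library instead, and the function receiving `config` would be the trainable passed to it.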

Regarding wandb, you can create the WandbLoggerCallback with a different group per algorithm. That separates, say, DQN from PPO while still allowing you to compare them against each other.

deepgravity · May 31, 2023, 5:01pm

Many thanks @kourosh,
I wonder if you have any script showing how to modify WandbLoggerCallback for such a scenario?

deepgravity · May 31, 2023, 6:17pm

I’ve figured it out! It was easier than I expected!

This is how I implemented it:

You can have two for loops: one for algos (n_runs=2 in my implementation) and one for the number of times you want to run each algo to have a good statistical comparison (n_trials=5 in my implementation). Like this:

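for run_id in range(n_runs):          # one iteration per algorithm
    for trial_id in range(n_trials):  # repeated trials of that algorithm
        ...  # build the RunConfig below and launch training here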

Then, in the RunConfig you can set the callbacks with WandbLoggerCallback, while passing the project and group arguments. Like this:

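from ray import air
from ray.air.integrations.wandb import WandbLoggerCallback

run_config = air.RunConfig(
    callbacks=[
        # One wandb group per (algorithm, run, trial) combination.
        WandbLoggerCallback(
            project=project_name,
            group=f"algo_{algo_name}__run_{run_id}__trial_{trial_id}",
        )
    ]
)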

Next, the runs will show up on your wandb dashboard, and you can compare them using the grouping button in the edit panel of wandb’s figures.
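As a variation on the group name above (not what I used), the group could contain only the algorithm name, so that all trials of one algorithm land in the same wandb group, as suggested earlier in the thread. A minimal sketch, where `wandb_callback_kwargs` and the project name "algo-comparison" are made up for illustration:

```python
from typing import Dict


def wandb_callback_kwargs(project: str, algo_name: str) -> Dict[str, str]:
    """Build the project/group arguments shared by all trials of one algorithm."""
    return {"project": project, "group": algo_name}


# "algo-comparison" is a made-up project name for illustration.
ppo_kwargs = wandb_callback_kwargs("algo-comparison", "PPO")
dqn_kwargs = wandb_callback_kwargs("algo-comparison", "DQN")

# These would then be passed on as WandbLoggerCallback(**ppo_kwargs), etc.
print(ppo_kwargs)
print(dqn_kwargs)
```

Both algorithms share one project, so they appear on the same dashboard, while the group keeps their trials apart.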

I hope this is useful for others as well!
