What is the best practice to compare multiple algorithms?

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hello all,

I would like to compare the performance of multiple algorithms, let's say PPO and DQN, on my custom environment.

I know I can manually execute my training pipeline multiple times for each algorithm in order to achieve a statistically significant comparison, and then compare their respective performances. However, I believe there might be a more efficient approach.

Does anyone know the best way to perform this comparison in RLlib?
I would like to run my training pipeline only once, so that each algorithm is trained, for example, n times on the same env, and then compare the algorithms' performance.
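Once each algorithm has been trained n times, the comparison itself comes down to a few summary statistics over the final scores. Below is a minimal, library-free sketch, assuming you have collected one final score per run (for example, the mean episode reward at the end of training) into a dict keyed by algorithm name. `summarize`, `welch_t` and `welch_df` are hypothetical helpers, not RLlib APIs. Welch's t-test is used because the two algorithms' results need not share a variance; if SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic together with a p-value.

```python
import math
import statistics
from typing import Dict, List, Tuple


def summarize(scores: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Mean and sample standard deviation of the final scores of each algorithm.

    Needs at least two runs per algorithm (statistics.stdev requires n >= 2).
    """
    return {
        algo: (statistics.mean(vals), statistics.stdev(vals))
        for algo, vals in scores.items()
    }


def welch_t(a: List[float], b: List[float]) -> float:
    """Welch's t statistic: difference of means over the unpooled standard error."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def welch_df(a: List[float], b: List[float]) -> float:
    """Welch-Satterthwaite approximation of the degrees of freedom."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    return (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
```

For example, `welch_t(scores["PPO"], scores["DQN"])` with `welch_df(...)` degrees of freedom tells you whether the gap between the two means is large relative to the run-to-run noise across seeds.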


Hi, you can create a script that iterates over the algorithms. Or, if you want to run them in parallel, you can use Ray Core to launch those experiments concurrently (assuming you have enough resources to run them at the same time).

Hi @kourosh,
Thanks for your quick reply.

create a script that iterates over the algorithms: yes, this is what I'm doing now, but I thought there might be a better way to do it, like hyperparameter tuning in Ray.

Do you know if I can use wandb for this? At least wandb records multiple runs and provides a dashboard where comparing them is easy. However, I'm not sure whether WandbLoggerCallback supports this.


You can use Ray Tune out of the box to sweep hyperparameters within a single algorithm. If you want to sweep over the algorithm itself, you need to create a function/trainable that runs the algorithm and, optionally, use Tune to sweep its parameters.
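A minimal pure-Python sketch of the "trainable that runs the algorithm" idea: the algorithm name becomes just one more key in the search space, next to the seed and any real hyperparameters. `sweep_algorithms` and `train_fn` are hypothetical; `train_fn(config)` is assumed to read `config["algo"]`, build that algorithm, train it once, and return a final score. In Ray Tune the same grid would typically be written as `param_space={"algo": tune.grid_search(["PPO", "DQN"]), ...}` and passed to `tune.Tuner` with a function trainable; the metric-reporting call inside such a trainable differs between Ray versions, so it is left out here.

```python
from itertools import product
from typing import Callable, Dict, List


def sweep_algorithms(
    train_fn: Callable[[dict], float],
    param_space: Dict[str, list],
) -> List[dict]:
    """Grid-search over every combination in `param_space`, algorithm included.

    Returns one dict per run: the config that was used plus its "score".
    """
    keys = list(param_space)
    results = []
    # product() enumerates the full grid, e.g. (PPO, 0), (PPO, 1), (DQN, 0), ...
    for values in product(*(param_space[k] for k in keys)):
        config = dict(zip(keys, values))
        results.append({**config, "score": train_fn(config)})
    return results
```

For example, `sweep_algorithms(my_trainable, {"algo": ["PPO", "DQN"], "seed": list(range(5))})` trains each algorithm five times, and the flat list of results is easy to group by `"algo"` afterwards.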

Regarding wandb: you can create the WandbLoggerCallback with a different group per algorithm. That separates, say, DQN runs from PPO runs while still allowing you to compare them against each other.

Many thanks @kourosh,
Do you have an example script showing how to set up WandbLoggerCallback for this scenario?

I’ve figured it out! It was easier than I expected!

This is how I implemented it:

You can use two nested for loops: an outer one over the algorithms (n_runs=2 in my implementation) and an inner one over the number of times you want to run each algorithm for a sound statistical comparison (n_trials=5 in my implementation), like this:

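for run_id in range(n_runs):          # one iteration per algorithm (n_runs=2: PPO and DQN)
    for trial_id in range(n_trials):  # repeated runs of that algorithm (n_trials=5)
        ...  # build the RunConfig and launch training for this (algorithm, trial) pair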

Then, in the RunConfig, set the callbacks to a WandbLoggerCallback, passing the project and group arguments, like this:

run_config = air.RunConfig(
    callbacks=[WandbLoggerCallback(project=..., group=...)])

Next, your wandb dashboard will show the resulting figures for all runs, and you can compare them using the grouping button in the edit panel of each wandb figure.

I hope this is useful for others as well!
