Best practice for multiple evaluation metrics with intermediate model

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

I have a module that should be evaluated a second time after a slight modification. For example, I have:

algo = config.build()
for episode in range(episodes):
    train_results = algo.train()  # <- performs normal evaluation
    # change how model operates
    algo.get_module().switch_mode(True)
    special_eval_results = algo.evaluate()  # <- perform a different evaluation
    algo.get_module().switch_mode(False)

Of course, this has the problem that algo.metrics is shared between the normal and the special evaluation, so I need a second metrics logger, e.g. by also swapping out algo.metrics in between.
I wrote myself a custom evaluation function that does exactly that, but it does not look very clean, and it feels like it sidesteps a lot of the features that Ray provides.
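
Roughly, the workaround looks like this (only a minimal sketch, assuming the new API stack where algo.metrics is a MetricsLogger; switch_mode() is my own method on the RLModule, and I am not sure how cleanly replacing algo.metrics interacts with the rest of the Algorithm internals):

from ray.rllib.utils.metrics.metrics_logger import MetricsLogger

def special_evaluate(algo):
    # Temporarily swap in a fresh metrics logger so the special evaluation
    # does not get mixed into the normal training/eval results.
    original_metrics = algo.metrics
    algo.metrics = MetricsLogger()
    try:
        algo.get_module().switch_mode(True)   # my own mode switch on the RLModule
        special_results = algo.evaluate()
    finally:
        algo.get_module().switch_mode(False)
        algo.metrics = original_metrics       # restore the normal logger
    return special_results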

Now I have the idea of using two Algorithm instances, e.g. with a shared module or by syncing the weights beforehand, but I am not sure how well this will perform and whether it will interfere with the logging of results (I also do not want to run the special evaluation every iteration).
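
Roughly what I mean with the two-algorithm idea (again only a sketch; special_eval_interval is a placeholder, get_state()/set_state() on the RLModule exist in the new API stack, but I am not sure whether copying the module state is enough to also update the evaluation EnvRunners, or whether a full Algorithm-level state copy / checkpoint restore is needed):

train_algo = config.build()
eval_algo = config.build()  # second instance, used only for the special evaluation

for episode in range(episodes):
    train_results = train_algo.train()

    # Only run the special evaluation every few iterations.
    if episode % special_eval_interval == 0:
        # Copy the current weights into the evaluation-only Algorithm.
        # (Possibly eval_algo.set_state(train_algo.get_state()) or a checkpoint
        # save/restore would be needed so the eval EnvRunners get updated, too.)
        eval_algo.get_module().set_state(train_algo.get_module().get_state())

        eval_algo.get_module().switch_mode(True)
        special_eval_results = eval_algo.evaluate()
        eval_algo.get_module().switch_mode(False)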


Those are my ideas so far. What ideas do you have for doing this in a clean way?