Best practice for multiple evaluation metrics with intermediate model

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

I have a module that should be evaluated a second time after a slight modification. For example, I have:

algo = config.build()
for episode in range(episodes):
    train_results = algo.train()  # <- performs normal evaluation
    # change how model operates
    algo.get_module().switch_mode(True)
    special_eval_results = algo.evaluate()  # <- perform a different evaluation
    algo.get_module().switch_mode(False)

Of course, this has the problem that algo.metrics is shared between the normal and the special evaluation, so I need a second metrics logger, e.g. by also swapping out algo.metrics in between.
I wrote myself a custom evaluation function that does exactly that, but it does not look very clean, and it feels like it sidesteps a lot of the features that Ray provides.
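
Roughly, the workaround looks like this (only a minimal sketch, assuming the new API stack where algo.metrics is a MetricsLogger; switch_mode() is my own method on the RLModule, and I am not sure how cleanly replacing algo.metrics interacts with the rest of the Algorithm internals):

from ray.rllib.utils.metrics.metrics_logger import MetricsLogger

def special_evaluate(algo):
    # Temporarily swap in a fresh metrics logger so the special evaluation
    # does not get mixed into the normal training/eval results.
    original_metrics = algo.metrics
    algo.metrics = MetricsLogger()
    try:
        algo.get_module().switch_mode(True)   # my own mode switch on the RLModule
        special_results = algo.evaluate()
    finally:
        algo.get_module().switch_mode(False)
        algo.metrics = original_metrics       # restore the normal logger
    return special_results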

Now I have the idea of using two Algorithm instances, e.g. with a shared module or by syncing the weights beforehand, but I am not sure how well this will perform and whether it will interfere with the logging of results (I also do not want to run the special evaluation every iteration).
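
Roughly what I mean with the two-algorithm idea (again only a sketch; special_eval_interval is a placeholder, get_state()/set_state() on the RLModule exist in the new API stack, but I am not sure whether copying the module state is enough to also update the evaluation EnvRunners, or whether a full Algorithm-level state copy / checkpoint restore is needed):

train_algo = config.build()
eval_algo = config.build()  # second instance, used only for the special evaluation

for episode in range(episodes):
    train_results = train_algo.train()

    # Only run the special evaluation every few iterations.
    if episode % special_eval_interval == 0:
        # Copy the current weights into the evaluation-only Algorithm.
        # (Possibly eval_algo.set_state(train_algo.get_state()) or a checkpoint
        # save/restore would be needed so the eval EnvRunners get updated, too.)
        eval_algo.get_module().set_state(train_algo.get_module().get_state())

        eval_algo.get_module().switch_mode(True)
        special_eval_results = eval_algo.evaluate()
        eval_algo.get_module().switch_mode(False)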


Those are my ideas so far. What ideas do you have for doing this in a clean way?