Add step info dictionary to MLflowLoggerCallback with Tune

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

I am using RLlib with MLflow by passing an MLflowLoggerCallback to tune.run:

tune.run(
    **config,
    callbacks=[MLflowLoggerCallback(experiment_name="test")]
)

However, with this setup the callback logs only a few of the metrics that RLlib generates.

My environment returns several metrics of interest in the info dict at each environment step. How can I have them logged by MLflow?

Hi @fedetask,

You need to add a callback that adds your custom metrics.

Here is an explanation:

https://docs.ray.io/en/latest/rllib/rllib-training.html#callbacks-and-custom-metrics

Here is an example:


So I should have both a CustomCallback and the MLflowLoggerCallback, right? And the custom callback will add the info to the custom_metrics, so they should be passed to tune.run() in this order:

tune.run(callbacks=[CustomCallback(), MLflowLoggerCallback()]) 

Am I correct?

@fedetask,

I do not think that is quite how it should work, but I could be wrong. In my understanding, the MLflowLoggerCallback is a tune callback and the CustomCallback is an rllib callback. I think it should look something like this.

tune.run(config={...,  "callbacks": CustomCallback,},
         callbacks=[MLflowLoggerCallback()])
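
For illustration, here is a minimal sketch of the bookkeeping such a CustomCallback might do. It is plain Python, so it runs without Ray; the helper names accumulate_step_info and summarize_episode are hypothetical. In an RLlib callback (Ray 1.x API, as far as I understand it) the first helper would be called from on_episode_step with episode.last_info_for() and episode.user_data, and the second from on_episode_end to fill episode.custom_metrics.

```python
from statistics import mean
from typing import Dict, List, Optional


def accumulate_step_info(user_data: Dict[str, List[float]],
                         info: Optional[dict]) -> None:
    """Append every numeric entry of one step's info dict to a per-key list.

    Hypothetical helper: non-numeric values (strings, bools, nested
    objects) are skipped, and a missing info dict is treated as empty.
    """
    for key, value in (info or {}).items():
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        user_data.setdefault(key, []).append(float(value))


def summarize_episode(user_data: Dict[str, List[float]]) -> Dict[str, float]:
    """Reduce the per-step lists to one value per key (here: the mean)."""
    return {key: mean(values) for key, values in user_data.items() if values}


# Simulated episode with three steps.
episode_user_data: Dict[str, List[float]] = {}
for step_info in ({"speed": 1.0, "lane": "left"}, {"speed": 3.0}, None):
    accumulate_step_info(episode_user_data, step_info)

episode_custom_metrics = summarize_episode(episode_user_data)
```

Taking the mean is only one choice; a sum or the last value may suit some metrics better.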

@xwjiang2010 can you please chime in here

@mannyv’s understanding of the callbacks is correct. @fedetask did you get a chance to try it?

Hello,
Unfortunately, I am using Ray version 1.8.0; I cannot upgrade it for now.

I did as @mannyv described, but it still doesn’t work. I think the reason is that in Ray 1.8.0, the MLflowLoggerCallback in mlflow.py, lines 148-160, logs results as follows:

148    def log_trial_result(self, iteration: int, trial: "Trial", result: Dict):
149        step = result.get(TIMESTEPS_TOTAL) or result[TRAINING_ITERATION]
150        run_id = self._trial_runs[trial]
151        for key, value in result.items():
152            try:
153                value = float(value)
154            except (ValueError, TypeError):
155                logger.debug("Cannot log key {} with value {} since the "
156                             "value cannot be converted to float.".format(
157                                 key, value))
158                continue
159            self.client.log_metric(
160                run_id=run_id, key=key, value=value, step=step)

For key='custom_metrics', value is a dictionary, which cannot be cast to float on line 153, so it is skipped and never logged.

I solved it by creating a new class that extends MLflowLoggerCallback and overrides log_trial_result so that the contents of the custom_metrics dictionary are logged.
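
A minimal sketch of the idea, not the exact code I used: walk the result dict recursively so that nested entries such as custom_metrics are flattened into keys like custom_metrics/speed_mean before being logged. The names iter_numeric_metrics and log_result are hypothetical, and the "/" separator is my assumption. In the subclass, the body of log_trial_result would call log_result(self.client, run_id, result, step) in place of the original loop.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_numeric_metrics(result: Dict[str, Any], prefix: str = "",
                         sep: str = "/") -> Iterator[Tuple[str, float]]:
    """Yield (flattened_key, float_value) for every numeric leaf in result.

    Nested dicts are descended into instead of being passed to float(),
    which is the step that drops custom_metrics in the original loop.
    """
    for key, value in result.items():
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            yield from iter_numeric_metrics(value, name, sep)
            continue
        try:
            yield name, float(value)
        except (ValueError, TypeError):
            # Same policy as the original code: skip non-numeric values.
            continue


def log_result(client: Any, run_id: str, result: Dict[str, Any],
               step: int) -> None:
    """Log every numeric leaf with the same log_metric call as the original."""
    for key, value in iter_numeric_metrics(result):
        client.log_metric(run_id=run_id, key=key, value=value, step=step)
```

The step and run_id values are computed exactly as on lines 149-150 of the original method; only the loop changes.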

