Ray MLflow Callback for nested trials

saivivek15 · March 14, 2023, 10:32pm

Hi Team,

I have been hitting the error below when running Ray Tune jobs with the Population Based Training scheduler (Ray 2.0.0):

mlflow.exceptions.MlflowException: Changing param values is not allowed. Param with key='weight_decay' was already logged with value='0.2694032858904558' for run ID='5f50082291ec4f7cb3ab5692833839d5'. Attempted logging new value '0.07364522856301339'.

The cause of this error is typically due to repeated calls
to an individual run_id event logging.

Incorrect Example:
---------------------------------------
with mlflow.start_run():
    mlflow.log_param("depth", 3)
    mlflow.log_param("depth", 5)
---------------------------------------

Which will throw an MlflowException for overwriting a
logged parameter.

Correct Example:
---------------------------------------
with mlflow.start_run():
    with mlflow.start_run(nested=True):
        mlflow.log_param("depth", 3)
    with mlflow.start_run(nested=True):
        mlflow.log_param("depth", 5)
---------------------------------------

Which will create a new nested run for each individual
model and prevent parameter key collisions within the
tracking store.'

I’m currently using the MLflowLoggerCallback from ray.air.callbacks.mlflow to report metrics and parameters to a remote MLflow server. The callback does not expose a nested parameter that it could pass to the MLflow utility inside the class that invokes mlflow.start_run(). I would really appreciate any help on how to pass nested=True to MLflowLoggerCallback. Please also let me know if there are any alternative implementations.

Thank you for your time.

Regards,
Vivek

Jules_Damji · March 15, 2023, 9:31pm

@saivivek15 Looks like this might be related to an old issue that is still open. I remember this from my days at Databricks and MLflow.

saivivek15 · March 15, 2023, 11:31pm

Thank you @Jules_Damji for sharing the old issue. Do you have any workarounds or suggestions that you have tried?

Jules_Damji · March 16, 2023, 4:51pm

One thing you could try as a workaround is to salt the depth parameter key. That is,

depth_id = "depth_" + <unique_id>
with mlflow.start_run():
    mlflow.log_param(depth_id, <value>)

That way each trial logs its value under its own unique depth_id key.
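
A minimal sketch of that salting idea, in case it helps. The helper names salted_params and log_salted_params, and the suffix values "step0" and "step1", are hypothetical and not part of Ray or MLflow; the only MLflow call used is mlflow.log_params. The suffix could be anything that is unique per logging event, for example a trial ID or a counter that you increment whenever the scheduler changes the config.

```python
import uuid


def salted_params(params, suffix=None):
    """Return a copy of `params` with a unique suffix appended to every key."""
    # Fall back to a short random ID when the caller supplies no suffix.
    suffix = suffix or uuid.uuid4().hex[:8]
    return {f"{key}_{suffix}": value for key, value in params.items()}


def log_salted_params(params, suffix=None):
    """Log `params` to the active MLflow run under salted keys (hypothetical helper)."""
    import mlflow  # imported lazily so salted_params works without MLflow installed

    mlflow.log_params(salted_params(params, suffix))


# Two configs logged to the same run no longer collide on "weight_decay".
first = salted_params({"weight_decay": 0.27}, suffix="step0")
second = salted_params({"weight_decay": 0.07}, suffix="step1")
print(first)
print(second)
```

One trade-off to keep in mind: every salted key is a separate parameter in MLflow, so comparing the same hyperparameter across runs in the UI gets harder.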
