Ray MLflow Callback for nested trials

Hi Team,

I get the error below when I run Ray Tune jobs with the Population-Based Training scheduler (Ray 2.0.0):

mlflow.exceptions.MlflowException: Changing param values is not allowed. Param with key='weight_decay' was already logged with value='0.2694032858904558' for run ID='5f50082291ec4f7cb3ab5692833839d5'. Attempted logging new value '0.07364522856301339'.

The cause of this error is typically due to repeated calls
to an individual run_id event logging.

Incorrect Example:
---------------------------------------
with mlflow.start_run():
    mlflow.log_param("depth", 3)
    mlflow.log_param("depth", 5)
---------------------------------------

Which will throw an MlflowException for overwriting a
logged parameter.

Correct Example:
---------------------------------------
with mlflow.start_run():
    with mlflow.start_run(nested=True):
        mlflow.log_param("depth", 3)
    with mlflow.start_run(nested=True):
        mlflow.log_param("depth", 5)
---------------------------------------

Which will create a new nested run for each individual
model and prevent parameter key collisions within the
tracking store.'

I’m currently using the MLflowLoggerCallback from ray.air.callbacks.mlflow to report metrics and parameters to a remote MLflow server. The callback does not expose a nested parameter that it could pass to the MLflow utility inside the class that invokes mlflow.start_run(). I would really appreciate any help on how to pass nested=True through MLflowLoggerCallback. Please let me know if there are any alternative implementations that achieve the same thing.

Thank you for your time.

Regards,
Vivek

@saivivek15 Looks like this might be related to an old issue that is still open. I remember this from my days at Databricks and MLflow.

Thank you @Jules_Damji for sharing the old issue. Could you please share any workarounds or suggestions that you might have tried?

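One thing you could try as a workaround is to salt the depth parameter. That is,

import uuid
import mlflow
depth_id = "depth_" + uuid.uuid4().hex  # salted key; uuid is just one way to get a unique suffix
with mlflow.start_run():
    mlflow.log_param(depth_id, 3)  # 3 is a placeholder for the trial's depth value

That way each trial logs its depth value under its own unique key.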
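
Applied to the weight_decay case from the original error, the same salting idea might look roughly like the sketch below. The helper name salt_param_keys and the salt strings (a trial label plus a perturbation counter) are assumptions made for illustration; they are not part of Ray or MLflow.

```python
from typing import Any, Dict


def salt_param_keys(params: Dict[str, Any], salt: str) -> Dict[str, Any]:
    """Return a copy of params with the salt appended to every key."""
    return {f"{key}_{salt}": value for key, value in params.items()}


# Hypothetical salts: the same trial before and after a PBT perturbation.
before = salt_param_keys({"weight_decay": 0.2694032858904558}, "trial0_step0")
after = salt_param_keys({"weight_decay": 0.07364522856301339}, "trial0_step1")

# The keys differ, so logging both to one run would not overwrite a param.
print(sorted(before) + sorted(after))
```

Each resulting dictionary could then be passed to mlflow.log_params() inside the active run. The trade-off is that the tracking UI shows one column per salted key, which makes comparing trials harder than with nested runs.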