CustomKBinsDiscretizer returns unexpected binning results

Hi,
I am trying to use the CustomKBinsDiscretizer provided by Ray to bin a feature in my data, but I get unexpected results (e.g., 0.25488 should belong to bin 7, but I get bin 6).

Below is my sample code:

        import pandas as pd
        import ray
        from ray.data.preprocessors import CustomKBinsDiscretizer
        df = pd.DataFrame(
            pd.Series([0.25488, -1.14293, -1.45107, -0.87993, 0.42676, 0.96310, -0.66250, -0.45334, -0.60658, -0.58381,
                       -0.10751, -0.48234, 0.74152, -0.95448, -0.35601, -0.91099, 0.86991, 1.88669, 1.77901]),
            columns=['a'])
        ds = ray.data.from_pandas(df)
        discretizer = CustomKBinsDiscretizer(
            columns=["a"],
            bins={'a': [-1.04767, -0.784675, -0.612797, -0.46991, -0.26904, -0.053674, 0.2321, 0.840923, 1.53672,
                        4.094189]},
        )
        discretizer_data = discretizer.transform(ds).to_pandas()
        print(discretizer_data)

@Yard1 do you have any insights for this preprocessor ?

@839576266 let me take a look!

The bins are zero-indexed, and each interval is open on the left and closed on the right, therefore:

(-inf, -1.04767]:      NaN
(-1.04767, -0.784675]: 0.0
(-0.784675, -0.612797]: 1.0
(-0.612797, -0.46991]: 2.0
(-0.46991, -0.26904]:  3.0
(-0.26904, -0.053674]: 4.0
(-0.053674, 0.2321]:   5.0
(0.2321, 0.840923]:    6.0
(0.840923, 1.53672]:   7.0
(1.53672, 4.094189]:   8.0
(4.094189, inf):       NaN
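
The table above can be reproduced with a small standard-library sketch. This is not Ray's implementation; it only assumes the same convention as the table (zero-based indices, right-closed intervals, values outside the outermost edges left unbinned). The helper name `bin_index` is hypothetical.

```python
from bisect import bisect_left

# Bin edges copied from the question.
EDGES = [-1.04767, -0.784675, -0.612797, -0.46991, -0.26904,
         -0.053674, 0.2321, 0.840923, 1.53672, 4.094189]


def bin_index(value, edges=EDGES):
    """Return the zero-based index of the (left, right] interval that
    contains value, or None if value lies outside the outermost edges."""
    # bisect_left counts the edges strictly smaller than value, so
    # subtracting 1 gives the index of the interval's left edge.
    idx = bisect_left(edges, value) - 1
    if 0 <= idx < len(edges) - 1:
        return idx
    return None  # hypothetical stand-in for the NaN rows in the table


print(bin_index(0.25488))   # 6, i.e. (0.2321, 0.840923]
print(bin_index(-1.14293))  # None, below the first edge
print(bin_index(1.88669))   # 8, i.e. (1.53672, 4.094189]
```

So 0.25488 lands in the seventh interval when counting from one, which is index 6 when counting from zero.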

I manually compared the bins returned by the example you posted against the table above, and it looks to me like the preprocessor is working as intended. If you would like the bins to be 1-indexed, you can use a BatchMapper preprocessor to add 1 to the results.
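
A minimal sketch of the function such a BatchMapper could apply, assuming batches arrive as pandas DataFrames and the binned column is still named "a". The function name `shift_bins_to_one_indexed` is hypothetical.

```python
import pandas as pd


def shift_bins_to_one_indexed(batch: pd.DataFrame) -> pd.DataFrame:
    """Add 1 to the zero-based bin indices in column "a".

    Values that fell outside the bin edges are NaN and stay NaN.
    """
    batch = batch.copy()  # avoid mutating the caller's batch
    batch["a"] = batch["a"] + 1
    return batch


# Stand-in for a slice of the discretizer output.
example = pd.DataFrame({"a": [6.0, float("nan"), 0.0]})
print(shift_bins_to_one_indexed(example))
```

If the BatchMapper in your Ray version accepts a function together with a pandas batch format, passing this function to it and transforming the discretized dataset should give 1-indexed bins. Check the exact constructor arguments against the documentation for your version.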

Thanks for your reply. I will try to solve my problem with your method.