Performance problem when parallelizing kNN with Ray

I am trying out various paradigms to parallelize a kNN algorithm. Among them I tried Ray; below is my simple code:

import numpy as np
import ray
from tqdm import tqdm

@ray.remote
def _predict(X_train, y_train, x, k):
    # Calculate distances from the input point to all training points
    distances = np.linalg.norm(X_train - x, axis=1)
    # Sort by distance and return indices of the first k neighbors
    k_indices = np.argsort(distances)[:k]
    # Extract the labels of the k nearest neighbor training samples
    k_nearest_labels = [y_train[i] for i in k_indices]
    # Return the most common class label among the k nearest neighbors
    most_common = np.bincount(k_nearest_labels).argmax()
    return most_common

def predict(X_train, y_train, X_test, k):
    # Launch one Ray task per test point and gather the predictions
    y_pred = ray.get([_predict.remote(X_train, y_train, x, k) for x in X_test])
    return y_pred
    
class KNNClassifier:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # Put the training data into the Ray object store once so every
        # remote task can read it without re-serializing it per call
        self.X_train = ray.put(X)
        self.y_train = ray.put(y)

    def predict(self, X):
        # y_pred = [self._predict(x) for x in X]
        return predict(self.X_train, self.y_train, X, self.k)
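
For completeness, this is roughly how I call it (the synthetic data here is just a stand-in to show the call pattern, not my real dataset):

import numpy as np
import ray

ray.init()  # uses all local CPUs by default

rng = np.random.default_rng(0)
X_train = rng.standard_normal((1000, 50))   # stand-in training set
y_train = rng.integers(0, 3, size=1000)     # integer class labels
X_test = rng.standard_normal((100, 50))     # stand-in test set

clf = KNNClassifier(k=3)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(y_pred[:10])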

This code performs far worse than a very similar MPI solution. Can anybody guess why?

Can you share the configuration of your nodes (e.g. number of CPUs, …) and the size of the dataset?

Also, on the first run we expect some overhead from creating the Ray workers. If you run predict multiple times, the latency should drop.
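
For example (a minimal sketch, assuming X_train, y_train and X_test are already defined as in your snippet), timing repeated calls should separate the warm-up cost from the steady-state latency:

import time

clf = KNNClassifier(k=3)
clf.fit(X_train, y_train)

for i in range(3):
    start = time.perf_counter()
    clf.predict(X_test)
    print(f"run {i}: {time.perf_counter() - start:.2f} s")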

I have a single node with 128 CPUs. The dataset size is:
100,000 data points, each with 500 features, k = 5, and 1,000 test points.
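
In case it helps reproduce the setup, the data is shaped roughly like this (synthetic stand-in, not the real dataset; the number of classes here is arbitrary):

import numpy as np

rng = np.random.default_rng(42)
X_train = rng.standard_normal((100_000, 500))   # 100,000 training points, 500 features
y_train = rng.integers(0, 10, size=100_000)     # integer class labels (class count arbitrary here)
X_test = rng.standard_normal((1_000, 500))      # 1,000 test points
k = 5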

The slowness is evenly distributed across all the runs, so the overhead of creating the workers is negligible. In total it takes ~60 seconds, in contrast to the MPI version, which takes 3 seconds.