Comparison of K-Means and DBSCAN Algorithms for Customer Segmentation in E-commerce
Customer segmentation is crucial for e-commerce businesses that want to target and engage specific customer groups effectively. This study compares two popular clustering algorithms, K-Means and DBSCAN, for segmenting e-commerce customers. The primary objective is to determine which algorithm provides more meaningful and actionable customer segments. The methodology involves analyzing an e-commerce customer dataset with the following features: customer ID, gender, age, city, membership type, total spend, items purchased, average rating, discount applied, days since last purchase, and satisfaction level. Data preprocessing includes handling missing values, encoding categorical variables, and normalizing numerical features. Both K-Means and DBSCAN are implemented, and their performance is evaluated using the silhouette score, the Davies-Bouldin index, and the Calinski-Harabasz score. K-Means achieved a silhouette score of 0.546, a Davies-Bouldin index of 0.655, and a Calinski-Harabasz score of 552.9. DBSCAN achieved a higher silhouette score of 0.680, a Davies-Bouldin index of 1.344, and a Calinski-Harabasz score of 1123.9. Higher values are better for the silhouette and Calinski-Harabasz scores, whereas lower values are better for the Davies-Bouldin index. These findings suggest that while DBSCAN performs better in terms of silhouette score, indicating more distinctly separated clusters, its higher Davies-Bouldin index reflects less compact clusters. The discussion highlights that K-Means is suitable for applications requiring clear and well-defined customer segments, as it produces balanced cluster sizes. DBSCAN, with its strength in identifying clusters of varying densities and handling noise, is more effective at detecting niche markets and unique customer behaviors. These findings can help e-commerce businesses choose a clustering approach that matches their customer segmentation strategy.
In conclusion, K-Means and DBSCAN each show distinct strengths and weaknesses in clustering the e-commerce customer dataset. The choice of algorithm should be based on the specific requirements of the segmentation task. Future research could explore hybrid methods that combine the strengths of both algorithms and could incorporate additional data sources for a more comprehensive analysis.
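Because the silhouette score and the Davies-Bouldin index can rank two clusterings differently, it may help to see how each is computed. The following is a minimal pure-Python sketch, not the study's implementation: the toy points, the labels, and the choice to exclude points labelled `-1` (the usual DBSCAN noise marker) before scoring are all assumptions made for illustration. In practice, a library such as scikit-learn (`silhouette_score`, `davies_bouldin_score`) would normally be used instead.

```python
import math


def group_by_label(points, labels, noise_label=-1):
    """Group points by cluster label, dropping noise points (assumed label -1)."""
    clusters = {}
    for point, label in zip(points, labels):
        if label != noise_label:
            clusters.setdefault(label, []).append(point)
    return clusters


def centroid(members):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(sum(coords) / len(members) for coords in zip(*members))


def davies_bouldin(points, labels):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid distance.

    Lower is better: small scatter and distant centroids give a small index.
    Requires at least two non-noise clusters.
    """
    clusters = group_by_label(points, labels)
    cents = {k: centroid(v) for k, v in clusters.items()}
    scatter = {
        k: sum(math.dist(p, cents[k]) for p in v) / len(v)
        for k, v in clusters.items()
    }
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return sum(worst) / len(worst)


def silhouette(points, labels):
    """Mean silhouette coefficient over non-noise points. Higher is better."""
    clusters = group_by_label(points, labels)
    scores = []
    for label, members in clusters.items():
        for p in members:
            if len(members) == 1:
                scores.append(0.0)  # common convention for singleton clusters
                continue
            # a: mean distance to the other members of the same cluster
            a = sum(math.dist(p, q) for q in members) / (len(members) - 1)
            # b: mean distance to the nearest other cluster
            b = min(
                sum(math.dist(p, q) for q in other) / len(other)
                for k, other in clusters.items()
                if k != label
            )
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical 2-D data: two tight groups plus one outlier treated as noise.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 30)]
good_labels = [0, 0, 0, 0, 1, 1, 1, 1, -1]   # matches the visible groups
mixed_labels = [0, 1, 0, 1, 0, 1, 0, 1, -1]  # deliberately poor assignment

for name, labels in [("good", good_labels), ("mixed", mixed_labels)]:
    print(name, round(silhouette(points, labels), 3),
          round(davies_bouldin(points, labels), 3))
```

On this toy data the well-matched labelling scores high on silhouette and low on Davies-Bouldin, while the deliberately mixed labelling does the opposite. On real data the two metrics need not agree, since the silhouette score is computed per point and the Davies-Bouldin index per cluster pair, which is consistent with the mixed picture reported above.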