Comparative Analysis of Sentiment Classification Techniques on Flipkart Product Reviews: A Study Using Logistic Regression, SVC, Random Forest, and Gradient Boosting
Sentiment analysis plays a crucial role in e-commerce, turning customer reviews on platforms like Flipkart into usable insights. This study compares the effectiveness of four sentiment classification techniques: Logistic Regression, Support Vector Classifier (SVC), Random Forest, and Gradient Boosting. The dataset, collected from Flipkart, consists of 205,052 product reviews spanning various categories. Preprocessing included handling missing values, removing duplicates, normalizing text, and applying TF-IDF vectorization for feature extraction. We tuned the hyperparameters of each algorithm using grid search and randomized search. The data was divided into training and testing sets with an 80-20 split, and cross-validation was used to make the model evaluation more robust. Each model was assessed on accuracy, precision, recall, F1-score, and ROC-AUC. Logistic Regression achieved an accuracy of 0.8995, precision of 0.8773, recall of 0.8995, an F1-score of 0.8736, and a ROC-AUC of 0.9105. The SVC model showed marginally higher accuracy at 0.8997, with precision of 0.8619, recall of 0.8997, and an F1-score of 0.8738. Random Forest scored lower on accuracy (0.7953), precision (0.6326), recall (0.7953), and F1-score (0.7047), yet achieved a ROC-AUC of 0.9037. Gradient Boosting performed comparably to Logistic Regression, with an accuracy of 0.8993, precision of 0.8512, recall of 0.8993, an F1-score of 0.8735, and a ROC-AUC of 0.9098. Comparative analysis identified SVC and Logistic Regression as the top performers, balancing accuracy and computational efficiency. These findings suggest that adopting these models can enhance sentiment analysis in e-commerce, improving customer insights and business strategies.
Future research should explore advanced deep learning techniques and address class imbalances to further refine sentiment analysis capabilities.
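The reported recall equals the reported accuracy for every model, which is what support-weighted averaging of per-class recall produces. The abstract does not state the averaging scheme, so the following is only a minimal sketch under that assumption, not the study's evaluation code. The function name `weighted_metrics`, the labels `pos`/`neg`/`neu`, and the toy predictions are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1.

    Assumption: per-class scores are averaged with weights equal to each
    class's share of the true labels (the "weighted" averaging scheme).
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    support = Counter(y_true)          # number of true examples per class
    predicted = Counter(y_pred)        # number of predictions per class

    accuracy = sum(t == p for t, p in pairs) / n
    precision = recall = f1 = 0.0

    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in pairs)
        # A class that is never predicted gets precision 0 by convention.
        p_c = tp / predicted[label] if predicted[label] else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0

        weight = count / n             # class share of the true labels
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical, imbalanced toy labels (not taken from the Flipkart dataset).
y_true = ["pos"] * 6 + ["neg", "neg", "neu", "neu"]
y_pred = ["pos"] * 5 + ["neg", "neg", "pos", "neu", "pos"]

scores = weighted_metrics(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Under this averaging, weighted recall reduces to the fraction of correctly classified reviews, so it always matches accuracy, while weighted precision and F1 can differ from it, especially when classes are imbalanced.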