Comparative Analysis of Distance Metrics in KNN and SMOTE Algorithms for Software Defect Prediction
DOI: http://dx.doi.org/10.35671/telematika.v18i1.3008
Telematika
ISSN: 2442-4528 (online) | ISSN: 1979-925X (print)
Published by: Universitas Amikom Purwokerto
Jl. Let. Jend. POL SUMARTO Watumas, Purwonegoro - Purwokerto, Indonesia
This work is licensed under a Creative Commons Attribution 4.0 International License.