Feature Selection and Class Imbalance Machine Learning for Early Detection of Thyroid Cancer Recurrence: A Performance-Based Analysis

  Agus Wantoro (1*), Wahyu Caesarendra (2), Admi Syarif (3), Hari Soetanto (4)

(1) Universitas Aisyah Pringsewu - Indonesia - [ https://scholar.google.com/citations?user=MaJqcIAAAAAJ&hl=id ]
(2) Curtin University Malaysia - Malaysia
(3) Universitas Lampung - Indonesia
(4) Universitas Budi Luhur - Indonesia
(*) Corresponding Author

Received: June 27, 2025; Revised: September 25, 2025
Accepted: October 16, 2025; Published: December 31, 2025


How to cite (IEEE): A. Wantoro, W. Caesarendra, A. Syarif, and H. Soetanto, "Feature Selection and Class Imbalance Machine Learning for Early Detection of Thyroid Cancer Recurrence: A Performance-Based Analysis," Jurnal Elektronika dan Telekomunikasi, vol. 25, no. 2, pp. 93–101, Dec. 2025. doi: 10.55981/jet.758

Abstract

Early detection of thyroid cancer recurrence is crucial to patient survival and treatment effectiveness. Misdetection increases disease severity, treatment cost, and recovery time, and lowers service quality. The main challenges in developing a Machine Learning (ML)-based decision support system for detection are class imbalance in medical data and high feature dimensionality, both of which can affect model accuracy and efficiency. This study proposes an approach that combines feature selection with class imbalance handling to improve the early detection of thyroid cancer recurrence. Several feature selection techniques, such as Information Gain (IG), Gain Ratio (GR), Gini Decrease (GD), and Chi-Square (CS), are used to select features by weighted ranking. To overcome the imbalanced class distribution, we use the Synthetic Minority Over-Sampling Technique (SMOTE). ML classification models such as k-NN, Tree, SVM, Naive Bayes, AdaBoost, Neural Network (NN), and Logistic Regression (LR) are tested and evaluated using confusion-matrix metrics (accuracy, precision, and recall) together with time and log loss. Experimental results show that applying the class imbalance handling strategy significantly improves the prediction performance of the ML algorithms. In addition, we found that the combination of CS feature selection with the NN classifier (CS+NN) consistently showed the best performance. This study emphasizes the importance of data pre-processing and proper algorithm selection in the development of an ML-based thyroid cancer detection system.
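The three building blocks named in the abstract (chi-square feature ranking, SMOTE oversampling, and confusion-matrix metrics) can be illustrated with a minimal standard-library Python sketch. This is not the authors' implementation; the reference list suggests the study may have used the Orange toolkit, but that is an assumption. The function names, parameters, and toy data below are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter


def chi_square_score(feature, labels):
    """Chi-square statistic between one categorical feature and the class labels."""
    n = len(labels)
    feature_counts = Counter(feature)
    class_counts = Counter(labels)
    observed = Counter(zip(feature, labels))
    score = 0.0
    for f, fc in feature_counts.items():
        for c, cc in class_counts.items():
            expected = fc * cc / n  # count expected if feature and class were independent
            score += (observed[(f, c)] - expected) ** 2 / expected
    return score


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by interpolating between a randomly
    chosen minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the positive class, from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data, made up for illustration: 1 = recurrence (minority), 0 = no recurrence.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
informative = ["hi", "hi", "hi", "lo", "lo", "lo", "lo", "lo"]
noisy = ["a", "b", "a", "b", "a", "b", "a", "b"]
scores = {
    "informative": chi_square_score(informative, labels),
    "noisy": chi_square_score(noisy, labels),
}
ranking = sorted(scores, key=scores.get, reverse=True)

minority_points = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0)]
synthetic_points = smote(minority_points, n_new=2)

precision, recall = precision_recall([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

In a real experiment the oversampling step would normally be applied to the training folds only, so that synthetic samples do not leak into the test data. Whether the study followed this protocol is not stated in the abstract.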


  http://dx.doi.org/10.55981/jet.758

Keywords


Class imbalance, Feature selection, Machine Learning, Thyroid cancer.







Copyright (c) 2025 National Research and Innovation Agency

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.