Featured Application
A computer-aided prognosis system for cardiovascular diseases can be a valuable tool for primary health care. Even users without medical expertise can use such tools to screen for cardiovascular disease risk early, thereby helping to decongest the National Healthcare Service.
Abstract
Coronary artery disease (CAD) remains a leading cause of morbidity and mortality worldwide, underscoring the need for early and scalable risk stratification approaches. While recent machine learning (ML) studies have reported high diagnostic performance using multimodal clinical, laboratory, imaging, and genetic data, they do not address early screening or prognosis. In this study, we investigate the extent to which CAD prognosis can be achieved using lifestyle and medical history variables alone. We pooled a cohort of 571 participants with and without CAD and evaluated multiple ML models, including Random Forest, CatBoost, AdaBoost, XGBoost, TabPFN, and k-Nearest Neighbors, using 10-fold cross-validation. Predictive performance converged in a narrow range across models (72–76% accuracy), with the best-performing models reaching approximately 76% accuracy, compared with a clinician baseline of 78.8%. To enhance transparency and clinical interpretability, we further outline an explainability analysis for the top-performing model using SHAP-based approaches. Overall, this work highlights both the potential and the limitations of lifestyle-based ML models for CAD prognosis and supports their role as complementary tools for early screening and preventive cardiology.
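As a rough illustration of the 10-fold cross-validation protocol mentioned above, the sketch below evaluates a small k-nearest-neighbors classifier using only the Python standard library. It is not the study's pipeline: the data are synthetic, the helper names (`stratified_folds`, `knn_predict`, `cross_val_accuracy`) are hypothetical, and stratifying the folds by class is an assumption that the abstract does not state.

```python
import random
from collections import Counter


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with similar class ratios (assumed design)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def knn_predict(train_X, train_y, x, n_neighbors=5):
    """Majority vote among the n_neighbors closest training points."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(train_X, train_y)
    )
    votes = Counter(y for _, y in dists[:n_neighbors])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, k=10, n_neighbors=5, seed=0):
    """Return one accuracy per fold; each fold is held out exactly once."""
    scores = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(y)) if i not in held_out]
        correct = sum(
            knn_predict(train_X, train_y, X[i], n_neighbors) == y[i]
            for i in test_idx
        )
        scores.append(correct / len(test_idx))
    return scores


# Synthetic stand-in for a binary outcome with two numeric predictors.
rng = random.Random(42)
y = [rng.randint(0, 1) for _ in range(200)]
X = [[rng.gauss(label, 1.0), rng.gauss(0.5 * label, 1.0)] for label in y]

fold_scores = cross_val_accuracy(X, y, k=10)
mean_accuracy = sum(fold_scores) / len(fold_scores)
print(f"mean accuracy over {len(fold_scores)} folds: {mean_accuracy:.3f}")
```

Reporting the mean (and, ideally, the spread) of the per-fold accuracies is what allows a comparison such as "72–76% across models" to be made on equal footing, since every model is scored on the same held-out partitions.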
