ML Workflow for Risk Classification of Occlusion Myocardial Infarction

Technical skills involved in this project:

Programming: MATLAB, Python

Software Libraries: NumPy, pandas, PyQt5, scikit-learn

Machine Learning Architectures: ANN, GBM, KNN, RF, XGB, SVM, Logistic Regression, LDA, SGD Logistic, Naive Bayes (Gaussian)

Soft Skills: Communication with physicians, presentations

RESULTS

PROJECT SUMMARY

This project is my 4th-year thesis, done in collaboration with a PhD student, professors from the University of Toronto ECE department, and North York General Hospital physicians.

Accurate and timely detection of Occlusion Myocardial Infarction (OMI) in Emergency Department (ED) settings remains a clinical challenge, as traditional ECG-based criteria often fail to capture subtle and transient ischemic changes. This thesis presents a machine learning (ML)-based approach for OMI classification, building on the BRAVEHEART software platform to extract 73 ECG-derived features, including complex measures such as mechanical dispersion, non-dipolar voltages, the Selvester score and PR interval. Ten ML algorithms were evaluated using data from the Pittsburgh and Orange County datasets. Class imbalance was addressed through weighted loss functions, and performance was assessed via AUROC, sensitivity, specificity, PPV, and NPV. The best-performing model (XGB) achieved an AUROC of 0.894 on the external Orange County dataset, outperforming the reference study’s random forest model. Optimized thresholding further enabled stratification into low, intermediate, and high-risk groups, with improved PPV for high-risk patients. Feature distribution analysis showed that only one-third of the features were statistically similar across datasets; however, stable inter-feature correlations enabled consistent performance. The resulting model and pipeline are designed for deployment as a clinical decision support tool within ED workflows, helping physicians triage patients more effectively and expedite treatment for high-risk OMI cases.
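The evaluation metrics and the three-way risk stratification described above can be illustrated with a minimal sketch. The labels, predicted probabilities, and the two cutoffs (`LOW_CUT`, `HIGH_CUT`) are made-up values for illustration only; they are not the thesis data or the thresholds actually selected, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def diagnostic_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV for binary labels (1 = OMI)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": _ratio(tp, tp + fn),  # share of OMI cases that are flagged
        "specificity": _ratio(tn, tn + fp),  # share of non-OMI cases that are cleared
        "ppv": _ratio(tp, tp + fp),          # share of flagged cases that are OMI
        "npv": _ratio(tn, tn + fn),          # share of cleared cases that are non-OMI
    }


def stratify(prob: float, low_cut: float, high_cut: float) -> str:
    """Map a predicted OMI probability to a risk group using two cutoffs."""
    if prob < low_cut:
        return "low"
    if prob >= high_cut:
        return "high"
    return "intermediate"


# Hypothetical toy data (NOT from the Pittsburgh or Orange County datasets).
y_true: List[int] = [0, 0, 0, 0, 1, 1, 0, 1]
probs: List[float] = [0.02, 0.05, 0.10, 0.30, 0.40, 0.70, 0.80, 0.95]
LOW_CUT, HIGH_CUT = 0.15, 0.60  # assumed cutoffs, chosen only for this example

groups = [stratify(p, LOW_CUT, HIGH_CUT) for p in probs]

# Rule-in view: only the high-risk group counts as a positive prediction.
rule_in = diagnostic_metrics(y_true, [1 if g == "high" else 0 for g in groups])

# Rule-out view: only the low-risk group counts as a negative prediction.
rule_out = diagnostic_metrics(y_true, [0 if g == "low" else 1 for g in groups])

print(groups)
print("rule-in PPV:", rule_in["ppv"], "rule-out NPV:", rule_out["npv"])
```

Using two cutoffs instead of one lets the high-risk group be tuned for PPV (rule-in) and the low-risk group for NPV (rule-out), with the remaining patients left in the intermediate group.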

METHODS AND PIPELINE

This thesis project has three major components. The first builds on BRAVEHEART, an existing open-source project that uses the vectorcardiogram (VCG) and advanced signal processing. BRAVEHEART already computes about 33 of the 73 features (mostly common ones such as the QRS interval); the remaining features are computed by custom MATLAB functions written for this project.

The second part of the pipeline trains 10 machine learning models on the Pittsburgh ambulance data: ANN, GBM, KNN, RF, XGB, SVM, Logistic Regression, LDA, SGD Logistic, and Naive Bayes (Gaussian).
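Because OMI cases are the minority class, training relies on weighted loss functions. The sketch below shows one common way to do this, under assumptions: inverse-frequency class weights (the same heuristic scikit-learn applies for `class_weight="balanced"`) plugged into a weighted binary log loss. The thesis does not state which weighting scheme was used, so treat this as an illustration rather than the project's actual configuration. The labels and probabilities are made up.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def balanced_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_log_loss(
    y_true: Sequence[int],
    probs: Sequence[float],
    weights: Dict[int, float],
    eps: float = 1e-12,
) -> float:
    """Binary cross-entropy where each sample is scaled by its class weight."""
    total, weight_sum = 0.0, 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        w = weights[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
        weight_sum += w
    return total / weight_sum


# Hypothetical imbalanced sample: 9 non-OMI cases and 1 OMI case.
labels = [0] * 9 + [1]
weights = balanced_class_weights(labels)

# A model that predicts a low OMI probability for everyone, missing the one OMI case.
flat_probs = [0.1] * len(labels)
unweighted = weighted_log_loss(labels, flat_probs, {0: 1.0, 1: 1.0})
weighted = weighted_log_loss(labels, flat_probs, weights)

print(weights)
print("unweighted loss:", round(unweighted, 4), "weighted loss:", round(weighted, 4))
```

With these weights each class contributes the same total weight to the loss, so a model that ignores the rare OMI class is penalized far more heavily than under the unweighted loss.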

The third part of the pipeline is an integrated Windows application built with PyQt5. It uses MATLAB to preprocess the features, applies the best machine learning model from the second part, and presents a clear UI to help physicians make a decision.

PROJECT CONCLUSION

This thesis project designed and evaluated a clinical ML workflow for automated ECG-based detection of Occlusion Myocardial Infarction. On the external testing dataset, it achieved 87.6% sensitivity, 98.9% NPV (rule-out), and 97.3% specificity.