
EMR-Based Nursing Surveillance for Automatic ICD Coding

Built an automatic ICD coding pipeline for nursing surveillance of abdominal surgery patients using core EMR data.

Type: Clinical AI Research
Year: 2025
Primary Role: Graduate Researcher
Roles: Graduate Researcher, AI Engineer, Data Scientist
Medical AI, Nursing Surveillance, EMR, Automatic ICD Coding, KM-BERT, XGBoost, Ensemble
0.9245 Accuracy, 0.9157 Weighted F1, Strong Rare-class Recall

Context

This project focused on supporting nursing surveillance for abdominal surgery patients through automatic ICD code prediction. Instead of relying on physician narratives or discharge summaries that become available later, the work centered on core EMR data that nurses can access during routine care.

Problem

Nurses continuously monitor patients and identify risks, but the signals needed for diagnosis-related classification are scattered across laboratory results, intake and output (I/O), blood sugar tests (BST), vital signs, patient information, nursing notes, and PACU records. Existing automatic ICD coding approaches often depend on physician-centered documents or extra resources, which makes them less suitable for direct nursing surveillance support.

Implementation

I integrated heterogeneous EMR sources for 8,587 abdominal surgery patients and structured them into a usable modeling pipeline. The approach combined two independently trained KM-BERT models, averaged their raw logits to form an ensemble representation, reduced its dimensionality with PCA, and passed the result to XGBoost as a stacking meta-classifier for the final ICD prediction. The workflow also addressed class imbalance through stratified splitting and weighted sampling.
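
As a rough illustration of the logit averaging and imbalance handling described above, here is a minimal standard-library sketch. It is not the project's code: the function names, the toy labels ("common", "rare"), the logit values, and the inverse-frequency weighting scheme are all assumptions chosen for illustration. The PCA and XGBoost stages are omitted because they depend on third-party libraries.

```python
import random
from collections import Counter


def average_logits(logits_a, logits_b):
    """Element-wise mean of two models' raw (pre-softmax) logits, per sample."""
    return [
        [(a + b) / 2.0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(logits_a, logits_b)
    ]


def inverse_frequency_weights(labels):
    """Weight each sample by 1 / count(its class), so every class
    contributes the same total weight (one assumed weighting scheme)."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices class by class so each class keeps roughly the same
    share in train and test; classes with 2+ samples get at least one
    test sample."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = max(1, round(len(idxs) * test_fraction)) if len(idxs) > 1 else 0
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy data (hypothetical): 8 samples of a frequent class, 2 of a rare one.
labels = ["common"] * 8 + ["rare"] * 2

# Hypothetical logits from two models for a single sample, two classes.
ensemble = average_logits([[2.0, 0.0]], [[0.0, 1.0]])

weights = inverse_frequency_weights(labels)
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=0)
```

In a full pipeline, the averaged logits would be the features passed to the dimensionality reduction and meta-classifier stages, and the per-sample weights would feed a weighted sampler or the classifier's sample-weight argument.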

Outcome

The final Double KM-BERT + XGBoost + PCA model achieved the best overall performance with 0.9245 accuracy, 0.9107 weighted precision, and 0.9157 weighted F1-score. It also showed strong recall on rare classes, suggesting that meaningful nursing-surveillance-oriented diagnosis classification is possible using only core EMR data available in practice.
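
For readers unfamiliar with the weighted metrics quoted above, the sketch below shows how they are conventionally computed: per-class precision, recall, and F1 are averaged with each class weighted by its number of true samples (its support). This assumes the project used that standard definition. The labels and predictions are hypothetical toy data, not project results.

```python
def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for each true label."""
    scores = {}
    for c in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[c] = (precision, recall, f1, tp + fn)
    return scores


def weighted_average(scores, index):
    """Support-weighted mean of one metric (0=precision, 1=recall, 2=f1)."""
    total = sum(s[3] for s in scores.values())
    return sum(s[index] * s[3] for s in scores.values()) / total


# Toy data (hypothetical): 8 frequent-class samples and 2 rare-class samples.
y_true = ["common"] * 8 + ["rare"] * 2
y_pred = ["common"] * 7 + ["rare"] + ["rare", "common"]

scores = per_class_scores(y_true, y_pred)
weighted_precision = weighted_average(scores, 0)
weighted_f1 = weighted_average(scores, 2)
rare_recall = scores["rare"][1]
```

Because the weighted averages are dominated by frequent classes, per-class recall such as `rare_recall` has to be inspected separately to judge performance on rare classes.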