Summary
This study addresses automatic diagnosis-code classification to support nursing surveillance of abdominal surgery patients. It predicts ICD codes from core EMR data that nurses can access directly during routine practice, rather than from physician narratives or discharge summaries that are written later in the care process.
Why It Matters
Nursing surveillance is important for patient safety and clinical outcomes, but the volume and complexity of EMR data make timely diagnosis-related classification difficult. Many earlier ICD coding approaches depended on physician-centered documents or other additional records. This work shows that clinically useful performance is achievable using only the EMR signals that nurses can access within routine workflows.
Contribution
The study integrated test results, IO, BST, vital signs, patient information, nursing notes, and PACU records, then built a stacking framework: the outputs of two KM-BERT models are averaged, PCA reduces the dimensionality of the averaged representation, and XGBoost serves as the meta-classifier that makes the final ICD prediction.
Among the models compared, the proposed Double KM-BERT + XGBoost + PCA model achieved the best results, with 0.9245 accuracy, 0.9107 weighted precision, and 0.9157 weighted F1-score, and it also showed strong recall on rare classes.
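The average-then-PCA-then-meta-classifier flow can be illustrated with a minimal, dependency-free sketch. It assumes each KM-BERT model produces a fixed-length vector per patient record (class probabilities or pooled embeddings; the summary does not say which), averages the two vectors, projects them with PCA computed via NumPy's SVD, and fits a meta-classifier on the projected features. To keep the example runnable without extra packages, a nearest-centroid classifier stands in for XGBoost. The function names, the synthetic data, and the n_components value are hypothetical and are not taken from the study.

```python
import numpy as np


def average_model_outputs(out_a: np.ndarray, out_b: np.ndarray) -> np.ndarray:
    """Element-wise mean of two base models' outputs (n_samples x n_features)."""
    if out_a.shape != out_b.shape:
        raise ValueError("base model outputs must have the same shape")
    return (out_a + out_b) / 2.0


class PCA:
    """Plain PCA via SVD of the mean-centered data."""

    def __init__(self, n_components: int):
        self.n_components = n_components

    def fit(self, x: np.ndarray) -> "PCA":
        self.mean_ = x.mean(axis=0)
        # Rows of vt are principal axes, ordered by decreasing singular value.
        _, _, vt = np.linalg.svd(x - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def transform(self, x: np.ndarray) -> np.ndarray:
        return (x - self.mean_) @ self.components_.T


class NearestCentroid:
    """Hypothetical stand-in for the XGBoost meta-classifier used in the study."""

    def fit(self, x: np.ndarray, y: np.ndarray) -> "NearestCentroid":
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([x[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, x: np.ndarray) -> np.ndarray:
        # Squared Euclidean distance from every sample to every class centroid.
        d = ((x[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]


def fit_stacker(out_a, out_b, labels, n_components):
    """Average base outputs, fit PCA, then fit the meta-classifier on PCA features."""
    fused = average_model_outputs(out_a, out_b)
    pca = PCA(n_components).fit(fused)
    meta = NearestCentroid().fit(pca.transform(fused), labels)
    return pca, meta


def predict_stacker(pca, meta, out_a, out_b):
    """Apply the same average -> PCA -> meta-classifier chain to new outputs."""
    return meta.predict(pca.transform(average_model_outputs(out_a, out_b)))


def run_demo(seed: int = 0) -> float:
    """Training accuracy on synthetic, well-separated data (not study data)."""
    rng = np.random.default_rng(seed)
    centers = np.eye(3, 8) * 10.0          # three synthetic "ICD classes"
    labels = np.repeat(np.arange(3), 30)    # 90 synthetic records
    out_a = centers[labels] + rng.normal(scale=0.5, size=(90, 8))  # "model A"
    out_b = centers[labels] + rng.normal(scale=0.5, size=(90, 8))  # "model B"
    pca, meta = fit_stacker(out_a, out_b, labels, n_components=2)
    preds = predict_stacker(pca, meta, out_a, out_b)
    return float((preds == labels).mean())


if __name__ == "__main__":
    print(f"synthetic training accuracy: {run_demo():.3f}")
```

In a real stacking setup the meta-classifier is usually trained on held-out (out-of-fold) base-model outputs, so it does not learn from predictions the base models made on their own training data. Replacing NearestCentroid with xgboost's XGBClassifier would follow the same fit/predict flow.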