Resume

Donghyeon Kim

Data and applied AI engineer connecting data structure, NLP/LLM evaluation, and implementation

Data & Applied AI Engineer · Data Systems · AI-DLC / MLOps · NLP / LLM

Location: Incheon, South Korea

Email: eastlighting1@gachon.ac.kr

Links

GitHub LinkedIn Google Scholar

Projects: 7

Publications: 3

Training & Certifications: 4

Summary

I work across data structure, AI development workflows, and NLP/LLM evaluation, carrying problems from framing to implementation, validation, and delivery.

I turn heterogeneous data (structured, text, image, and graph-shaped) into structures and pipelines that carry through to modeling and operations.
I make AI experiments and operating workflows reproducible and observable.

Research, Programs, and Leadership

Graduate Researcher

Intelligent Data Analytics Lab., Gachon University | 2024.03 - 2026.02

Led graduate research on EMR-based nursing surveillance decision support and diagnostic classification.
Built end-to-end modeling pipelines using KM-BERT ensembles, XGBoost, and both structured and text data.

Research Project Participant

Institute of Information & Communications Technology Planning & Evaluation (IITP) | 2025.09 - 2025.12

Implemented evaluation-related code in a human-centered multimodal AI project.
Bridged evaluation requirements with actual code and reviewable deliverables.

Research Project Participant

National Research Foundation of Korea (NRF) | 2024.03 - 2025.12

Contributed to an NRF-funded clinical AI project centered on nursing surveillance decision support using EMR data.
Implemented workflows for clinical text understanding, including keyword extraction, dependency parsing-based preprocessing, topic modeling, and similarity analysis.

Student Leader and Community Organizer

Gachon University / Notion Community Program | 2019.03 - 2025.02

Held multiple leadership roles in the official university programming club and served as president in 2022.
Planned and led study groups on machine learning, big data, financial ML, and GNNs.

Projects

EMR-Based Nursing Surveillance for Automatic ICD Coding

Clinical AI Research | 2025

Built an automatic ICD coding pipeline for nursing surveillance of abdominal surgery patients using core EMR data.

Reviewed overall behavior and rare-class recall together
Core EMR classification without post-hoc documents


Contexta: Local-First ML Observability

Self-directed ML Platform Project | 2026

Designed and built Contexta as a local-first ML observability project for collecting, storing, querying, comparing, and recovering machine learning execution records...

Designed the local-first observability architecture
Implemented the canonical contract and workspace


Lynxes: Graph Analytics Engine

Graph Systems Engine Project | 2026

Designed and implemented Lynxes, an Apache Arrow-based graph analytics engine focused on CSR indexing, lazy execution, and a high-performance graph processing experi...

Designed the graph engine architecture
Implemented the CSR traversal structure


Core Competencies

Data Systems Specialist with Graph-native Depth

Shapes structured, text, image, and graph-shaped data into representations that can move into modeling and operations, w...

Heterogeneous Data · Graph-native Modeling · Pipeline-ready Representation · Data Infrastructure

AI-DLC and Operational MLOps Engineer

Connects experiment, training, evaluation, deployment, inference, observability, and recovery through the AI-Driven Deve...

Experiment Records · Observability · CI/CD and CT · Recovery Flow

Applied NLP and LLM Research Engineer

Uses NLP/LLM research experience for modeling, evaluation design, alignment awareness, and applied system judgment.

Modeling · Evaluation Design · Alignment Awareness · Research-to-System

Research & Publications

A Context-Adaptive Gated Embedding Framework for Advanced Clinical Decision-Making

Mathematics (submitted) | 2026 | Journal Paper

This study proposes a hierarchical clinical decision support framework that estimates diagnostic context via partial-label automated ICD coding and reinjec...

Clinical Decision Support System · Automated ICD Coding · ICU Time-series


Deep Learning based Automatic ICD Coding for Nursing Surveillance of Abdominal Surgery Patients

Journal of The Korea Society of Computer and Information | 2025 | Journal Paper

This study proposes an automatic ICD coding model for nursing surveillance of abdominal surgery patients by integrating EMR-based test data, patient inform...

Medical AI · Nursing Surveillance · EMR


Education

M.S. in Artificial Intelligence

Gachon University | 2024.03 - 2026.02

GPA 4.14/4.5.

B.S. in Software

Gachon University | 2019.03 - 2024.02

Gained leadership and community management experience by participating in an official programming club and leading various study groups, includin...

Donghyeon Kim

Incheon, South Korea / eastlighting1@gachon.ac.kr / GitHub / LinkedIn / Google Scholar

Research Interests

Methods: NLP/LLM evaluation, graph modeling, AI-DLC workflows

Application Areas: Healthcare NLP, recommendation, conversational systems

Systems Focus: Data pipelines, observability, reproducible artifacts

Keywords: Data & Applied AI, Data Systems, AI-DLC, MLOps, Observability, NLP/LLM Evaluation

Objective: Connect data structure, AI development workflows, and NLP/LLM evaluation into implemented work.

Education

Gachon University

M.S. in Artificial Intelligence

2024.03 - 2026.02

GPA 4.14/4.5. Intelligent Data Analytics Lab. Advisor: OkRan Jeong. Focused on clinical AI, evaluation, and implementation-oriented research delivery.

Gachon University

B.S. in Software

2019.03 - 2024.02

Gained leadership and community management experience by participating in an official programming club and leading study groups on basic ML, advanced ML, financial ML, and GNNs.

Research Experience

Intelligent Data Analytics Lab., Gachon University

Graduate Researcher

2024.03 - 2026.02

Led graduate research on EMR-based nursing surveillance decision support and diagnostic classification.
Built end-to-end modeling pipelines using KM-BERT ensembles, XGBoost, and both structured and text data.
Explored conversational AI topics, including empathetic dialogue generation with AI-based feedback.

Institute of Information & Communications Technology Planning & Evaluation (IITP)

Research Project Participant

2025.09 - 2025.12

Implemented evaluation-related code in a human-centered multimodal AI project.
Bridged evaluation requirements with actual code and reviewable deliverables.

National Research Foundation of Korea (NRF)

Research Project Participant

2024.03 - 2025.12

Contributed to an NRF-funded clinical AI project centered on nursing surveillance decision support using EMR data.
Implemented workflows for clinical text understanding, including keyword extraction, dependency parsing-based preprocessing, topic modeling, and similarity analysis.
Provided interpretable results and web-based analysis interfaces for collaborative researchers.
Achieved 92%+ diagnostic prediction accuracy in a multi-label task by ensembling text and structured data models (KM-BERT, XGBoost).

Gachon University / Notion Community Program

Student Leader and Community Organizer

2019.03 - 2025.02

Held multiple leadership roles in the official university programming club and served as president in 2022.
Planned and led study groups on machine learning, big data, financial ML, and GNNs.
Supported campus learning communities and resource sharing as part of the Notion Campus Leader program (2024 second semester through 2025 first semester).

DK Techin

Industry-led Intensive Program Trainee

2024.01 - 2024.02

Participated in an industry-led intensive program focused on cloud, CI/CD, security, and DevOps practices.
Contributed to implementation tasks with a focus on security and DevOps roles in project-based training.

Publications

Kim, Donghyeon; Kim, Daeho; Jeong, Okran. A Context-Adaptive Gated Embedding Framework for Advanced Clinical Decision-Making. Mathematics, 2026. Submitted / under review.
Kim, Donghyeon; Kim, Daeho; Kim, Seyoung; Jeong, Okran. Deep Learning based Automatic ICD Coding for Nursing Surveillance of Abdominal Surgery Patients. Journal of The Korea Society of Computer and Information, vol. 30, no. 5, pp. 21-30, 2025. The Korean Society Of Computer And Information.
Joo, Yongwan; Lim, Donghyun; Kim, Donghyeon; Sun, Seungyeon; Jeong, Okran. Empathetic Dialogue Generation Model Using Reinforcement Learning with AI-Based Feedback. Proceedings of the Korea Computer Congress 2025, pp. 2410-2412, 2025.

Projects

EMR-Based Nursing Surveillance for Automatic ICD Coding

Clinical AI Research

2025

Nursing surveillance required diagnosis-related classification, but key clinical signals were fragmented across heterogeneous EMR sources.
Trained two KM-BERT models independently and averaged raw logits to stabilize text representation.
The final model was reviewed for practical classification behavior rather than a single standalone score, including rare-class recall and available-data constraints.

Evaluation Scope: Balanced · Available Data: Core EMR · Rare-class Recall: High
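
The logit-averaging step mentioned in this entry can be illustrated with a minimal sketch. It assumes a multi-label setup in which each model emits one raw logit per diagnosis label; the function names, the example logits, and the 0.5 threshold are hypothetical and are not taken from the project code.

```python
import math
from typing import List, Sequence


def average_logits(logits_a: Sequence[float], logits_b: Sequence[float]) -> List[float]:
    """Average raw (pre-sigmoid) logits from two independently trained models."""
    if len(logits_a) != len(logits_b):
        raise ValueError("both models must score the same label set")
    return [(a + b) / 2.0 for a, b in zip(logits_a, logits_b)]


def sigmoid(x: float) -> float:
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(
    logits_a: Sequence[float],
    logits_b: Sequence[float],
    threshold: float = 0.5,  # assumed decision threshold, for illustration only
) -> List[int]:
    """Average first, then apply the sigmoid and threshold once per label."""
    averaged = average_logits(logits_a, logits_b)
    return [1 if sigmoid(z) >= threshold else 0 for z in averaged]


# Hypothetical logits for three labels from two models.
model_a = [2.0, -1.0, 0.4]
model_b = [1.0, -3.0, -0.8]
print(predict_labels(model_a, model_b))
```

Averaging before the sigmoid, rather than averaging probabilities, keeps the combination linear in logit space, so one very confident model does not saturate the result as easily.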

Contexta: Local-First ML Observability

Self-directed ML Platform Project

2026

ML experiments and deployment work often scatter metadata, records, and artifacts across tools, making reproducible local observability hard to maintain.
Used a `.contexta/` workspace as the home for separated metadata, records, and artifact storage.
Implemented a local observability foundation for consistently managing and inspecting ML execution history and artifacts.

Architecture: Local-first · Contract: Schema-first · Workflow: Trace / Compare / Recover

Lynxes: Graph Analytics Engine

Graph Systems Engine Project

2026

Existing Python graph libraries and generic dataframe wrappers often struggle to combine memory efficiency, traversal performance, and lazy query optimization for large graph analytics.
Designed GraphFrame to own Arrow RecordBatches directly.
Established the foundation for a graph analytics engine with Arrow columnar memory, CSR-based traversal, and lazy collect execution.

Neighbor lookup: O(degree) · Memory model: Arrow · Execution: Lazy collect
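
The O(degree) neighbor lookup noted for this entry follows from the CSR (compressed sparse row) layout in general. The sketch below is a generic illustration and not the Lynxes implementation: plain Python lists stand in for Arrow buffers, and `build_csr` and `neighbors` are hypothetical names.

```python
from typing import List, Tuple


def build_csr(num_nodes: int, edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build CSR arrays (offsets, targets) from directed (src, dst) edges."""
    # Count the out-degree of each node, shifted by one slot.
    offsets = [0] * (num_nodes + 1)
    for src, _ in edges:
        offsets[src + 1] += 1
    # A prefix sum turns the counts into start positions.
    for i in range(num_nodes):
        offsets[i + 1] += offsets[i]
    # Scatter each destination into its source node's slice.
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()  # next write position per source node
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def neighbors(offsets: List[int], targets: List[int], node: int) -> List[int]:
    """Return the out-neighbors of `node`; the slice length equals its degree."""
    return targets[offsets[node]:offsets[node + 1]]


# Hypothetical four-node graph.
offsets, targets = build_csr(4, [(0, 1), (0, 2), (1, 2), (3, 0)])
print(neighbors(offsets, targets, 0))
```

A node's neighbors occupy one contiguous slice of `targets`, so a lookup needs two offset reads plus a scan of only that node's edges.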

FRIMO: Conversational AI for Emotional Support and Diary Generation

Conversational AI Project

2023

The product needed ML components that could recognize user emotion and support a diary-generation workflow from daily conversation logs.
Centered the ML workflow around KoBERT-based emotion recognition and connected it to chatbot and summarization flows.
The project delivered an MVP-level conversational diary experience with an emotion recognition pipeline.

Pipeline: Korean NLP

Devridge: LLM-Based Feedback Bridge for Developers

LLM Application Prototype

2024

Developers working alone often need UI, performance, or code quality feedback, but they rarely have an easy way to gather role-specific input at the right time.
Separated role constraints and contextual input so responses stayed scoped instead of generic.
The prototype showed how structured prompting can make LLM feedback more useful for project review.

Feedback modes: Role-based

BloGeek: AI Modules for a React + Spring Blog Project

Collaborative NLP Project

2023

The product needed ML components that could classify emotional polarity and generate stylistic variations of text to support richer blog content workflows.
Used KoBERT for polarity recognition and KoBART for style transfer.
The project gave the team practical AI modules for blog-oriented text processing.

Model scope: Classification + Generation

Dalkom Shop: Internal Employee Mileage Commerce Platform

Internal Commerce Platform

2024

The project needed cloud infrastructure and platform foundations that could reliably support search, notifications, admin workflows, and ongoing operations.
Organized CI/CD and cloud foundations so the React and Spring service could fit into a deployable, monitorable workflow.
Built a practical service foundation for a closed internal mileage commerce platform.

Foundation: CI/CD + Cloud

Training & Certifications

Practical Implementation of Monitoring and Testing in DevOps Environments

LLOYDK

2023.11

Completed practical training in Elastic-based DevOps monitoring and testing.

Multi Cloud Orchestration Program

5Works

2023.12

Completed HashiCorp-based multi-cloud orchestration and IaC training.

Company-Led Intensive Project Training

DK Techin

2024.02

Participated in industry-linked practical training focused on security and DevOps engineering.

Micro Degree in Software Specialist Training

Gachon University

2024.02

Completed a micro-degree program for training software specialists.

Additional Information

Tech Stack: Python · PyTorch · HuggingFace · Docker · AWS

Focus Areas: Data Systems · AI-DLC/MLOps · NLP/LLM

Technical Strengths: Graph-based Data Systems · ML Observability · NLP/LLM Evaluation · Code Reproducibility · Automation