Donghyeon Kim
Data and applied AI engineer connecting data structure, NLP/LLM evaluation, and implementation
Summary
I work across data structure, AI development workflows, and NLP/LLM evaluation, carrying problems from framing to implementation, validation, and delivery.
- I turn diverse data sources into structures and pipelines that carry through to modeling and operations.
- I make AI experiments and operating workflows reproducible and observable.
Research, Programs, and Leadership
- Led graduate research on EMR-based nursing surveillance decision support and diagnostic classification.
- Built end-to-end modeling pipelines using KM-BERT ensembles, XGBoost, and both structured and text data.
- Implemented evaluation-related code in a human-centered multimodal AI project.
- Bridged evaluation requirements with actual code and reviewable deliverables.
- Contributed to an NRF-funded clinical AI project centered on nursing surveillance decision support using EMR data.
- Implemented workflows for clinical text understanding, including keyword extraction, dependency parsing-based preprocessing, topic modeling, and similarity analysis.
- Held multiple leadership roles in the official university programming club and served as president in 2022.
- Planned and led study groups on machine learning, big data, financial ML, and GNNs.
Projects
Built an automatic ICD coding pipeline for nursing surveillance of abdominal surgery patients using core EMR data.
- Reviewed overall behavior and rare-class recall together
- Classified using core EMR data only, without post-hoc documents
Designed and built Contexta as a local-first ML observability project for collecting, storing, querying, comparing, and recovering machine learning execution records...
- Designed the local-first observability architecture
- Implemented the canonical contract and workspace
Designed and implemented Lynxes, an Apache Arrow-based graph analytics engine focused on CSR indexing, lazy execution, and a high-performance graph processing experi...
- Designed the graph engine architecture
- Implemented the CSR traversal structure
Research Interests
Methods: NLP/LLM evaluation, graph modeling, AI-DLC workflows
Application Areas: Healthcare NLP, recommendation, conversational systems
Systems Focus: Data pipelines, observability, reproducible artifacts
Keywords: Data & Applied AI, Data Systems, AI-DLC, MLOps, Observability, NLP/LLM Evaluation
Objective: Connect data structure, AI development workflows, and NLP/LLM evaluation into implemented work.
Education
Gachon University
M.S. in Artificial Intelligence
GPA: 4.14/4.5. Intelligent Data Analytics Lab. Advisor: OkRan Jeong. Focused on clinical AI, evaluation, and implementation-oriented research delivery.
Gachon University
B.S. in Software
Gained leadership and community management experience through the official university programming club and by leading study groups on basic ML, advanced ML, financial ML, and GNNs.
Research Experience
Intelligent Data Analytics Lab., Gachon University
Graduate Researcher
- Led graduate research on EMR-based nursing surveillance decision support and diagnostic classification.
- Built end-to-end modeling pipelines using KM-BERT ensembles, XGBoost, and both structured and text data.
- Explored conversational AI topics, including empathetic dialogue generation with AI-based feedback.
Institute of Information & Communications Technology Planning & Evaluation (IITP)
Research Project Participant
- Implemented evaluation-related code in a human-centered multimodal AI project.
- Bridged evaluation requirements with actual code and reviewable deliverables.
National Research Foundation of Korea (NRF)
Research Project Participant
- Contributed to an NRF-funded clinical AI project centered on nursing surveillance decision support using EMR data.
- Implemented workflows for clinical text understanding, including keyword extraction, dependency parsing-based preprocessing, topic modeling, and similarity analysis.
- Provided interpretable results and web-based analysis interfaces for collaborative researchers.
- Achieved 92%+ diagnostic prediction accuracy in a multi-label task by ensembling text and structured data models (KM-BERT, XGBoost).
Gachon University / Notion Community Program
Student Leader and Community Organizer
- Held multiple leadership roles in the official university programming club and served as president in 2022.
- Planned and led study groups on machine learning, big data, financial ML, and GNNs.
- Supported campus learning communities and resource sharing as part of the Notion Campus Leader program (semesters 2024-2 to 2025-1).
DK Techin
Industry-led Intensive Program Trainee
- Participated in an industry-led intensive program focused on cloud, CI/CD, security, and DevOps practices.
- Contributed to implementation tasks with a focus on security and DevOps roles in project-based training.
Publications
- Kim, Donghyeon; Kim, Daeho; Jeong, Okran. A Context-Adaptive Gated Embedding Framework for Advanced Clinical Decision-Making
- Kim, Donghyeon; Kim, Daeho; Kim, Seyoung; Jeong, Okran. Deep Learning based Automatic ICD Coding for Nursing Surveillance of Abdominal Surgery Patients
- Joo, Yongwan; Lim, Donghyun; Kim, Donghyeon; Sun, Seungyeon; Jeong, Okran. Empathetic Dialogue Generation Model Using Reinforcement Learning with AI-Based Feedback
Projects
EMR-Based Nursing Surveillance for Automatic ICD Coding
Clinical AI Research
- Nursing surveillance required diagnosis-related classification, but key clinical signals were fragmented across heterogeneous EMR sources.
- Trained two KM-BERT models independently and averaged raw logits to stabilize text representation.
- Reviewed the final model for practical classification behavior, including rare-class recall and available-data constraints, rather than by a single standalone score.
Evaluation Scope: Balanced · Available Data: Core EMR · Rare-class Recall: High
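A minimal sketch of the logit-averaging step mentioned in this project, written for illustration and not taken from the project code. It assumes each of the two independently trained models returns one raw logit per label for a sample. The function names, the example values, and the sigmoid with a 0.5 threshold for multi-label decisions are all assumptions.

```python
import math


def average_logits(logits_a, logits_b):
    """Element-wise mean of two models' raw logits for one sample."""
    if len(logits_a) != len(logits_b):
        raise ValueError("both models must score the same label set")
    return [(a + b) / 2.0 for a, b in zip(logits_a, logits_b)]


def sigmoid(x):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits_a, logits_b, threshold=0.5):
    """Average logits first, then threshold each label independently.

    The threshold value is an illustrative assumption.
    """
    averaged = average_logits(logits_a, logits_b)
    return [int(sigmoid(z) >= threshold) for z in averaged]


# Hypothetical logits for three labels from the two models.
model_a = [2.0, -1.0, 0.5]
model_b = [1.0, -3.0, -1.5]
print(average_logits(model_a, model_b))
print(predict_labels(model_a, model_b))
```

Averaging before the sigmoid keeps the ensemble in logit space, so one model's saturated probability does not dominate the combined score.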
Contexta: Local-First ML Observability
Self-directed ML Platform Project
- ML experiments and deployment work often scatter metadata, records, and artifacts across tools, making reproducible local observability hard to maintain.
- Used a `.contexta/` workspace as the home for separated metadata, records, and artifact storage.
- Implemented a local observability foundation for consistently managing and inspecting ML execution history and artifacts.
Architecture: Local-first · Contract: Schema-first · Workflow: Trace / Compare / Recover
Lynxes: Graph Analytics Engine
Graph Systems Engine Project
- Existing Python graph libraries and generic dataframe wrappers often struggle to combine memory efficiency, traversal performance, and lazy query optimization for large graph analytics.
- Designed GraphFrame to own Arrow RecordBatches directly.
- Established the foundation for a graph analytics engine with Arrow columnar memory, CSR-based traversal, and lazy collect execution.
Neighbor lookup: O(degree) · Memory model: Arrow · Execution: Lazy collect
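A small sketch of why a CSR (compressed sparse row) index gives O(degree) neighbor lookup. It uses plain Python lists in place of Arrow buffers and is not the Lynxes API. The names `build_csr` and `neighbors` and the example edges are hypothetical.

```python
def build_csr(num_nodes, edges):
    """Build CSR (offsets, targets) arrays from (src, dst) edge pairs."""
    # Count outgoing edges per source node, shifted by one slot.
    offsets = [0] * (num_nodes + 1)
    for src, _ in edges:
        offsets[src + 1] += 1
    # Prefix-sum the counts so offsets[i] is where node i's edges start.
    for i in range(num_nodes):
        offsets[i + 1] += offsets[i]
    # Place each destination into the next free slot of its source node.
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def neighbors(offsets, targets, node):
    """Return the out-neighbors of `node` as one contiguous slice."""
    return targets[offsets[node]:offsets[node + 1]]


# Hypothetical 4-node graph.
offsets, targets = build_csr(4, [(0, 1), (0, 2), (1, 2), (3, 0)])
print(offsets)
print(neighbors(offsets, targets, 0))
```

Slicing `targets` between two consecutive offsets touches only that node's edges, which is where the O(degree) lookup cost comes from.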
FRIMO: Conversational AI for Emotional Support and Diary Generation
Conversational AI Project
- The product needed ML components that could recognize user emotion and support a diary-generation workflow from daily conversation logs.
- Centered the ML workflow around KoBERT-based emotion recognition and connected it to chatbot and summarization flows.
- The project delivered an MVP-level conversational diary experience with an emotion recognition pipeline.
Pipeline: Korean NLP
Devridge: LLM-Based Feedback Bridge for Developers
LLM Application Prototype
- Developers working alone often need UI, performance, or code quality feedback, but they rarely have an easy way to gather role-specific input at the right time.
- Separated role constraints and contextual input so responses stayed scoped instead of generic.
- The prototype showed how structured prompting can make LLM feedback more useful for project review.
Feedback modes: Role-based
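One way the separation of role constraints and contextual input could look, as an illustrative sketch and not the prototype's actual prompt format. The function name `build_feedback_prompt`, the section labels, and the example role are assumptions.

```python
def build_feedback_prompt(role, constraints, context):
    """Assemble a prompt with role constraints kept apart from context.

    Constraints and context go in separate sections so the reviewer
    role stays fixed while the project input varies per request.
    """
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    return (
        f"Role: {role}\n"
        f"Constraints:\n{constraint_lines}\n"
        f"Context:\n{context}\n"
        "Respond only within the role and constraints above."
    )


# Hypothetical role and input.
prompt = build_feedback_prompt(
    role="UI reviewer",
    constraints=["Comment only on layout and accessibility", "Give at most three points"],
    context="Login page with two input fields and a submit button.",
)
print(prompt)
```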
BloGeek: AI Modules for a React + Spring Blog Project
Collaborative NLP Project
- The product needed ML components that could classify emotional polarity and generate stylistic variations of text to support richer blog content workflows.
- Used KoBERT for polarity recognition and KoBART for style transfer.
- The project gave the team practical AI modules for blog-oriented text processing.
Model scope: Classification + Generation
Dalkom Shop: Internal Employee Mileage Commerce Platform
Internal Commerce Platform
- The project needed cloud infrastructure and platform foundations that could reliably support search, notifications, admin workflows, and ongoing operations.
- Organized CI/CD and cloud foundations so the React and Spring service could fit into a deployable, monitorable workflow.
- Built a practical service foundation for a closed internal mileage commerce platform.
Foundation: CI/CD + Cloud
Training & Certifications
Practical Implementation of Monitoring and Testing in DevOps Environments
LLOYDK
Completed practical training in Elastic-based DevOps monitoring and testing.
Multi Cloud Orchestration Program
5Works
Completed HashiCorp-based multi-cloud orchestration and IaC training.
Company-Led Intensive Project Training
DK Techin
Participated in industry-linked practical training focused on security and DevOps engineering.
Micro Degree in Software Specialist Training
Gachon University
Completed a micro-degree program for training software specialists.
Additional Information
Tech Stack: Python · PyTorch · HuggingFace · Docker · AWS
Focus Areas: Data Systems · AI-DLC/MLOps · NLP/LLM
Technical Strengths: Graph-based Data Systems · ML Observability · NLP/LLM Evaluation · Code Reproducibility · Automation