Optimized for browser print and PDF export.
SYS_SHEET_01 / COVER

Print Portfolio

Web Portfolio

Portfolio for Technical Review

Data & Applied AI Engineer

Data & Applied AI Engineer

I work across data structure, AI development workflows, and NLP/LLM evaluation, carrying problems from framing to implementation, validation, and delivery.

Feedback & Data Loop Realtime Serving & Observability Data Orchestration NLP / LLM Research Data Systems AI-DLC / MLOps

[SYS_ROLE: EXECUTIVE_SUMMARY] · Role Summary

Data Systems Data Systems Specialist with Graph-native Depth

Turns heterogeneous data into model-ready and system-ready structures, pipelines, and graph representations.

AI-DLC / MLOps AI-DLC and Operational MLOps Engineer

Makes run records, artifacts, model behavior, and feedback paths inspectable enough to improve.

NLP / LLM Applied NLP and LLM Research Engineer

Translates language-model research into better modeling and evaluation decisions in applied systems.

[SYS_EVIDENCE: VERIFICATION_MATRIX]

Case Study / Evidence Data Systems AI-DLC / MLOps NLP / LLM
Featured Case Studies
2026 / Self-directed ML Platform Project Contexta: Local-First ML Observability Supporting Primary
2026 / Graph Systems Engine Project Lynxes: Graph Analytics Engine Primary Supporting
2025 / Clinical AI Research EMR-Based Nursing Surveillance for Automatic ICD Coding Supporting Primary
2024 / Internal Commerce Platform Dalkom Shop: Internal Employee Mileage Commerce Platform Supporting Primary
2024 / LLM Application Prototype Devridge: LLM-Based Feedback Bridge for Developers Supporting Primary
2023 / Collaborative NLP Project BloGeek: AI Modules for a React + Spring Blog Project Supporting Primary
2023 / Conversational AI Project FRIMO: Conversational AI for Emotional Support and Diary Generation Supporting Primary
Research Publications
2026 / Research A Context-Adaptive Gated Embedding Framework for Advanced Clinical Decision-Making Supporting Primary
2025 / Research Deep Learning based Automatic ICD Coding for Nursing Surveillance of Abdominal Surgery Patients Supporting Primary
2025 / Research Empathetic Dialogue Generation Model Using Reinforcement Learning with AI-Based Feedback Primary
Capability Study Groups
2023 / Study NEKA: NLP 스터디 (KoBERT) Primary
2023 / Study Sunaroum: Financial AI Study Supporting Primary
2022 / Study Ida: Recommender Systems Study Primary
2022 / Study MLADS: 머신러닝 기초 스터디 Primary Supporting
Email eastlighting1@gachon.ac.kr
GitHub github.com/eastlighting1
LinkedIn www.linkedin.com/in/동현-김-350b4b29b
SYS_SHEET_02 / CASE_STUDY

01 / Self-directed ML Platform Project / 2026

Contexta: Local-First ML Observability

A local-first observability library for tracing, comparing, and recovering ML execution history through one consistent contract.

Built project

[SYS_PIPELINE: TIMELINE]

WARN Problem

ML experiments and deployment work often scatter metadata, records, and artifacts across tools, making reproducible local observability hard to maintain.

EXEC Key Decision

Used a `.contexta/` workspace as the home for separated metadata, records, and artifact storage.

OK Outcome

Implemented a local observability foundation for consistently managing and inspecting ML execution history and artifacts.

[SYS_FLOW: ARCHITECTURE]

A local-first ML observability structure where the Python facade, CLI, workspace storage, and report/recovery flows share one contract.

Python API CLI Canonical Contract Local Workspace Reports & Recovery

[SYS_SPEC: CONSTRAINTS] · Constraints

  • The system had to work from a local workspace without an external backend.
  • The library API and CLI needed to share the same contract.
  • Execution records, artifacts, comparison reports, and recovery flows had to feel like one product.

[SYS_DECISIONS: DESIGN_TRADE_OFFS] · Design Decisions

EXEC
Canonical workspace

Used a `.contexta/` workspace as the home for separated metadata, records, and artifact storage.

Rationale: This made execution history inspectable through ordinary local files.
EXEC
Schema-first contract

Made capture, query, compare, and recovery flows share the same data meaning.

Rationale: The contract keeps future features from drifting into incompatible data shapes.

This project reinforced that ML tooling becomes product-like when execution traces remain explainable after the run finishes.

[SYS_ROLE: TELEMETRY]

NLP Data MLOps
Supporting Data Systems
Primary Focus AI-DLC / MLOps
NLP / LLM

[SYS_METRIC: IMPACT] · Review Points

Local-first · Architecture
Schema-first · Contract
Trace / Compare / Recover · Workflow

Shows operational observability design for tracing, comparing, and recovering AI execution records and artifacts.

[SYS_SPEC: METADATA]

Data Surfaces
structuredhybrid
AI-DLC Stages
dataexperimentevaluationobservabilityfeedback-recovery
SYS_SHEET_03 / CASE_STUDY

02 / Graph Systems Engine Project / 2026

Lynxes: Graph Analytics Engine

A high-performance graph analytics engine that combines Arrow columnar memory with graph-native traversal structures for Python users.

Built project

[SYS_PIPELINE: TIMELINE]

WARN Problem

Existing Python graph libraries and generic dataframe wrappers often struggle to combine memory efficiency, traversal performance, and lazy query optimization for large graph analytics.

EXEC Key Decision

Designed GraphFrame to own Arrow RecordBatches directly.

OK Outcome

Established the foundation for a graph analytics engine with Arrow columnar memory, CSR-based traversal, and lazy collect execution.

[SYS_FLOW: ARCHITECTURE]

A graph analytics engine combining Arrow RecordBatch storage, CSR indexing, lazy logical plans, Rust algorithms, and Python bindings.

Arrow RecordBatch GraphFrame CSR Index LogicalPlan Python Bindings

[SYS_SPEC: CONSTRAINTS] · Constraints

  • Node and edge data needed a column-oriented memory layout.
  • Neighbor traversal had to avoid linear scans.
  • The Rust engine and Python usability layer needed a stable boundary.

[SYS_DECISIONS: DESIGN_TRADE_OFFS] · Design Decisions

EXEC
Arrow-owned GraphFrame

Designed GraphFrame to own Arrow RecordBatches directly.

Rationale: This combines columnar memory efficiency with compatibility potential for Python analytics workflows.
EXEC
CSR adjacency index

Added CSR-based adjacency indexing to EdgeFrame so neighbor lookup becomes O(degree).

Rationale: Traversal should behave like a graph-native operation rather than a generic table scan.
EXEC
Lazy logical plan

Accumulated queries into a LogicalPlan and deferred execution until `.collect()`.

Rationale: This keeps the analysis pipeline optimizable.

The project clarified that system performance becomes convincing when data ownership, memory layout, and user API design work together.

[SYS_ROLE: TELEMETRY]

NLP Data MLOps
Primary Focus Data Systems
Supporting AI-DLC / MLOps
NLP / LLM

[SYS_METRIC: IMPACT] · Review Points

O(degree) · Neighbor lookup
Arrow · Memory model
Lazy collect · Execution

Shows system design and implementation depth for treating graph data as a first-class execution model.

[SYS_SPEC: METADATA]

Data Surfaces
structuredgraphhybrid
AI-DLC Stages
dataexperimentevaluation
SYS_SHEET_04 / CASE_STUDY

03 / Clinical AI Research / 2025

EMR-Based Nursing Surveillance for Automatic ICD Coding

A clinical AI study showing that core EMR data available during nursing work can support practical diagnosis-related classification.

Published work Summary available

[SYS_PIPELINE: TIMELINE]

WARN Problem

Nursing surveillance required diagnosis-related classification, but key clinical signals were fragmented across heterogeneous EMR sources.

EXEC Key Decision

Trained two KM-BERT models independently and averaged raw logits to stabilize text representation.

OK Outcome

The final model was reviewed for practical classification behavior rather than a single standalone score, including rare-class recall and available-data constraints.

[SYS_FLOW: ARCHITECTURE]

A clinical AI pipeline that processes structured EMR and nursing text in parallel before stacking them for ICD prediction.

Structured EMR Nursing Text Dual KM-BERT PCA + XGBoost Rare-class Evaluation

[SYS_SPEC: CONSTRAINTS] · Constraints

  • The model could not depend on discharge summaries or other post-event documents.
  • Rare classes still needed practically meaningful recall.
  • The pipeline had to combine structured EMR features with Korean clinical text.

[SYS_DECISIONS: DESIGN_TRADE_OFFS] · Design Decisions

EXEC
Dual KM-BERT representation

Trained two KM-BERT models independently and averaged raw logits to stabilize text representation.

Rationale: The ensemble reduced volatility in clinical text signals.
EXEC
PCA and XGBoost stacking

Reduced BERT-derived representations with PCA and used XGBoost as the final ICD classifier.

Rationale: This made high-dimensional text signals and structured EMR features easier to combine.

The work clarified that clinical AI value depends not only on performance, but on which documents and signals are realistically available at the time of use.

[SYS_ROLE: TELEMETRY]

NLP Data MLOps
Supporting Data Systems
AI-DLC / MLOps
Primary Focus NLP / LLM

[SYS_METRIC: IMPACT] · Review Points

Balanced · Evaluation Scope
Core EMR · Available Data
High · Rare-class Recall

Combines heterogeneous structured EMR and Korean clinical text into an evaluable NLP pipeline.

[SYS_SPEC: METADATA]

Data Surfaces
structuredtexthybrid
AI-DLC Stages
datatrainingevaluation

[SYS_LINK: ARTIFACTS]

SYS_SHEET_05 / CASE_STUDY

04 / Internal Commerce Platform / 2024

Dalkom Shop: Internal Employee Mileage Commerce Platform

A DevSecOps project that shaped the deployment, operations, security, and observability foundation behind an internal commerce platform.

Built project Summary available

[SYS_PIPELINE: TIMELINE]

WARN Problem

The project needed cloud infrastructure and platform foundations that could reliably support search, notifications, admin workflows, and ongoing operations.

EXEC Key Decision

Organized CI/CD and cloud foundations so the React and Spring service could fit into a deployable, monitorable workflow.

OK Outcome

Built a practical service foundation for a closed internal mileage commerce platform.

[SYS_DECISIONS: DESIGN_TRADE_OFFS] · Design Decisions

EXEC
Platform foundation

Organized CI/CD and cloud foundations so the React and Spring service could fit into a deployable, monitorable workflow.

[SYS_ROLE: TELEMETRY]

NLP Data MLOps
Supporting Data Systems
Primary Focus AI-DLC / MLOps
NLP / LLM

[SYS_METRIC: IMPACT] · Review Points

CI/CD + Cloud · Foundation

Shows delivery, security, and observability foundations for running service features in an operational environment.

[SYS_SPEC: METADATA]

Data Surfaces
structuredimagehybrid
AI-DLC Stages
datadeploymentobservabilityfeedback-recovery
SYS_SHEET_06 / CASE_STUDY

05 / LLM Application Prototype / 2024

Devridge: LLM-Based Feedback Bridge for Developers

A prototype that turns LLM feedback into more practical development review through role constraints and structured input.

Prototype Summary available

[SYS_PIPELINE: TIMELINE]

WARN Problem

Developers working alone often need UI, performance, or code quality feedback, but they rarely have an easy way to gather role-specific input at the right time.

EXEC Key Decision

Separated role constraints and contextual input so responses stayed scoped instead of generic.

OK Outcome

The prototype showed how structured prompting can make LLM feedback more useful for project review.

[SYS_DECISIONS: DESIGN_TRADE_OFFS] · Design Decisions

EXEC
Role-aware prompting

Separated role constraints and contextual input so responses stayed scoped instead of generic.

[SYS_ROLE: TELEMETRY]

NLP Data MLOps
Data Systems
Supporting AI-DLC / MLOps
Primary Focus NLP / LLM

[SYS_METRIC: IMPACT] · Review Points

Role-based · Feedback modes

Shows prompt and interaction design for making LLM output useful in role-based technical review.

[SYS_SPEC: METADATA]

Data Surfaces
text
AI-DLC Stages
dataexperimentinferenceevaluation
SYS_SHEET_07 / CASE_STUDY

06 / Collaborative NLP Project / 2023

BloGeek: AI Modules for a React + Spring Blog Project

A Korean NLP project connecting emotion classification and style transfer models to a blog product workflow.

Built project Summary available

[SYS_PIPELINE: TIMELINE]

WARN Problem

The product needed ML components that could classify emotional polarity and generate stylistic variations of text to support richer blog content workflows.

EXEC Key Decision

Used KoBERT for polarity recognition and KoBART for style transfer.

OK Outcome

The project gave the team practical AI modules for blog-oriented text processing.

[SYS_DECISIONS: DESIGN_TRADE_OFFS] · Design Decisions

EXEC
Separated KoBERT and KoBART roles

Used KoBERT for polarity recognition and KoBART for style transfer.

[SYS_ROLE: TELEMETRY]

NLP Data MLOps
Data Systems
Supporting AI-DLC / MLOps
Primary Focus NLP / LLM

[SYS_METRIC: IMPACT] · Review Points

Classification + Generation · Model scope

Connects Korean NLP classification and generation models to product-shaped web service features.

[SYS_SPEC: METADATA]

Data Surfaces
text
AI-DLC Stages
dataexperimenttraininginferenceevaluation
SYS_SHEET_08 / CASE_STUDY

07 / Conversational AI Project / 2023

FRIMO: Conversational AI for Emotional Support and Diary Generation

A Korean conversational AI project connecting emotion recognition, chatbot, and summarization models to a diary-generation workflow.

Built project Summary available

[SYS_PIPELINE: TIMELINE]

WARN Problem

The product needed ML components that could recognize user emotion and support a diary-generation workflow from daily conversation logs.

EXEC Key Decision

Centered the ML workflow around KoBERT-based emotion recognition and connected it to chatbot and summarization flows.

OK Outcome

The project delivered an MVP-level conversational diary experience with an emotion recognition pipeline.

[SYS_DECISIONS: DESIGN_TRADE_OFFS] · Design Decisions

EXEC
Emotion-first ML pipeline

Centered the ML workflow around KoBERT-based emotion recognition and connected it to chatbot and summarization flows.

[SYS_ROLE: TELEMETRY]

NLP Data MLOps
Data Systems
Supporting AI-DLC / MLOps
Primary Focus NLP / LLM

[SYS_METRIC: IMPACT] · Review Points

Korean NLP · Pipeline

Connects Korean NLP models into a user-facing AI pipeline for conversational product experience.

[SYS_SPEC: METADATA]

Data Surfaces
text
AI-DLC Stages
dataexperimenttraininginferenceevaluation
SYS_SHEET_09 / RESEARCH_CASE

01 / Journal Paper / 2026

A Context-Adaptive Gated Embedding Framework for Advanced Clinical Decision-Making

Mathematics (submitted) / 2026 / Journal Paper

[SYS_RESEARCH: ABSTRACT]

This study proposes a hierarchical clinical decision support framework that estimates diagnostic context via partial-label automated ICD coding and reinjects it into irregular ICU time-series forecasting through context-adaptive gating for mechanical ventilation transition prediction. By conditioning temporal interpretation on diagnostic context, the framework substantially improves rare-event detection.

[SYS_RESEARCH: BACKGROUND] · 연구 배경

This research was designed to overcome key bottlenecks and academic limits in the domain. It addresses EMR structured data and unstructured text/embedding domain complexities, showing modeling judgment that drives real-world value.

[SYS_RESEARCH: METHODOLOGY] · 핵심 기여

Academic Claim: Proposed a context-adaptive gated embedding framework that reinjects diagnostic context from automated ICD coding into ICU time-series prediction.

Methodology: Combines partial-label diagnostic context, TCN temporal representations, and gating to strengthen rare transition-event prediction.

[SYS_ROLE: TELEMETRY]

NLP Data MLOps
Primary Focus NLP / LLM
Supporting Data Systems

[SYS_METRIC: RELEVANCE]

Shows modeling judgment that connects diagnostic context and time-series signals hierarchically instead of treating data surfaces as isolated inputs.

[SYS_RESEARCH: BIBTEX]

@article{kim2026cage,
  title={A Context-Adaptive Gated Embedding Framework for Advanced Clinical Decision-Making},
  author={Donghyeon Kim and Daeho Kim and Okran Jeong},
  journal={Mathematics},
  year={2026},
  note={{Submitted / under review}}
}

[SYS_SPEC: METADATA]

Clinical Decision Support SystemAutomated ICD CodingICU Time-seriesMechanical Ventilation PredictionPartial-Label LearningExtreme Multi-Class ClassificationTCNGatingRare Event Detection
Email eastlighting1@gachon.ac.kr
GitHub github.com/eastlighting1
LinkedIn www.linkedin.com/in/동현-김-350b4b29b
SYS_SHEET_010 / RESEARCH_CASE

02 / Journal Paper / 2025

Deep Learning based Automatic ICD Coding for Nursing Surveillance of Abdominal Surgery Patients

Journal of The Korea Society of Computer and Information / 2025 / Journal Paper

[SYS_RESEARCH: ABSTRACT]

This study proposes an automatic ICD coding model for nursing surveillance of abdominal surgery patients by integrating EMR-based test data, patient information, and nursing notes. A stacking architecture combining dual KM-BERT, XGBoost, and PCA outperformed both a single KM-BERT model and simpler ensemble variants.

[SYS_RESEARCH: BACKGROUND] · 연구 배경

This research was designed to overcome key bottlenecks and academic limits in the domain. It addresses EMR structured data and unstructured text/embedding domain complexities, showing modeling judgment that drives real-world value.

[SYS_RESEARCH: METHODOLOGY] · 핵심 기여

Academic Claim: Proposed a deep-learning ICD coding model that combines structured EMR information and nursing notes for nursing surveillance.

Methodology: Used dual KM-BERT, PCA, and XGBoost stacking to evaluate sparse and domain-sensitive diagnosis labels.

[SYS_ROLE: TELEMETRY]

NLP Data MLOps
Primary Focus NLP / LLM
Supporting Data Systems

[SYS_METRIC: RELEVANCE]

Supports the portfolio claim that NLP/LLM systems should be judged through domain data structure and error distribution, not only headline accuracy.

[SYS_RESEARCH: BIBTEX]

@article{kim2025deep,
  title={Deep Learning based Automatic ICD Coding for Nursing Surveillance of Abdominal Surgery Patients},
  author={Donghyeon Kim, Daeho Kim, Seyoung Kim, Okran Jeong},
  journal={Journal of The Korea Society of Computer and Information},
  volume={30},
  number={5},
  pages={21--30},
  year={2025},
  publisher={The Korean Society Of Computer And Information}
}

[SYS_SPEC: METADATA]

Medical AINursing SurveillanceEMRAutomatic ICD CodingDeep LearningKM-BERTXGBoostEnsembleAbdominal Surgery
Email eastlighting1@gachon.ac.kr
GitHub github.com/eastlighting1
LinkedIn www.linkedin.com/in/동현-김-350b4b29b
SYS_SHEET_011 / RESEARCH_CASE

03 / Conference Paper / 2025

Empathetic Dialogue Generation Model Using Reinforcement Learning with AI-Based Feedback

Korea Computer Congress (KCC) / 2025 / Conference Paper

[SYS_RESEARCH: ABSTRACT]

This study proposes an empathetic dialogue generation model using reinforcement learning with AI-based feedback (RLAIF) to address limited diversity and reliance on human feedback. By leveraging an LLM as a reward evaluator and integrating it into EmpRL, the model generates more diverse empathetic responses.

[SYS_RESEARCH: BACKGROUND] · 연구 배경

This research was designed to overcome key bottlenecks and academic limits in the domain. It addresses EMR structured data and unstructured text/embedding domain complexities, showing modeling judgment that drives real-world value.

[SYS_RESEARCH: METHODOLOGY] · 핵심 기여

Academic Claim: Proposed an RLAIF-based reinforcement learning structure to reduce reliance on human feedback in empathetic dialogue generation.

Methodology: Used an LLM evaluator as a reward signal and PPO-based policy updates to improve response diversity and empathy alignment.

[SYS_ROLE: TELEMETRY]

NLP Data MLOps
Primary Focus NLP / LLM

[SYS_METRIC: RELEVANCE]

Supports the view that LLM response quality should be treated as an evaluation, reward, and policy-update system rather than a prompt-only outcome.

[SYS_RESEARCH: BIBTEX]

@inproceedings{joo2025empathetic,
  title={Empathetic Dialogue Generation Model Using Reinforcement Learning with AI-Based Feedback},
  author={Yongwan Joo, Donghyun Lim, Donghyeon Kim, Seungyeon Sun, Okran Jeong},
  booktitle={Proceedings of the Korea Computer Congress 2025},
  pages={2410--2412},
  year={2025}
}

[SYS_SPEC: METADATA]

Empathetic DialogueReinforcement LearningRLAIFRLHF AlternativeLLMDialogue GenerationNLPAI Feedback
Email eastlighting1@gachon.ac.kr
GitHub github.com/eastlighting1
LinkedIn www.linkedin.com/in/동현-김-350b4b29b
SYS_SHEET_12 / CREDENTIALS_GROWTH

Print Portfolio

Credentials & Self-Directed Growth

[SYS_CREDENTIAL: PATHWAY] · 자격 및 학습 경로

// 01_ACADEMIC_STUDIES

2024.03 - 2026.02

Gachon University

M.S. in Artificial Intelligence · GPA 4.14/4.5. Intelligent Data Analytics Lab. Advisor - OkRan Jeong. Focused on clinical AI, evaluation, and implementation-oriented research delivery.

FOCUS: Advanced ML, Graph Neural Networks (GNN), Medical Time-Series
2019.03 - 2024.02

Gachon University

B.S. in Software · Gained leadership and community management experience by participating in an official programming club and leading various study groups, including basic ML, advanced ML, financial ML, and GNN.

FOCUS: Data Structures, Algorithms, Linear Algebra, Statistical ML

// 02_PROFESSIONAL_COURSES

2023.11

Practical Implementation of Monitoring and Testing in DevOps Environments

LLOYDK · Completed practical training in Elastic-based DevOps monitoring and testing.

2023.12

Multi Cloud Orchestration Program

5Works · Completed HashiCorp-based multi-cloud orchestration and IaC training.

2024.02

Company-Led Intensive Project Training

DK Techin · Participated in industry-linked practical training focused on security and DevOps engineering.

2024.02

Micro Degree in Software Specialist Training

Gachon University · Completed a micro-degree program for training software specialists.

// 03_CONTINUOUS_LEARNING

2024.03

AWS Builders Online Series

AWS · Learned serverless architectures and generative AI application integration workflows based on Amazon Bedrock

2023.08

Open-source NLP/LLM Continuous Learning

Community · Analyzed open-source LLM parameter-efficient fine-tuning (LoRA, QLoRA) and RAG pattern architectures

[SYS_LOG: PIPELINE_EXECUTION] · 시스템 훈련 로그

[INFO] 2026-05-25 02:09:03 - Initializing training pipeline...
[SPEC] CUDA Version: 12.2 / Device: NVIDIA RTX 4090 x2
[LOAD] Loading pretrained model: kobert-base-v3-medical
· Config: 12-layer, 768-hidden, 12-heads (110M Params)
[TRAIN] Epoch [4/10] - Step [4500/12000] - Loss: 0.1842
· Optimizer: AdamW (lr=3e-5) with linear warmups
[EVAL] Evaluating tokenization on clinical NER testsets...
· F1-Score: 0.8816 / Precision: 0.8842 / Recall: 0.8791
* SYSTEM SIMULATION LOG FOR VISUALIZATION PURPOSE.
* ALL SOURCE ALGORITHMS MAPPED TO LIVE PRODUCTION SERVICES.

[SYS_CREDENTIAL: STUDY_GROUPS] · 자기주도 역량 개발 스터디