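Contexta Style Guide

This document summarizes the recommended standards for instrumenting ML systems.

Priority A: Essential

Observe Real Work

Use metrics obtained from the behavior you are observing. A workflow example must perform the observed work before it records results.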

The three examples that follow cover Machine Learning, Deep Learning, and LLM workflows, in that order.

"""Train a real regression model and capture its measured evidence."""

import pickle
from pathlib import Path

from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

from contexta import Contexta
from contexta.capture import LocalJsonlSink


features, targets = load_diabetes(return_X_y=True)
train_x, test_x, train_y, test_y = train_test_split(
    features, targets, test_size=0.2, random_state=42
)

workspace = Path(".contexta")
ctx = Contexta(workspace=str(workspace), config={"project_name": "diabetes-regression"})
local_sink = next(sink for sink in ctx.sinks if isinstance(sink, LocalJsonlSink))
model = LinearRegression()

with ctx.run("linear-regression", dataset_ref="dataset:sklearn.diabetes") as run:
    run.event(
        "dataset.loaded",
        message="Loaded the scikit-learn diabetes dataset",
        attributes={"rows": len(features), "features": features.shape[1]},
    )
    with run.stage("train"):
        model.fit(train_x, train_y)

    with run.stage("evaluate") as stage:
        predictions = model.predict(test_x)
        r2 = r2_score(test_y, predictions)
        mae = mean_absolute_error(test_y, predictions)
        stage.metric("r2", r2, unit="ratio")
        stage.metric("mae", mae)

    model_path = workspace / "models" / "linear-regression.pkl"
    model_path.parent.mkdir(parents=True, exist_ok=True)
    model_path.write_bytes(pickle.dumps(model))
    run.register_artifact("model", str(model_path), attributes={"format": "pickle"})

records_path = local_sink.file_path_for("RECORD").relative_to(Path.cwd())

print(f"Captured run: {run.ref}")
print(f"Measured r2: {r2:.3f}; mae: {mae:.3f}")
print(f"Records: {records_path.as_posix()}")
print(f"Model artifact: {model_path.as_posix()}")

"""Train a tiny CNN and capture epoch, evaluation, and checkpoint evidence."""

from pathlib import Path

import torch
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

from contexta import Contexta
from contexta.capture import LocalJsonlSink


class TinyCNN(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(8 * 4 * 4, 10),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.layers(features)


torch.manual_seed(7)
digits = load_digits()
train_x, test_x, train_y, test_y = train_test_split(
    digits.images, digits.target, test_size=0.2, stratify=digits.target, random_state=7
)
train_data = TensorDataset(
    torch.tensor(train_x[:, None] / 16.0, dtype=torch.float32),
    torch.tensor(train_y, dtype=torch.long),
)
loader = DataLoader(train_data, batch_size=64, shuffle=True)
test_features = torch.tensor(test_x[:, None] / 16.0, dtype=torch.float32)
test_targets = torch.tensor(test_y, dtype=torch.long)

workspace = Path(".contexta")
ctx = Contexta(workspace=str(workspace), config={"project_name": "digits-cnn"})
local_sink = next(sink for sink in ctx.sinks if isinstance(sink, LocalJsonlSink))
model = TinyCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

with ctx.run("tiny-cnn", dataset_ref="dataset:sklearn.digits") as run:
    with run.stage("train") as stage:
        for epoch in range(1, 3):
            total_loss = 0.0
            for features, targets in loader:
                optimizer.zero_grad()
                loss = loss_fn(model(features), targets)
                loss.backward()
                optimizer.step()
                total_loss += loss.item() * len(targets)
            with stage.batch(f"epoch-{epoch}") as batch:
                batch.metric("loss", total_loss / len(train_data))

    with run.stage("evaluate") as stage:
        with torch.no_grad():
            logits = model(test_features)
            accuracy = (logits.argmax(dim=1) == test_targets).float().mean().item()
        stage.metric("accuracy", accuracy, unit="ratio")
        with stage.sample("first-validation-image") as sample:
            sample.metric(
                "prediction.correct",
                float(logits[0].argmax().item() == test_targets[0].item()),
                unit="ratio",
            )

    checkpoint = workspace / "models" / "tiny-cnn.pt"
    checkpoint.parent.mkdir(parents=True, exist_ok=True)
    torch.save(model.state_dict(), checkpoint)
    run.register_artifact("checkpoint", str(checkpoint), attributes={"epochs": 2})

with ctx.deployment("tiny-cnn-candidate", run_ref=run.ref) as deployment:
    deployment.event("checkpoint.selected", message="Selected trained checkpoint for review")

records_path = local_sink.file_path_for("RECORD").relative_to(Path.cwd())

print(f"Captured run: {run.ref}")
print(f"Measured validation accuracy: {accuracy:.3f}")
print(f"Records: {records_path.as_posix()}")
print(f"Checkpoint artifact: {checkpoint.as_posix()}")

"""Evaluate an OpenAI-shaped local mock API and capture response evidence."""

from pathlib import Path
from time import perf_counter
from types import SimpleNamespace

from contexta import Contexta
from contexta.capture import LocalJsonlSink


class MockCompletions:
    def create(self, *, model: str, messages: list[dict[str, str]]) -> SimpleNamespace:
        question = messages[-1]["content"]
        if "workspace" in question.lower():
            answer = "Contexta stores local evidence in a .contexta workspace."
        else:
            answer = "I cannot answer from the provided context."
        return SimpleNamespace(
            id=f"chatgpt-mock-{model}",
            choices=[SimpleNamespace(message=SimpleNamespace(content=answer))],
            usage=SimpleNamespace(completion_tokens=len(answer.split())),
        )


class MockOpenAI:
    def __init__(self) -> None:
        self.chat = SimpleNamespace(completions=MockCompletions())


cases = [
    ("workspace-question", "Where is the workspace?", ".contexta"),
    ("unsupported-question", "Which GPU was used?", "cannot answer"),
]
workspace = Path(".contexta")
ctx = Contexta(workspace=str(workspace), config={"project_name": "mock-openai-eval"})
local_sink = next(sink for sink in ctx.sinks if isinstance(sink, LocalJsonlSink))
client = MockOpenAI()
passed = 0

with ctx.run("mock-chat-evaluation", dataset_ref="dataset:local.prompt-cases") as run:
    with run.stage("evaluate") as stage:
        for name, question, expected in cases:
            started = perf_counter()
            response = client.chat.completions.create(
                model="gpt-4.1-mini-mock",
                messages=[{"role": "user", "content": question}],
            )
            # Stop the clock right after the call so scoring time is not counted.
            latency_ms = (perf_counter() - started) * 1000
            answer = response.choices[0].message.content
            correct = expected in answer
            passed += int(correct)
            with stage.sample(name) as sample:
                sample.metric("correct", float(correct), unit="ratio")
                sample.metric("latency.ms", latency_ms, unit="ms")
                sample.metric("completion.tokens", response.usage.completion_tokens)
                sample.event("response.received", message=answer)
        pass_rate = passed / len(cases)
        stage.metric("pass.rate", pass_rate, unit="ratio")

with ctx.deployment("mock-chat-prompt", run_ref=run.ref) as deployment:
    deployment.event("prompt.selected", message="Selected observed prompt flow for staging")

records_path = local_sink.file_path_for("RECORD").relative_to(Path.cwd())

print(f"Captured run: {run.ref}")
print(f"Measured prompt-case pass rate: {pass_rate:.2f}")
print(f"Records: {records_path.as_posix()}")

Do not use hardcoded success metrics or placeholder model artifacts in workflow examples. They can illustrate syntax, but they misrepresent observability.

Leave Enough Context to Review

An important run should include an input or dataset reference, meaningful stage names, measured metrics, and an artifact that a reviewer can inspect or promote.
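
The checklist above can be sketched as a small pre-review check. This is only an illustration: the `summary` dictionary, its keys, and the `missing_review_context` helper are hypothetical and do not reflect Contexta's record schema or API.

```python
# Hypothetical summary shape used only for this sketch; it is not Contexta's
# record schema.
REQUIRED_CONTEXT = {
    "dataset_ref": "an input or dataset reference",
    "stages": "meaningful stage names",
    "metrics": "measured metrics",
    "artifacts": "an artifact to inspect or promote",
}


def missing_review_context(summary: dict) -> list[str]:
    """Describe every required item that is absent or empty in the summary."""
    return [
        description
        for key, description in REQUIRED_CONTEXT.items()
        if not summary.get(key)
    ]


# A run that named its dataset and stages but recorded nothing measurable.
incomplete = {
    "dataset_ref": "dataset:sklearn.diabetes",
    "stages": ["train", "evaluate"],
    "metrics": {},
    "artifacts": [],
}

missing = missing_review_context(incomplete)
print(f"Missing review context: {missing}")
```

A run that returns an empty list from a check like this gives a reviewer something to inspect; a non-empty list names what to capture before asking for review.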

Keep Example Workspaces Separate

Run copied examples in a scratch working directory so that the local .contexta/ workspace does not mix with real project history. Tests and maintainer-only runners may use temporary directories.
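
One way a test or maintainer runner might isolate an example is to switch into a temporary directory before running it. The sketch below uses only the Python standard library; `run_in_scratch_dir` and `fake_example` are hypothetical names, and `fake_example` merely stands in for a copied example that writes to a relative `.contexta` path.

```python
import os
import tempfile
from pathlib import Path


def run_in_scratch_dir(example):
    """Run `example` with a temporary directory as the working directory.

    Anything the example writes to a relative path such as ".contexta"
    lands in the scratch directory and is removed afterwards.
    """
    original_cwd = Path.cwd()
    with tempfile.TemporaryDirectory() as scratch:
        os.chdir(scratch)
        try:
            return example()
        finally:
            # Restore the caller's directory before the scratch dir is deleted.
            os.chdir(original_cwd)


def fake_example() -> Path:
    # Stand-in for a real example: creates a relative workspace directory.
    workspace = Path(".contexta")
    workspace.mkdir()
    return workspace.resolve()


created = run_in_scratch_dir(fake_example)
print(f"Workspace was created at: {created}")
print(f"Still present after cleanup: {created.exists()}")
```

For manual practice, a dedicated scratch directory that you keep around is usually more convenient, because the captured records stay available for inspection.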

Priority B: Strongly Recommended

Use the Facade First

Start from Contexta for capture, query, comparison, diagnostics, lineage, and reports, and drop down to direct stores only when explaining storage internals or advanced recovery.

Keep External Costs Optional

Unless the document itself teaches credentials, network behavior, or billing, leave these external dependencies out of introductory examples.
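
A minimal sketch of keeping a paid dependency opt-in follows. Everything in it is an assumption made for illustration: `OfflineClient`, `build_client`, and the `EXAMPLE_USE_REAL_API` variable are invented names, not part of Contexta or any provider SDK.

```python
from types import SimpleNamespace


class OfflineClient:
    """Hypothetical local stand-in: no credentials, no network, no billing."""

    def complete(self, prompt: str) -> SimpleNamespace:
        return SimpleNamespace(text=f"offline reply to: {prompt}", billed=False)


def build_client(env: dict) -> OfflineClient:
    """Return the offline client unless the reader explicitly opts in.

    EXAMPLE_USE_REAL_API is a made-up variable name for this sketch.
    """
    if env.get("EXAMPLE_USE_REAL_API") == "1":
        # A real provider client would be constructed here; that wiring is
        # deliberately left out of an introductory example.
        raise NotImplementedError("real provider wiring is out of scope here")
    return OfflineClient()


# An empty mapping keeps the demo deterministic; a script would pass os.environ.
client = build_client({})
reply = client.complete("Where is the workspace?")
print(reply.text)
print(f"Billed: {reply.billed}")
```

The default path runs without any account, so a reader can copy the example and see output immediately; the external service only comes into play when someone deliberately sets the flag.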

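Print Verifiable Results

An example must print at least one of the following: a run ref, a measured score, an artifact path, a report path, a diagnostic summary, or the workspace location.

Priority C: Recommended

Share Runnable Source

Show a checkable example file in the docs instead of maintaining slightly different inline copies for each page and locale.

Translate Only the Explanations

Korean documentation should show the same runnable source as the English documentation unless it is teaching a specific localized output.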
