Managing Artifacts

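An artifact is a long-lived file attached to a Contexta record.

You can register as artifacts any files that a run actually produced or used: trained models, checkpoints, evaluation sets, prompt templates, and so on.

This document explains how to register an artifact and then track the file's metadata together with its connections to the run that produced it.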

Runnable example

The examples below show how a workflow produces a real output file and registers it with Contexta as an artifact of the current run.

The Machine Learning (scikit-learn) example comes first, followed by the Deep Learning (PyTorch) example.

"""Persist a fitted regression model and register it as observed evidence."""

import pickle
from pathlib import Path

from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

from contexta import Contexta
from contexta.capture import LocalJsonlSink


features, targets = load_diabetes(return_X_y=True)
train_x, test_x, train_y, test_y = train_test_split(
    features, targets, test_size=0.2, random_state=42
)

workspace = Path(".contexta")
ctx = Contexta(workspace=str(workspace), config={"project_name": "diabetes-artifact"})
local_sink = next(sink for sink in ctx.sinks if isinstance(sink, LocalJsonlSink))
model = LinearRegression()

with ctx.run("fitted-model", dataset_ref="dataset:sklearn.diabetes") as run:
    with run.stage("train"):
        model.fit(train_x, train_y)

    with run.stage("evaluate") as stage:
        r2 = r2_score(test_y, model.predict(test_x))
        stage.metric("r2", r2, unit="ratio")

    model_path = workspace / "models" / "linear-regression.pkl"
    model_path.parent.mkdir(parents=True, exist_ok=True)
    model_path.write_bytes(pickle.dumps(model))
    registration = run.register_artifact(
        "model",
        str(model_path),
        attributes={"framework": "scikit-learn", "format": "pickle"},
    )

artifact_ref = registration.payload["manifest"].artifact_ref
artifacts_path = local_sink.file_path_for("ARTIFACT").relative_to(Path.cwd())

print(f"Captured run: {run.ref}")
print(f"Measured r2: {r2:.3f}")
print(f"Registered artifact: {artifact_ref}")
print(f"Model file: {model_path.as_posix()}")
print(f"Artifact records: {artifacts_path.as_posix()}")

"""Train a tiny CNN, save its checkpoint, and register the checkpoint artifact."""

from pathlib import Path

import torch
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

from contexta import Contexta
from contexta.capture import LocalJsonlSink


class TinyCNN(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(8 * 4 * 4, 10),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.layers(features)


torch.manual_seed(7)
digits = load_digits()
train_x, test_x, train_y, test_y = train_test_split(
    digits.images, digits.target, test_size=0.2, stratify=digits.target, random_state=7
)
train_data = TensorDataset(
    torch.tensor(train_x[:, None] / 16.0, dtype=torch.float32),
    torch.tensor(train_y, dtype=torch.long),
)
loader = DataLoader(train_data, batch_size=64, shuffle=True)
test_features = torch.tensor(test_x[:, None] / 16.0, dtype=torch.float32)
test_targets = torch.tensor(test_y, dtype=torch.long)

workspace = Path(".contexta")
ctx = Contexta(workspace=str(workspace), config={"project_name": "digits-artifact"})
local_sink = next(sink for sink in ctx.sinks if isinstance(sink, LocalJsonlSink))
model = TinyCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

with ctx.run("trained-checkpoint", dataset_ref="dataset:sklearn.digits") as run:
    with run.stage("train"):
        for features, targets in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(features), targets)
            loss.backward()
            optimizer.step()

    with run.stage("evaluate") as stage:
        with torch.no_grad():
            predictions = model(test_features).argmax(dim=1)
        accuracy = (predictions == test_targets).float().mean().item()
        stage.metric("accuracy", accuracy, unit="ratio")

    checkpoint_path = workspace / "checkpoints" / "tiny-cnn.pt"
    checkpoint_path.parent.mkdir(parents=True, exist_ok=True)
    torch.save(model.state_dict(), checkpoint_path)
    registration = run.register_artifact(
        "checkpoint",
        str(checkpoint_path),
        attributes={"framework": "pytorch", "epochs": 1},
    )

artifact_ref = registration.payload["manifest"].artifact_ref
artifacts_path = local_sink.file_path_for("ARTIFACT").relative_to(Path.cwd())

print(f"Captured run: {run.ref}")
print(f"Measured validation accuracy: {accuracy:.3f}")
print(f"Registered artifact: {artifact_ref}")
print(f"Checkpoint file: {checkpoint_path.as_posix()}")
print(f"Artifact records: {artifacts_path.as_posix()}")

Save one of the examples as handling_artifacts.py, then run it in an environment where Contexta is installed.

uv run handling_artifacts.py

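Both examples write the actual parameters obtained from the framework to a file once training has finished, then pass that file's path to run.register_artifact(...).

Checking the results

The machine learning example prints output like the following to the terminal. The unique suffix after model-... may differ on each run.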

Captured run: run:diabetes-artifact.fitted-model
Measured r2: 0.453
Registered artifact: artifact:diabetes-artifact.fitted-model.model-d10c0378adbc
Model file: .contexta/models/linear-regression.pkl
Artifact records: .contexta/cache/capture/artifact.jsonl

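This example splits the diabetes dataset into training and evaluation sets and fits a LinearRegression model on the training set.

The r2 value computed in the evaluation stage is a metric record that describes predictive performance, and linear-regression.pkl is the trained model file that produced that result.

Contexta links the score and the file to the same run, so you avoid ending up later with only a score and no way to tell where the model file came from.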

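Output	What it tells you
`Captured run`	Model training, evaluation, and file registration all belong to a single `fitted-model` run.
`Measured r2`	The measurement that the registered model obtained in the evaluation stage of the same run.
`Registered artifact`	The identifier Contexta assigned to this model file.
`Model file`	The file to which the model was actually serialized and saved.
`Artifact records`	The location of the registration records that describe which artifact of which run this file is.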

After the run, the workspace contains the following.

.contexta/
  cache/capture/
    record.jsonl
    artifact.jsonl
  models/
    linear-regression.pkl

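models/linear-regression.pkl is the actual trained model file, and artifact.jsonl is the record used to interpret and track that file.

With the model file alone you can load and use the model again, but you would have to track separately which run produced it and which evaluation results it is tied to.

With the artifact record alongside it, you can look up that relationship by run.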
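As a rough illustration of looking up that relationship by run, the sketch below scans artifact.jsonl with the standard library only and collects the manifests whose run_ref matches a given run. It assumes each line of the file is one JSON object carrying the `family` and `payload.manifest` fields shown in the registration record later on this page. The helper name `artifacts_for_run` is hypothetical and not part of Contexta.

```python
import json
from pathlib import Path


def artifacts_for_run(records_path: Path, run_ref: str) -> list[dict]:
    """Return the manifests of ARTIFACT records that belong to run_ref.

    Assumption: one JSON object per line, shaped like the record shown
    in this document (family, payload.manifest.run_ref, ...).
    """
    manifests = []
    with records_path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            if record.get("family") != "ARTIFACT":
                continue  # ignore any non-artifact rows
            manifest = record.get("payload", {}).get("manifest", {})
            if manifest.get("run_ref") == run_ref:
                manifests.append(manifest)
    return manifests


if __name__ == "__main__":
    # Path and run reference taken from the example output above.
    records = Path(".contexta/cache/capture/artifact.jsonl")
    if records.exists():
        for manifest in artifacts_for_run(records, "run:diabetes-artifact.fitted-model"):
            print(manifest.get("artifact_kind"), manifest.get("artifact_ref"))
```

Reading the file directly is only meant to make the record layout tangible. If Contexta offers its own query interface for artifacts, that would normally be the better choice.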

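Registration record in `artifact.jsonl`

The core structure of the artifact entry left by the machine learning example is as follows.

Note that the timestamps, absolute file paths, hash, and identifier suffix vary with the execution environment and the contents of the saved file.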

{
  "family": "ARTIFACT",
  "payload": {
    "binding_status": "BOUND",
    "manifest": {
      "artifact_kind": "model",
      "artifact_ref": "artifact:diabetes-artifact.fitted-model.model-...",
      "run_ref": "run:diabetes-artifact.fitted-model",
      "location_ref": "path:.../.contexta/models/linear-regression.pkl",
      "hash_value": "...",
      "size_bytes": 576,
      "attributes": {
        "framework": "scikit-learn",
        "format": "pickle"
      }
    },
    "source": {
      "exists": true,
      "kind": "PATH",
      "uri": ".../.contexta/models/linear-regression.pkl"
    }
  },
  "sink_name": "local-jsonl"
}

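Field	Meaning in this example
`family: "ARTIFACT"`	Marks this JSON line as an artifact registration record.
`artifact_kind: "model"`	This file is a trained model.
`artifact_ref`	A unique reference that can be used to point to this output again.
`run_ref`	This file was produced in the very run that measured `r2`.
`location_ref`	The actual path where the model file existed at registration time.
`hash_value`	A hash computed from the file contents. You can use it later to check whether a file is identical.
`size_bytes`	The file size at registration time.
`binding_status: "BOUND"`	The file actually existed when it was linked to the registration record.
`source.exists: true`	Contexta confirmed that the source file existed at registration time.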
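The `size_bytes` and `hash_value` fields make it possible to check later whether a file on disk is still the one that was registered. The following is a minimal sketch of such a check, not Contexta's own verification logic. The source does not state which hash algorithm or encoding Contexta uses, so the sketch assumes a bare hexadecimal SHA-256 digest and exposes the algorithm as a parameter. The helper names `file_digest` and `matches_manifest` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256") -> str:
    """Hash a file in chunks so large checkpoints need not fit in memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_manifest(path: Path, manifest: dict, algorithm: str = "sha256") -> bool:
    """Compare a file with the size and hash stored in an artifact manifest.

    Assumption: manifest["hash_value"] is a plain hex digest produced by
    `algorithm`. Adjust this if the real records use another format.
    """
    if not path.is_file():
        return False
    if path.stat().st_size != manifest["size_bytes"]:
        return False  # cheap size check before hashing
    return file_digest(path, algorithm) == manifest["hash_value"]
```

Comparing the size first avoids hashing a file that cannot match. A hash mismatch under this sketch may also simply mean that the assumed algorithm or digest format differs from what Contexta records.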
