Capture Evidence

Use this page when deciding what belongs inside a run. Observability evidence should come from a training, evaluation, or inference action that actually ran.

What To Capture

| Evidence | Why it matters |
| --- | --- |
| dataset or input reference | identifies the input behind measured results |
| stages and batches | locates training and evaluation work |
| computed metrics | supports comparisons and release gates |
| events and usage | records calls, choices, and irregularities |
| artifacts | connects observations to a trained output or evaluation asset |

Executable Examples

Each tab is a self-contained program that writes observed results to a local .contexta/ workspace. Copy one tab into the file shown below, install its dependencies, and run it from the directory where you want the workspace to be created.

| Example | Save as | Install | What the run records |
| --- | --- | --- | --- |
| Machine Learning | capture_regression.py | uv add "contexta[sklearn]" | dataset event, measured r2 and mae, fitted model artifact |
| Deep Learning | capture_cnn.py | uv add "contexta[sklearn,torch]" | epoch loss, validation sample/accuracy, trained checkpoint, selection event |
| LLM | capture_llm.py | uv add contexta | local mock API response events, sample metrics, measured pass rate, selection event |

Machine Learning tab (capture_regression.py):

"""Train a real regression model and capture its measured evidence."""

import pickle
from pathlib import Path

from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

from contexta import Contexta
from contexta.capture import LocalJsonlSink


features, targets = load_diabetes(return_X_y=True)
train_x, test_x, train_y, test_y = train_test_split(
    features, targets, test_size=0.2, random_state=42
)

workspace = Path(".contexta")
ctx = Contexta(workspace=str(workspace), config={"project_name": "diabetes-regression"})
local_sink = next(sink for sink in ctx.sinks if isinstance(sink, LocalJsonlSink))
model = LinearRegression()

with ctx.run("linear-regression", dataset_ref="dataset:sklearn.diabetes") as run:
    run.event(
        "dataset.loaded",
        message="Loaded the scikit-learn diabetes dataset",
        attributes={"rows": len(features), "features": features.shape[1]},
    )
    with run.stage("train"):
        model.fit(train_x, train_y)

    with run.stage("evaluate") as stage:
        predictions = model.predict(test_x)
        r2 = r2_score(test_y, predictions)
        mae = mean_absolute_error(test_y, predictions)
        stage.metric("r2", r2, unit="ratio")
        stage.metric("mae", mae)

    model_path = workspace / "models" / "linear-regression.pkl"
    model_path.parent.mkdir(parents=True, exist_ok=True)
    model_path.write_bytes(pickle.dumps(model))
    run.register_artifact("model", str(model_path), attributes={"format": "pickle"})

records_path = local_sink.file_path_for("RECORD").relative_to(Path.cwd())

print(f"Captured run: {run.ref}")
print(f"Measured r2: {r2:.3f}; mae: {mae:.3f}")
print(f"Records: {records_path.as_posix()}")
print(f"Model artifact: {model_path.as_posix()}")

Run the copied program:

uv run capture_regression.py

If you copied another tab, run capture_cnn.py or capture_llm.py instead. All three programs print the captured run and leave record evidence here:

.contexta/
  cache/capture/record.jsonl

The ML and Deep Learning programs also print an artifact path under .contexta/models/. The LLM program records request-by-request evaluation evidence only; it does not invent an output file merely to demonstrate an artifact.

When you inspect record.jsonl, look for the measured metric together with its run_ref and stage_execution_ref. That association is the important part of capture: the result is stored with the execution that produced it, instead of appearing as an unexplained number.
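
One way to check that association is a short script that scans the JSONL file and lists the records carrying both references. The following is a minimal sketch using only the Python standard library. It assumes each line is a JSON object and that run_ref and stage_execution_ref appear as keys somewhere in the record, possibly nested, because the exact record schema is not shown on this page. The helper names find_key and linked_records are hypothetical and are not part of contexta.

```python
import json
from pathlib import Path


def find_key(node, key):
    """Return the first value stored under `key` anywhere in a nested JSON value, else None."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def linked_records(path):
    """Yield (run_ref, stage_execution_ref, record) for lines that carry both references."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            # Assumption: these two key names match the fields named in the text above.
            run_ref = find_key(record, "run_ref")
            stage_ref = find_key(record, "stage_execution_ref")
            if run_ref is not None and stage_ref is not None:
                yield run_ref, stage_ref, record


if __name__ == "__main__":
    records_file = Path(".contexta/cache/capture/record.jsonl")
    if records_file.exists():
        for run_ref, stage_ref, _ in linked_records(records_file):
            print(f"run={run_ref} stage={stage_ref}")
```

A record that has a run_ref but no stage_execution_ref is skipped by this sketch, which makes it easy to see which lines are tied to a specific stage execution and which are run-level only.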

Pitfalls

  • Do not present a preselected accuracy or loss value as though a model generated it.
  • Do not write a placeholder checkpoint and describe it as a trained artifact.
  • For provider-shaped integrations, a local mock API is appropriate when the API interaction really takes place and is deterministic; only network access and paid model behavior are intentionally left out.
  • Keep dataset references, computation stages, metrics, and resulting artifacts in the same observable run.
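
The mock-API point in the list above can be illustrated with a small standard-library sketch. It is not the capture_llm.py program and does not use contexta. It starts a throwaway HTTP server on localhost, sends each sample through a real request, and computes the pass rate from the responses instead of hard-coding it. The CANNED table, the /complete path, the JSON field names, and the evaluate helper are all hypothetical choices made for illustration.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical canned replies: the same prompt always gets the same answer.
CANNED = {"2+2": "4", "capital of France": "Paris"}


class MockHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        prompt = json.loads(self.rfile.read(length))["prompt"]
        body = json.dumps({"output": CANNED.get(prompt, "unknown")}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def evaluate(samples):
    """Send each (prompt, expected) pair through a local HTTP call and measure the pass rate."""
    server = HTTPServer(("127.0.0.1", 0), MockHandler)  # port 0: the OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_address[1]}/complete"
    # An empty ProxyHandler keeps the request local even if a proxy is configured.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    results = []
    try:
        for prompt, expected in samples:
            request = urllib.request.Request(
                url,
                data=json.dumps({"prompt": prompt}).encode("utf-8"),
                headers={"Content-Type": "application/json"},
            )
            with opener.open(request, timeout=5) as response:
                output = json.loads(response.read())["output"]
            results.append({"prompt": prompt, "output": output, "passed": output == expected})
    finally:
        server.shutdown()
        server.server_close()
    # The pass rate is derived from observed responses, never preselected.
    pass_rate = sum(r["passed"] for r in results) / len(results) if results else 0.0
    return results, pass_rate


if __name__ == "__main__":
    samples = [("2+2", "4"), ("capital of France", "Paris"), ("largest prime", "none")]
    results, pass_rate = evaluate(samples)
    print(f"pass rate: {pass_rate:.2f} over {len(results)} requests")
```

In a capture program, each entry in results would plausibly become a run.event call and the pass rate a stage.metric call inside the same run, mirroring the regression example; the exact arguments are left out here because the LLM tab's source is not shown on this page.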