Getting Started With Contexta
This guide takes you from installation to a metric captured from a workflow that actually ran.
What You'll Learn
- how to install Contexta for a first capture
- how to create a local .contexta/ workspace
- how an executed workflow produces an observable metric
- what to inspect after the run
Prerequisites
- Python >=3.14
- a writable directory for .contexta/
No cloud account, external database, or collector process is required.
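Both prerequisites can be checked before installing. The sketch below is a minimal pre-flight check that uses only the Python standard library. The helper names `meets_minimum` and `is_writable` are hypothetical and are not part of Contexta.

```python
import os
import sys
import tempfile

# Minimum version taken from the prerequisites listed above.
MINIMUM_PYTHON = (3, 14)


def meets_minimum(version_info, minimum=MINIMUM_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


def is_writable(directory):
    """Return True if a file can actually be created inside `directory`."""
    try:
        # Creating a temporary file is a more reliable test than
        # inspecting permission bits; the file is removed on close.
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("Python OK:", meets_minimum(sys.version_info))
    print("Directory writable:", is_writable(os.getcwd()))
```

Run it from the directory where you plan to create .contexta/; both lines should report True before you continue.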
Install
uv add "contexta[sklearn]"
Run Your First Capture
This example trains a linear regression model on scikit-learn's diabetes dataset and records its r2 score as a metric.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

from contexta import Contexta

# Load the diabetes dataset and hold out 20% of it for evaluation.
features, targets = load_diabetes(return_X_y=True)
train_x, test_x, train_y, test_y = train_test_split(
    features, targets, test_size=0.2, random_state=42
)

model = LinearRegression()
ctx = Contexta(workspace=".contexta", config={"project_name": "getting-started"})

with ctx.run("training") as run:
    with run.stage("train"):
        model.fit(train_x, train_y)

    with run.stage("evaluate") as stage:
        # Score the held-out data and record the result as a metric.
        score = r2_score(test_y, model.predict(test_x))
        stage.metric("r2", score, unit="ratio")

print(f"Captured run: {run.ref}")
print(f"Measured r2: {score:.3f}")
Save the code as contexta_example.py, then run it in the environment where
you installed Contexta:
uv run contexta_example.py
What Happened
The terminal output tells you that one run was observed and that its evaluation
produced an r2 value:
Captured run: run:getting-started.training
Measured r2: 0.453
Contexta also writes the evidence to:
.contexta/
  cache/capture/record.jsonl
record.jsonl is a JSON Lines file: each line is one captured observation.
The line created by this example contains information like this (shown here
pretty-printed; in the file it occupies a single line):
{
  "captured_at": "2026-05-25T10:26:03.502991Z",
  "payload": {
    "envelope": {
      "record_type": "metric",
      "run_ref": "run:getting-started.training",
      "stage_execution_ref": "stage:getting-started.training.evaluate",
      "completeness_marker": "complete",
      "degradation_marker": "none"
    },
    "payload": {
      "metric_key": "r2",
      "value": 0.4526027629719198,
      "unit": "ratio"
    }
  },
  "payload_type": "MetricRecord",
  "sink_name": "local-jsonl"
}
You can read that record as:
| Field | Plain-language meaning |
|---|---|
| payload_type: "MetricRecord" | This observation is a measured number, not a log message or file. |
| metric_key: "r2" | The number measures the evaluation score calculated by the program. |
| value: 0.4526... | This is the actual result measured in this run. |
| run_ref | The result belongs to the training run in the getting-started project. |
| stage_execution_ref | The result was produced during the evaluate step, not during training. |
| captured_at | When Contexta stored this observation. |
| completeness_marker / degradation_marker | Contexta captured this record completely and did not mark it as degraded. |
| sink_name: "local-jsonl" | The record was stored locally in this JSON Lines file. |
In other words, Contexta saved more than the number 0.453. It saved what that
number means, which run and step produced it, when it was captured, and whether
the capture was complete.
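Because each line of record.jsonl is an independent JSON object, the file can be read back with the standard library alone. The sketch below assumes every metric line has the same nesting as the sample record above (an outer `payload` holding `envelope` and an inner `payload`). The function name `iter_metrics` and the flattened output shape are illustrative and are not part of the Contexta API. To stay runnable without Contexta installed, it first writes a copy of the sample record to a temporary file.

```python
import json
import tempfile
from pathlib import Path


def iter_metrics(path):
    """Yield one flat dict per MetricRecord line in a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            if record.get("payload_type") != "MetricRecord":
                continue  # skip any other kind of observation
            envelope = record["payload"]["envelope"]
            metric = record["payload"]["payload"]
            yield {
                "run": envelope["run_ref"],
                "stage": envelope["stage_execution_ref"],
                "key": metric["metric_key"],
                "value": metric["value"],
                "unit": metric.get("unit"),
                "captured_at": record["captured_at"],
            }


# Stand-in data: a copy of the sample record shown in this guide.
SAMPLE = {
    "captured_at": "2026-05-25T10:26:03.502991Z",
    "payload": {
        "envelope": {
            "record_type": "metric",
            "run_ref": "run:getting-started.training",
            "stage_execution_ref": "stage:getting-started.training.evaluate",
            "completeness_marker": "complete",
            "degradation_marker": "none",
        },
        "payload": {"metric_key": "r2", "value": 0.4526027629719198, "unit": "ratio"},
    },
    "payload_type": "MetricRecord",
    "sink_name": "local-jsonl",
}

with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "record.jsonl"
    # One JSON object per line, as in a JSON Lines file.
    sample_path.write_text(json.dumps(SAMPLE) + "\n", encoding="utf-8")
    metrics = list(iter_metrics(sample_path))

for m in metrics:
    print(f'{m["stage"]}: {m["key"]} = {m["value"]:.3f} {m["unit"]}')
```

To read a real capture, pass `Path(".contexta") / "cache" / "capture" / "record.jsonl"` to `iter_metrics` instead of the temporary file.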
Core Concepts
| Plane | What it stores |
|---|---|
| Metadata | Projects, runs, stages, environments, deployments, batches, samples |
| Records | Metrics, structured events, spans, degraded markers |
| Artifacts | Models, checkpoints, prompt/evaluation files, reports |
Where To Go Next
- Capture Evidence explains what the real workflows preserve.
- Compare Runs trains and compares actual candidates.
- Common Workflows covers reports, diagnostics, and lineage.
- API Reference documents the Python boundaries.