Determinism is optional. Use it when you need stable traces for evals, regression tests, or debugging a specific episode. You can ignore it for casual experimentation.
Quick setup: seed-based determinism
The easiest way to get repeatable behavior is to fix a seed on your session:
import noesis as ns
from noesis.runtime.session import SessionBuilder
session = (
    SessionBuilder.from_env()
    .with_determinism(seed=42)
    .build()
)
ep = session.run("Summarize incident INC-1234")
With a deterministic model/tooling stack, runs with the same seed will produce the same episode trajectory and metrics. The seed is recorded in summary.json.
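For example, two runs with the same seed should report the same metrics. A minimal sketch; the prompt is illustrative and a deterministic model/tooling stack is assumed:
import noesis as ns
from noesis.runtime.session import SessionBuilder

def run_seeded(seed: int) -> str:
    # A fresh seeded session per run, so both runs start from the same state.
    session = SessionBuilder.from_env().with_determinism(seed=seed).build()
    return session.run("Summarize incident INC-1234")

ep_a = run_seeded(42)
ep_b = run_seeded(42)

# Same seed, same prompt -> the recorded metrics should match.
assert ns.summary.read(ep_a)["metrics"] == ns.summary.read(ep_b)["metrics"]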
Stricter reproducibility (clock + RNG)
If you need fully stable timings/IDs (e.g., evals in CI), add a deterministic clock and RNG:
import noesis as ns
from noesis.runtime.session import SessionBuilder
from noesis.runtime.determinism import DeterministicClock, DeterministicRNG
clock = DeterministicClock.from_start("2024-01-15T10:30:00Z", tick_ms=10)
rng = DeterministicRNG(seed=42)
session = (
    SessionBuilder.from_env()
    .with_determinism(clock=clock, rng=rng, episode_timestamp_ms=1705314600000)
    .build()
)
ep = session.run("Draft release notes")
This locks event timestamps, random numbers, and the episode timestamp; the configuration is reflected in summary.json.
Replay and comparison
Compare two episodes to check for drift. Ignore timing fields to focus on behavior:
import noesis as ns
def compare_episodes(ep_a: str, ep_b: str) -> dict:
    events_a = list(ns.events.read(ep_a))
    events_b = list(ns.events.read(ep_b))
    diffs = {
        "event_count_match": len(events_a) == len(events_b),
        "phase_sequence_match": True,
        "payload_diffs": [],
    }
    for e1, e2 in zip(events_a, events_b):
        if e1["phase"] != e2["phase"]:
            diffs["phase_sequence_match"] = False
            break
        # Ignore timing differences; compare payloads to catch behavioral drift
        p1 = dict(e1.get("payload", {}))
        p2 = dict(e2.get("payload", {}))
        if p1 != p2:
            diffs["payload_diffs"].append(
                {"phase": e1["phase"], "diff": {"expected": p1, "actual": p2}}
            )
    return diffs
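A minimal usage sketch, reusing compare_episodes from above; the seed and prompt are illustrative:
from noesis.runtime.session import SessionBuilder

def seeded_run(prompt: str) -> str:
    # Each run gets a fresh session built with the same seed.
    session = SessionBuilder.from_env().with_determinism(seed=42).build()
    return session.run(prompt)

ep_a = seeded_run("Summarize incident INC-1234")
ep_b = seeded_run("Summarize incident INC-1234")

diffs = compare_episodes(ep_a, ep_b)
assert diffs["event_count_match"]
assert diffs["phase_sequence_match"]
assert not diffs["payload_diffs"], diffs["payload_diffs"]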
Golden tests (pytest)
Use deterministic sessions in tests:
import json
from pathlib import Path
import noesis as ns
from noesis.runtime.session import SessionBuilder
def test_golden_episode():
    session = (
        SessionBuilder.from_env()
        .with_determinism(seed=42)
        .build()
    )
    episode_id = session.run("Generate test data")

    summary = ns.summary.read(episode_id)
    events = list(ns.events.read(episode_id))
    golden = json.loads(Path("tests/golden/generate_test_data.json").read_text())

    assert summary["metrics"]["success"] == golden["metrics"]["success"]
    assert summary["metrics"]["act_count"] == golden["metrics"]["act_count"]
    assert [e["phase"] for e in events] == [e["phase"] for e in golden["events"]]
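When a golden does not exist yet (or intentionally changes), you can regenerate it from a seeded run. A hedged sketch using the same fields the test above asserts on; the script path is hypothetical:
# scripts/update_golden.py (hypothetical helper)
import json
from pathlib import Path

import noesis as ns
from noesis.runtime.session import SessionBuilder

def update_golden(path: str = "tests/golden/generate_test_data.json") -> None:
    session = SessionBuilder.from_env().with_determinism(seed=42).build()
    episode_id = session.run("Generate test data")

    golden = {
        "metrics": ns.summary.read(episode_id)["metrics"],
        "events": [{"phase": e["phase"]} for e in ns.events.read(episode_id)],
    }
    Path(path).write_text(json.dumps(golden, indent=2))

if __name__ == "__main__":
    update_golden()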
Episode ID format
Episode IDs are human-readable and sortable:
ep_<YYYYMMDD>_<HHMMSS>_<hash>_<entropy>_s<seed>
- Prefix ep_
- Date + time for sortability
- Content hash + entropy for uniqueness
- Seed suffix (s0 if unset) for reproducibility
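Because the seed suffix is part of the ID, you can recover it directly from the episode ID string. A small sketch; the example ID below is illustrative:
def seed_from_episode_id(episode_id: str) -> int:
    # The final underscore-separated field is "s<seed>" (s0 when unset).
    suffix = episode_id.rsplit("_", 1)[-1]
    if not suffix.startswith("s"):
        raise ValueError(f"unexpected episode ID format: {episode_id}")
    return int(suffix[1:])

assert seed_from_episode_id("ep_20240115_103000_ab12cd_7f3e_s42") == 42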
Deterministic components (overview)
| Component | Purpose | How it works |
|---|---|---|
| Deterministic clock | Consistent timestamps | Fixed tick intervals instead of wall clock |
| Deterministic RNG | Reproducible random values | Seeded random number generator |
| Deterministic IDs | Stable identifiers | UUIDv5 based on namespace + content |
| Canonical JSON | Byte-identical output | Sorted keys, consistent formatting |
Direction/governance events use deterministic UUIDv5 IDs so the same inputs produce the same identifiers, making replay comparisons reliable.
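As an illustration of the mechanism (the actual namespace and content serialization are internal to the runtime), UUIDv5 derives the same ID whenever the namespace and content are the same:
import uuid

# Hypothetical namespace and content string, for illustration only.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://docs.noesis.systems/events")
content = "direction_blocked:INC-1234"

id_a = uuid.uuid5(NAMESPACE, content)
id_b = uuid.uuid5(NAMESPACE, content)
assert id_a == id_b  # identical inputs always produce identical identifiers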
Advanced: canonical JSON
Artifacts use canonical JSON so the same data yields the same bytes:
from noesis.runtime.determinism import canonical_dumps
payload_a = {"b": 2, "a": 1, "c": [3, 1, 2]}
payload_b = {"a": 1, "c": [3, 1, 2], "b": 2}
assert canonical_dumps(payload_a) == canonical_dumps(payload_b)
This keeps manifest hashes and diffs stable.
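For instance, hashing the canonical bytes gives a stable manifest hash regardless of input key order. A sketch, assuming canonical_dumps returns a string like json.dumps:
import hashlib

from noesis.runtime.determinism import canonical_dumps

def manifest_hash(payload: dict) -> str:
    # Canonicalize first, then hash; key order in the input no longer matters.
    return hashlib.sha256(canonical_dumps(payload).encode("utf-8")).hexdigest()

assert manifest_hash({"a": 1, "b": 2}) == manifest_hash({"b": 2, "a": 1})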
When to use determinism
Use deterministic mode for:
- Evals and golden tests
- Debugging specific runs
- Compliance scenarios that require reproducibility
Avoid deterministic mode for:
- Production workloads that need real timestamps/IDs
- Performance benchmarks
- Security-sensitive operations where predictable IDs are a risk
CI validation
Add deterministic tests and replay checks to CI:
# .github/workflows/determinism.yml
name: Determinism Validation
on: [push, pull_request]
jobs:
  golden-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v4
      - name: Run golden tests
        run: uv run pytest tests/golden/ -v
      - name: Check replay stability
        run: uv run python scripts/validate_replay.py
The replay gate includes multiple goldens; one is an enforce-veto run (tests/golden/veto_enforce/run_{a,b}) that ensures governance veto semantics stay deterministic: no Act events, a terminate status of vetoed, and an intact direction_blocked + governance lineage.
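For reference, a hedged sketch of what the veto-enforce assertions could look like; the status field and the direction_blocked phase name are assumptions taken from the description above, not a documented schema, and this is not the shipped scripts/validate_replay.py:
import noesis as ns

def check_veto_enforce(episode_id: str) -> None:
    phases = [e["phase"] for e in ns.events.read(episode_id)]

    # A vetoed direction must never reach execution.
    assert "Act" not in phases, "veto must block all Act events"
    # The blocked direction should still be recorded for lineage.
    assert "direction_blocked" in phases, "direction_blocked event missing"

    # Assumed field: the episode summary carries a terminate status.
    summary = ns.summary.read(episode_id)
    assert summary.get("status") == "vetoed", "episode must terminate as vetoed"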