Quick setup: seed-based determinism
The easiest way to get repeatable behavior is to fix a seed on your session.
[…] summary.json.
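The session API itself is not shown in this section, so the following is only a generic sketch of what a fixed seed provides, using Python's standard `random` module. `run_episode` is a hypothetical stand-in for whatever your session actually executes.

```python
import random


def run_episode(seed: int, steps: int = 5) -> list[int]:
    """Hypothetical stand-in for a session run: draws `steps` random actions."""
    rng = random.Random(seed)  # private RNG, so global random state is untouched
    return [rng.randrange(100) for _ in range(steps)]


# The same seed reproduces the same sequence of draws.
first = run_episode(seed=42)
second = run_episode(seed=42)
assert first == second
```

Using a private `random.Random` instance rather than the module-level functions keeps unrelated code from disturbing the sequence.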
Stricter reproducibility (clock + RNG)
If you need fully stable timings/IDs (e.g., evals in CI), add a deterministic clock and RNG.
[…] summary.json.
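To illustrate the idea of pairing a deterministic clock with a seeded RNG, here is a minimal sketch. `TickClock` and `record_events` are hypothetical names invented for this example and are not the project's own classes; the start value and tick size are arbitrary.

```python
import random
from dataclasses import dataclass, field


@dataclass
class TickClock:
    """Hypothetical deterministic clock: advances by a fixed tick per call."""

    start: float = 0.0
    tick: float = 1.0
    _calls: int = field(default=0, init=False)

    def now(self) -> float:
        value = self.start + self._calls * self.tick
        self._calls += 1
        return value


def record_events(seed: int, n: int = 3) -> list[dict]:
    """Produce events whose timestamps and values depend only on the seed."""
    clock = TickClock(start=1_700_000_000.0, tick=0.5)
    rng = random.Random(seed)
    return [{"t": clock.now(), "value": rng.random()} for _ in range(n)]


# Two runs with the same seed are identical, timestamps included.
assert record_events(seed=7) == record_events(seed=7)
```

Because the clock never consults wall time, timestamps no longer vary between machines or runs.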
Replay and comparison
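One way to compare two runs while ignoring timing can be sketched in plain Python. This assumes episodes are available as nested dictionaries; the field names in `TIMING_FIELDS` and the sample event shape are assumptions, so adjust them to your actual episode schema.

```python
from typing import Any

# Assumed names of timing-related fields; not taken from the real schema.
TIMING_FIELDS = frozenset({"timestamp", "duration_ms", "started_at", "ended_at"})


def strip_timing(value: Any) -> Any:
    """Recursively drop timing fields from nested dicts and lists."""
    if isinstance(value, dict):
        return {k: strip_timing(v) for k, v in value.items() if k not in TIMING_FIELDS}
    if isinstance(value, list):
        return [strip_timing(v) for v in value]
    return value


def episodes_match(a: dict, b: dict) -> bool:
    """True when two episodes agree on everything except timing fields."""
    return strip_timing(a) == strip_timing(b)


run_a = {"status": "ok", "events": [{"type": "Plan", "timestamp": 1.0}]}
run_b = {"status": "ok", "events": [{"type": "Plan", "timestamp": 9.0}]}
assert episodes_match(run_a, run_b)
```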
Compare two episodes to check for drift. Ignore timing fields to focus on behavior.
Golden tests (pytest)
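A golden test for a deterministic session might look roughly like the sketch below. `run_session` is a hypothetical stand-in for the real session call, and `check_golden` is a helper invented for this example; only `tmp_path` is a real pytest fixture.

```python
import json
import random
from pathlib import Path


def run_session(seed: int) -> dict:
    """Hypothetical stand-in for running a deterministic session."""
    rng = random.Random(seed)
    return {"seed": seed, "actions": [rng.randrange(10) for _ in range(4)]}


def check_golden(result: dict, golden_path: Path) -> None:
    """Record the golden file if it is missing, otherwise compare against it."""
    rendered = json.dumps(result, sort_keys=True, indent=2)
    if not golden_path.exists():
        golden_path.write_text(rendered)
    assert golden_path.read_text() == rendered


def test_session_matches_golden(tmp_path: Path) -> None:
    # tmp_path is pytest's built-in temporary-directory fixture.
    golden = tmp_path / "session_seed42.json"
    check_golden(run_session(seed=42), golden)  # first call records the file
    check_golden(run_session(seed=42), golden)  # second call must match it
```

In a real suite the golden file would normally be committed to the repository rather than created in a temporary directory, so that a behavior change shows up as a failing comparison.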
Use deterministic sessions in tests.
Episode ID format
Episode IDs are human-readable and sortable:
- Prefix ep_
- Date + time for sortability
- Content hash + entropy for uniqueness
- Seed suffix (s0 if unset) for reproducibility
Deterministic components (overview)
| Component | Purpose | How it works |
|---|---|---|
| Deterministic clock | Consistent timestamps | Fixed tick intervals instead of wall clock |
| Deterministic RNG | Reproducible random values | Seeded random number generator |
| Deterministic IDs | Stable identifiers | UUIDv5 based on namespace + content |
| Canonical JSON | Byte-identical output | Sorted keys, consistent formatting |
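The "Deterministic IDs" row can be illustrated with Python's standard `uuid` module. The namespace below is made up for this example, and the content strings are placeholders; the project's real namespace and content layout are not shown in this section.

```python
import uuid

# Hypothetical namespace, derived from a made-up name for illustration only.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "episodes.example.invalid")


def stable_id(content: str) -> str:
    """Return a UUIDv5 that depends only on the namespace and the content."""
    return str(uuid.uuid5(NAMESPACE, content))


# Identical content maps to the same ID; different content yields a different one.
assert stable_id("plan:step-1") == stable_id("plan:step-1")
assert stable_id("plan:step-1") != stable_id("plan:step-2")
```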
Advanced: canonical JSON
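A common way to get byte-identical JSON in Python is to sort keys and fix the separators. The sketch below uses only the standard `json` module; `canonical_json` is a hypothetical helper, and the exact canonicalization rules this project applies may differ.

```python
import json


def canonical_json(data) -> bytes:
    """Serialize with sorted keys and no optional whitespace, encoded as UTF-8."""
    text = json.dumps(data, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


# Key insertion order no longer affects the serialized bytes.
assert canonical_json({"b": 1, "a": [2, 3]}) == canonical_json({"a": [2, 3], "b": 1})
```

Note that this simple approach does not normalize floating-point formatting across languages, which matters if artifacts are produced by more than one runtime.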
Artifacts use canonical JSON so the same data yields the same bytes.
When to use determinism
CI validation
Add deterministic tests and replay checks to CI.
[…] tests/golden/veto_enforce/run_{a,b}) to ensure governance veto semantics remain deterministic (no Act events, terminate status vetoed, direction_blocked + governance lineage intact).
