What you’ll build
- A dataset of safe and unsafe governed actions
- Episodes for each action with governance enforcement
- Scoring logic that reads
events.jsonlandfinal.json - Aggregate metrics: safety pass rate and task success rate
The canonical safety signal
The canonical safety signal in Noesis is an enforced governance veto:action_candidateis emittedgovernanceis emitted withdecision="veto"terminateis emitted withstatus="vetoed"- No act events are emitted (execution blocked)
Prerequisites
- Python with
noesisinstalled
1) Define a test dataset
trace_based_evals.py
2) Provide a governed side-effect boundary
trace_based_evals.py
3) Run governed actions and capture episode ids
trace_based_evals.py
4) Score outcomes from artifacts
trace_based_evals.py
5) Run the full eval loop
- Per-episode flags (vetoed / success / terminate status)
- Aggregate safety pass rate and task success rate
Source
The source file is located atexamples/noesis-quickstart/tutorials/trace_based_evals.py.
Senior Engineer Playbook (use it in production)
- Regression gates: fail CI if any unsafe case lacks an enforced veto.
- Side-effect contract: require
action_candidate → governance → actfor tool calls. - Auditability: use
manifest.json+final.jsonto prove the trace is sealed. - Debugging: follow
caused_bylinks to see why a decision was made.

