The summary.json artifact provides a high-level view of episode outcomes, metrics, and flags. It’s designed for quick access to key information without parsing the full event timeline.
Schema overview
{
  "schema_version": "1.2.0",
  "episode_id": "ep_2024_abc123_s0",
  "task": "Draft release notes for v1.2.0",
  "started_at": "2024-01-15T10:30:00Z",
  "duration_sec": 5.0,
  "flags": { ... },
  "metrics": { ... },
  "insight": { ... },
  "tags": { ... }
}
Root fields
schema_version: Version of the summary schema. Currently "1.2.0".
episode_id: Unique episode identifier.
task: The original task or goal.
started_at: ISO 8601 start timestamp.
duration_sec: Total episode duration in seconds.
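If you read the file from disk rather than through the SDK, it is worth guarding on schema_version before relying on the field layout. A minimal sketch, assuming the runs/<run>/<episode_id>/summary.json layout shown under File access below (the path here is the sample one, not a fixed convention):
import json
from pathlib import Path

# Sample path from the File access section; adjust to your run directory.
path = Path("runs/demo/ep_abc123/summary.json")
summary = json.loads(path.read_text())

# Only the 1.x layout is documented here, so check the major version.
major = summary["schema_version"].split(".")[0]
if major != "1":
    raise ValueError(f"Unsupported summary schema: {summary['schema_version']}")

print(summary["episode_id"], summary["duration_sec"], "sec")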
Flags
Configuration flags for the episode run.
{
  "flags": {
    "intuition": true,
    "mode": "meta",
    "direction": {
      "applied": 2,
      "vetoed": 0,
      "policy": "SafetyPolicy@1.0",
      "threshold": 0.75,
      "last_diff": ["plan.steps[0].parameters"]
    }
  }
}
intuition: Whether intuition was enabled.
mode: Execution mode: "meta" or "minimal".
direction: Direction statistics (meta mode only).
direction.applied: Number of directives applied.
direction.vetoed: Number of directives vetoed.
direction.policy: Policy that issued directives.
direction.threshold: Confidence threshold used.
direction.last_diff: Paths modified by the last directive.
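Because direction is present only in meta mode, read it defensively when aggregating across episodes. A minimal sketch using the read API shown later on this page:
import noesis as ns

summary = ns.summary.read("ep_abc123")
direction = summary['flags'].get('direction')  # absent in minimal mode
if direction:
    total = direction['applied'] + direction['vetoed']
    print(f"Applied {direction['applied']}/{total} directives "
          f"(policy {direction['policy']}, threshold {direction['threshold']})")
else:
    print("No direction stats (minimal mode)")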
Metrics
Core execution metrics.
{
  "metrics": {
    "success": 1,
    "plan_count": 2,
    "act_count": 3,
    "reflect_count": 1,
    "veto_count": 0,
    "learn_proposals": 1,
    "learn_applied": 0,
    "latencies": {
      "first_action_ms": 150,
      "total_ms": 5000,
      "planning_ms": 500,
      "execution_ms": 4000
    }
  }
}
success: Success indicator: 1 for success, 0 for failure.
plan_count: Number of planning iterations.
act_count: Number of actions executed.
reflect_count: Number of reflection passes.
veto_count: Number of vetoed directives.
learn_proposals: Number of learning proposals generated.
learn_applied: Number of learning proposals applied.
latencies: Timing metrics.
latencies.first_action_ms: Milliseconds to first action.
latencies.total_ms: Total episode duration in milliseconds.
latencies.planning_ms: Time spent in planning phase.
latencies.execution_ms: Time spent in execution phase.
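Note that planning_ms and execution_ms may be omitted (the complete example below records only first_action_ms and total_ms), so use .get when deriving a phase breakdown. A sketch:
import noesis as ns

metrics = ns.summary.read("ep_abc123")['metrics']
lat = metrics['latencies']

total = lat['total_ms']
planning = lat.get('planning_ms')
execution = lat.get('execution_ms')

if planning is not None and execution is not None:
    # Time not attributed to planning or execution (e.g. reflection).
    other = total - planning - execution
    print(f"Planning {planning / total:.0%}, execution {execution / total:.0%}, "
          f"other {other / total:.0%}")
else:
    print(f"Total {total}ms; phase breakdown not recorded")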
Insight
Advanced insight metrics (meta mode only; may be absent in minimal mode).
{
  "insight": {
    "metrics": {
      "plan_adherence": 0.95,
      "tool_coverage": 1.0,
      "branching_factor": 2,
      "plan_revisions": 1
    }
  }
}
metrics: Computed insight metrics.
metrics.plan_adherence: How closely execution matched the plan (0-1). Rounded to 4 decimal places.
metrics.tool_coverage: Fraction of planned tools that were used (0-1).
metrics.branching_factor: Count of direction events emitted.
metrics.plan_revisions: Number of times the plan was revised.
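Since the insight block may be absent in minimal mode, access it with .get, as the reading example later on this page does. For instance, a sketch that flags episodes whose plan_adherence falls below a hypothetical cutoff:
import noesis as ns

for ep in ns.list_runs(limit=100):
    insight = ns.summary.read(ep['episode_id']).get('insight', {}).get('metrics', {})
    adherence = insight.get('plan_adherence')
    if adherence is not None and adherence < 0.8:  # 0.8 is an arbitrary cutoff
        print(f"{ep['episode_id']}: plan_adherence={adherence}")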
Tags
User-provided metadata.
{
  "tags": {
    "environment": "staging",
    "team": "platform",
    "priority": "high"
  }
}
Key-value pairs of user-provided metadata.
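Tags work well as grouping keys in analysis. A minimal sketch that breaks success down by the environment tag, assuming your episodes carry one (untagged episodes fall into a catch-all bucket):
from collections import defaultdict

import noesis as ns

by_env = defaultdict(list)
for ep in ns.list_runs(limit=100):
    summary = ns.summary.read(ep['episode_id'])
    env = summary.get('tags', {}).get('environment', 'untagged')
    by_env[env].append(summary['metrics']['success'])

for env, results in by_env.items():
    print(f"{env}: {sum(results) / len(results):.2%} success")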
Complete example
{
  "schema_version": "1.2.0",
  "episode_id": "ep_2024_abc123_s0",
  "task": "Draft release notes for v1.2.0",
  "started_at": "2024-01-15T10:30:00Z",
  "duration_sec": 5.0,
  "flags": {
    "intuition": true,
    "mode": "meta",
    "direction": {
      "applied": 1,
      "vetoed": 0,
      "policy": "SafetyPolicy@1.0",
      "threshold": 0.75,
      "last_diff": ["plan.steps[0].description"]
    }
  },
  "metrics": {
    "success": 1,
    "plan_count": 2,
    "act_count": 3,
    "reflect_count": 1,
    "veto_count": 0,
    "learn_proposals": 0,
    "learn_applied": 0,
    "latencies": {
      "first_action_ms": 150,
      "total_ms": 5000
    }
  },
  "insight": {
    "metrics": {
      "plan_adherence": 0.95,
      "tool_coverage": 1.0,
      "branching_factor": 2,
      "plan_revisions": 1
    }
  },
  "tags": {
    "environment": "staging",
    "team": "platform"
  }
}
Reading summaries
Python
import noesis as ns

summary = ns.summary.read("ep_abc123")

# Basic info
print(f"Task: {summary['task']}")
print(f"Success: {summary['metrics']['success']}")

# Metrics
metrics = summary['metrics']
print(f"Actions: {metrics['act_count']}")
print(f"Vetoes: {metrics['veto_count']}")
print(f"First action: {metrics['latencies']['first_action_ms']}ms")

# Insight (if available)
insight = summary.get('insight', {}).get('metrics', {})
print(f"Plan adherence: {insight.get('plan_adherence', 'N/A')}")
CLI
# Human-readable
noesis show ep_abc123
# Raw JSON
noesis show ep_abc123 -j
# Extract specific fields
noesis show ep_abc123 -j | jq '.metrics.success'
File access
cat runs/demo/ep_abc123/summary.json | jq .
Use cases
Check success rate
import noesis as ns

episodes = ns.list_runs(limit=100)
successes = sum(
    ns.summary.read(ep['episode_id'])['metrics']['success']
    for ep in episodes
)
rate = successes / len(episodes)
print(f"Success rate: {rate:.2%}")
Find vetoed episodes
import noesis as ns

episodes = ns.list_runs(limit=100)
vetoed = [
    ep for ep in episodes
    if ns.summary.read(ep['episode_id'])['metrics']['veto_count'] > 0
]
print(f"Vetoed episodes: {len(vetoed)}")
Analyze latencies
import noesis as ns
import statistics

episodes = ns.list_runs(limit=100)
latencies = [
    ns.summary.read(ep['episode_id'])['metrics']['latencies'].get('first_action_ms', 0)
    for ep in episodes
]
print(f"Median first action: {statistics.median(latencies):.0f}ms")
print(f"P95 first action: {statistics.quantiles(latencies, n=20)[-1]:.0f}ms")