`runs/demo/...`, and you can literally see your agent think. This guide is for the moment after that first “wow”: “I like this. How do I use it responsibly in a real service, team, or eval harness?”

It’s practical and opinionated: what to do, what to avoid, and how to fit Noēsis into your stack without a big migration.
1) What Noēsis actually adds
- Episodes: every run is a cognitive episode with a stable ID.
- Artifacts: `events.jsonl`, `summary.json`, `state.json`, `manifest.json`, optional `prompts.jsonl` (see the layout sketch below).
- Cognitive phases: observe → interpret → plan → direction → governance → act → reflect → learn → insight.
- Governance & direction: policies that hint, intervene, or veto, with explicit flags in summary + events.
- Determinism controls: seeds, deterministic clocks/IDs, replay support.
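Concretely, an episode is just a directory of those artifacts. A plausible layout, assuming the default local runs path (the `<episode_id>` placeholder and nesting are illustrative, not a guaranteed structure):

```
runs/demo/<episode_id>/
  manifest.json    # episode metadata: stable ID, seeds, versions
  events.jsonl     # append-only stream: phases, tool calls, governance decisions
  summary.json     # outcome plus governance/direction flags
  state.json       # final state snapshot
  prompts.jsonl    # optional: prompts, if enabled
```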
What Noēsis does not do:
- Replace your log platform/APM/metrics.
- Create API keys, databases, or vector stores for you.
- Magically make unsafe tools safe; you still design the policies.
2) Start local, then widen the radius
- Run Hello Episode.
- Turn on a single guarded agent (e.g., Guarded LangGraph Agent).
- Inspect a handful of `events.jsonl` + `summary.json` pairs.
- Local only: keep the default `./runs/demo/...`.
- Shared dev: network volume / S3 / mounted path in Docker/Kubernetes.
- Service-level: scoped location per app/team.
3) Where to store episodes
Goal: make episodes easy to find, safe to keep, and cheap to delete.
- Keep `./runs/...`.
- Add cleanup: in CI, delete after each job; on dev machines, prune old runs (`find runs -mtime +7 -delete` or similar).
- Centralize base path: `/var/noesis/runs/<service>/<date>/...` or a mounted volume/S3 prefix.
- Add retention: keep the N latest per service, and tag “golden” episodes (evals, interesting failures) for longer (sketch below).
- Noēsis never deletes by itself—use your scheduler/scripts.
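Since Noēsis never deletes anything itself, retention is a small scheduled script you own. A minimal sketch in Python, assuming the `/var/noesis/runs/<service>/<date>/<episode_id>` layout above and a hypothetical `golden` marker file for episodes tagged to keep longer:

```python
import shutil
from pathlib import Path

RUNS = Path("/var/noesis/runs")  # base path; point at your mount or synced prefix
KEEP_LATEST = 200                # retention budget per service

def prune(service_dir: Path, keep: int = KEEP_LATEST) -> None:
    # Collect episode dirs across the date shards, newest first by mtime.
    episodes = sorted(
        (p for p in service_dir.glob("*/*") if p.is_dir()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    for episode in episodes[keep:]:
        # Keep "golden" episodes (evals, interesting failures) regardless of age.
        if (episode / "golden").exists():  # hypothetical tag convention
            continue
        shutil.rmtree(episode)

if __name__ == "__main__":
    if RUNS.exists():
        for service_dir in RUNS.iterdir():
            if service_dir.is_dir():
                prune(service_dir)
```

Run it from cron or CI; the point is that deletion stays explicit and reviewable, not something the framework does behind your back.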
4) Tools, side-effects, and partial visibility
Noēsis records cognitive steps: plan events, tool calls under `act`, governance decisions under `direction`/`governance`, reflect outcomes. It does not intercept every network packet or syscall.
- Wrap side-effects as tools so they show up as `act` events with inputs/outputs.
- Log enough context to explain behavior without storing sensitive data verbatim.

Partial visibility is fine: e.g., log hashed record IDs instead of full PII; Noēsis still provides useful traces. It’s normal for early integrations to have a mix of well-instrumented tools and “black boxes”; you can tighten coverage over time.
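As an example of the pattern, here is a minimal sketch of a wrapped side-effecting tool with hashed record IDs. The `emit_event` callback is a hypothetical stand-in for however your Noēsis integration records `act` events, not a real API:

```python
import hashlib

def redact_id(record_id: str) -> str:
    # Stable hash: traces stay correlatable without storing the raw ID.
    return hashlib.sha256(record_id.encode()).hexdigest()[:16]

def delete_record(record_id: str, emit_event) -> dict:
    # `emit_event` is a hypothetical hook; swap in your integration's real one.
    emit_event("act", {"tool": "delete_record", "record": redact_id(record_id)})
    result = {"deleted": True}  # ...the real side-effect happens here...
    emit_event("act", {"tool": "delete_record", "output": result})
    return result

# Usage: delete_record("user-1234", emit_event=print)
```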
5) Memory backends: don’t overthink day one
Memory is a port: you can back it with SQLite, Postgres, a vector store, or nothing.
- Phase 1: No long-term memory. Treat episodes as mostly stateless; use tags + artifacts.
- Phase 2: Simple DB-backed memory. Implement a memory port writing to your existing DB; keep schemas boring and auditable (a sketch follows this list). See: Add a memory port.
- Phase 3: Vector/hybrid memory. Only if you need semantic recall (multi-session chat, etc.); maintain an index of what/why you store.
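For Phase 2, “boring and auditable” can mean a single table. A sketch assuming a simple `store`/`recall` port shape (the method names are illustrative; match whatever interface the “Add a memory port” guide defines):

```python
import sqlite3

class SqliteMemoryPort:
    """DB-backed memory port: one auditable table, no vectors."""

    def __init__(self, path: str = "memory.db") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS memory ("
            "  episode_id TEXT, key TEXT, value TEXT,"
            "  created_at TEXT DEFAULT CURRENT_TIMESTAMP)"
        )

    def store(self, episode_id: str, key: str, value: str) -> None:
        self.conn.execute(
            "INSERT INTO memory (episode_id, key, value) VALUES (?, ?, ?)",
            (episode_id, key, value),
        )
        self.conn.commit()

    def recall(self, key: str, limit: int = 5) -> list[str]:
        rows = self.conn.execute(
            "SELECT value FROM memory WHERE key = ? "
            "ORDER BY created_at DESC LIMIT ?",
            (key, limit),
        ).fetchall()
        return [value for (value,) in rows]
```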
6) Determinism and replay in practice
Determinism is a dial, not a cliff:
- Minimum: set a seed for basic reproducibility.
- Evals/regressions: add deterministic clock/IDs and replay comparisons.
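To make the dial concrete, here is a sketch of the two settings above: a fixed seed for the minimum, plus an injectable deterministic clock and ID factory for evals. The names are illustrative, not Noēsis APIs:

```python
import itertools
import random

SEED = 1234                      # minimum: one seed for basic reproducibility
random.seed(SEED)

class FakeClock:
    """Deterministic clock: each call advances by a fixed step."""
    def __init__(self, start: float = 0.0, step: float = 1.0) -> None:
        self._t, self._step = start, step
    def now(self) -> float:
        self._t += self._step
        return self._t

def make_id_factory(prefix: str = "ep"):
    """Deterministic IDs: ep-000, ep-001, ... instead of UUIDs."""
    counter = itertools.count()
    return lambda: f"{prefix}-{next(counter):03d}"

# Inject these wherever your runner takes a clock/ID source, so two runs
# of the same task produce byte-identical events.jsonl for replay diffs.
```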
7) Integrating with your observability stack
Artifacts are JSON—great for dashboards, alerts, and evals.- Periodic job: scan new
summary.json, push metrics/flags into Prometheus/Datadog/etc. - Log pipeline: ship
events.jsonlto Loki/Elastic; index by episode_id, task, tags, phase. - Don’t replace existing observability—extend it with cognitive traces.
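A sketch of the periodic job. The `governance.veto` field below is an assumption about the `summary.json` schema (check a real summary first), and printing stands in for your metrics client:

```python
import json
from collections import Counter
from pathlib import Path

RUNS = Path("/var/noesis/runs")

def scan_summaries() -> Counter:
    vetoes = Counter()
    for summary_path in RUNS.rglob("summary.json"):
        service = summary_path.relative_to(RUNS).parts[0]
        summary = json.loads(summary_path.read_text())
        # ASSUMPTION: governance flags live under a "governance" key;
        # adapt to the actual summary.json schema.
        if summary.get("governance", {}).get("veto"):
            vetoes[service] += 1
    return vetoes

if __name__ == "__main__":
    for service, count in scan_summaries().items():
        # Replace print with your metrics client (statsd, Prometheus, etc.).
        print(f'noesis_vetoes{{service="{service}"}} {count}')
```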
8) Rollout plan (safe by default)
1. Instrument one path. Pick one agent/workflow and run it through Noēsis in non-prod.
2. Read 10 episodes by hand. Open artifacts, walk timelines, verify traces match expectations.
3. Add one guardrail. Add a simple policy (e.g., block dangerous deletes) and verify vetoes in `events.jsonl` and `summary.json` (see the sketch after this list).
4. Define storage + retention. Decide where episodes live, who can access them, and how long you keep them.
5. Wire a small eval. Use Trace-Based Evals with 20–50 tasks.
6. Scale gradually. Only after this feels boring should you enable Noēsis for more flows or teams.
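For step 3, the guardrail itself can be tiny. A sketch of a “block dangerous deletes” policy written as a plain predicate; the `Decision` shape and the registration step are illustrative, since the real policy interface depends on your integration:

```python
from dataclasses import dataclass

@dataclass
class Decision:
    verdict: str   # "allow" | "veto"
    reason: str = ""

DANGEROUS = {"delete_record", "drop_table", "rm_rf"}

def block_dangerous_deletes(tool_name: str, args: dict) -> Decision:
    # Veto destructive tools outright; everything else passes through.
    if tool_name in DANGEROUS:
        return Decision("veto", f"{tool_name} is on the destructive-tools denylist")
    return Decision("allow")
```

Once wired in, the verification in step 3 is simply: trigger a dangerous call and confirm the veto appears in `events.jsonl` and as a flag in `summary.json`.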
9) Adoption FAQ
Do I need a database or vector store on day one?
No. You can run Noēsis with nothing but artifacts on disk. Add memory ports later if you actually need long-term recall (chat history, cross-session behavior, etc.).

Can I use it “just” with LangGraph?
Yes. Wrap your graph as the planner/actuator and you’ll see plan / act / governance events plus direction flags in `summary.json`. You don’t have to change your graph structure to get traces.

What about CrewAI?
Yes. Treat the CrewAI crew as the planner/actuator and wrap the tools/skills so they show up as `act` events with inputs/outputs. Governance/direction events will surface the same way.

What if my tools are messy or opaque?
Start by wrapping only the high-risk tools (deletes, writes, external APIs). It’s fine to have a mix of well-instrumented tools and “black boxes”; you can tighten coverage over time as you gain confidence.

Will this replace my logs/APM?
No. Noēsis augments your existing observability with cognitive traces (episodes, phases, vetoes). Keep your current logs, metrics, and APM; if you want dashboards/alerts, feed Noēsis artifacts into that stack.

Where should episodes live?
Pick one location per environment (local path, mounted volume, or bucket) and point Noēsis there. Start simple; add retention/cleanup policies once more services or teams are writing episodes.

How much overhead is this in practice?
For most teams, the first step is: set a runs directory, wrap one agent/workflow, and read a few episodes. You don’t need a new database, control plane, or infra team: just a directory and a bit of configuration.
10) Quick checklist
- Where do episodes live in this environment?
- How long do we keep them, and how do we delete them?
- Which agents are guarded by policies, and what do those policies block or veto?
- Do we have at least one small eval that uses traces (not just final answers)?
- Do we know how to replay a “weird” episode and read its timeline?
11) Where to go next
- Just getting started? → Hello Episode
- Want to see vetoes? → Guarded LangGraph Agent
- Ready for evals? → Trace-Based Evals

