Workingmans/add observability framework guides #940
workingmans-ai wants to merge 8 commits into main
Conversation
Add three new CSS classes for inline documentation annotations:
- `.internal-note`: Purple styling for internal development notes
- `.vapi-validation`: Orange styling for questions requiring VAPI validation
- `.claude-note`: Green styling for implementation guidance notes

Each class includes automatic label injection via ::before pseudo-elements and dark mode variants for readability.
Add new "Guides" section under Observability with 7 pages:
- Framework (observability-framework.mdx, renamed from overview)
- Instrumentation
- Testing strategies
- Extraction patterns
- Monitoring & Operating
- Optimization workflows
- Production readiness

Removed Integration Limitations page from navigation.
Add top-level framework guide introducing the observability maturity model for voice AI assistants (renamed from overview.mdx). Key sections:
- What is observability for voice AI
- Five-stage maturity model (INSTRUMENT → TEST → EXTRACT → MONITOR → OPTIMIZE)
- Stage descriptions with tool mapping
- Progressive adoption guidance
- Cross-stage workflow examples

Includes VAPI validation questions for framing and terminology.
Add two guides covering the "build & validate" stages:

**Instrumentation guide:**
- Built-in vs custom instrumentation concepts
- Purpose and intended outcomes for each type
- Tools at a glance (Built-in, Structured Outputs, Call Analysis)
- When to use each instrumentation approach

**Testing strategies guide:**
- Voice AI testing challenges
- Tools comparison (Evals vs Simulations vs Test Suites)
- Testing pyramid for voice AI
- Recommended hybrid testing strategy

Both pages use skeleton format with full prose intros and placeholder sections for detailed content, pending VAPI validation.
Add two guides covering data extraction architecture and deployment validation:

**Extraction patterns guide:**
- Three extraction patterns at a glance (Dashboard Native, Webhook-to-External, Hybrid)
- Pattern descriptions with architectural trade-offs
- Feature mapping to patterns (Structured Outputs, Scorecards, APIs, Langfuse)
- When to use each pattern
- Schema design implications
- Migration paths between patterns

**Production readiness guide:**
- Progressive validation approach (INSTRUMENT+TEST → EXTRACT+MONITOR → OPTIMIZE)
- Stage-by-stage checklist with required and recommended items
- Production readiness gates (first deploy, scaled deploy, mature observability)
- Common readiness mistakes and fixes
- Deployment workflow timeline

Includes VAPI validation questions throughout for pattern accuracy and naming consistency.
Add two guides covering the "monitor & improve" stages (skeleton format):

**Monitoring & Operating guide:**
- Reframed title from "Monitoring" to "Monitoring & Operating"
- Operating voice AI systems introduction (real-time performance, cost, quality)
- Tools at a glance (Boards, Insights API, Analytics API, Langfuse, Webhook-to-External)
- Placeholder sections for tool details, alerting strategies, best practices
- Focus on operational reliability and continuous visibility

**Optimization workflows guide:**
- Optimization as continuous improvement loop (not a dedicated tool)
- 7-step workflow (Detect → Extract → Hypothesize → Change → Test → Deploy → Verify)
- Optimization mindset and why it matters
- Placeholder sections for detailed steps, common scenarios, best practices
- Cross-functional workflow using tools from all previous stages

Both pages use skeleton format with complete intros and VAPI validation questions, awaiting tool clarification and detailed content development in iteration 2.
Add internal-note banners indicating completion status for VAPI reviewers:

Rough Draft (3 pages - content present, needs refinement):
- observability-framework.mdx
- instrumentation.mdx
- testing-strategies.mdx

Skeleton Draft (3 pages - structure only, detailed content pending):
- production-readiness.mdx (iteration 3)
- monitoring.mdx (iteration 2)
- optimization-workflows.mdx (iteration 2)

This helps reviewers calibrate expectations for which pages are ready for content review vs. structural/architectural review only.
Changed "stage" to "phase" throughout the observability framework to better reflect the non-linear, iterative nature of the model. Phases can be revisited and worked on concurrently, unlike sequential stages.

Changes:
- Framework page: Updated all section headings from "Stage X:" to "Phase" format, removed numbering from navigation cards, updated prose
- All 5 phase guides: Added phase context to subtitle frontmatter (e.g., "This is the INSTRUMENT phase of the observability framework")
- Removed numbered stage references throughout

Also includes from earlier consistency review:
- Framework: Added Test Suites deprecated label, Simulations pre-release
- Instrumentation: Removed Call Analysis recommendation, reordered nav cards, added back-link, added inter-stage bridge, removed decorative emoji
- Testing strategies: Added prerequisite reference to instrumentation
- Extraction patterns: Removed decorative emojis from comparison table
🌿 Preview your docs: https://vapi-preview-3ccd306b-7df5-47be-92d1-4e0f093d0c8c.docs.buildwithfern.com
> Observability for voice AI means **instrumenting your assistants to capture data**, **testing them before production**, **extracting insights from calls**, **monitoring operational health**, and **using that data to continuously improve**.
> Unlike traditional software observability (logs, metrics, traces), voice AI observability must account for:
Framing is solid. A few things worth considering adding:

- Latency needs more of a callout: voice has a ~500ms awkward-silence threshold that text AI doesn't have, and STT + LLM + TTS + telephony latency all compound. A 200ms regression in one layer can tank the whole conversation (see the rough budget sketch after this list).
- Interruptions/turn-taking: an assistant can cut off the user or get steamrolled mid-sentence. These are voice-specific failure modes with no chat equivalent, worth calling out explicitly.
- Cost attribution: enterprises running thousands of concurrent calls need to know which assistant version or use case is driving spend.
- Maybe reframe "Quality is subjective" as "Quality requires new metrics": interruption rate, silence duration, task completion, etc. That frames it as a solvable problem for the dev reading this.
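
A rough per-turn budget sketch (illustrative numbers only, not measured figures) makes the compounding point concrete:

```ts
// Hypothetical per-turn latency budget for a voice pipeline.
// Numbers are illustrative; real values vary by provider and configuration.
const turnLatencyMs = {
  stt: 150,        // speech-to-text finalization after the user stops talking
  llm: 250,        // time to first token from the model
  tts: 120,        // time to first audio byte
  telephony: 80,   // network and carrier overhead
};

const AWKWARD_SILENCE_MS = 500; // rough threshold where a pause starts to feel broken

const total = Object.values(turnLatencyMs).reduce((sum, ms) => sum + ms, 0);
console.log(`total turn latency: ${total}ms`); // 600ms in this example

if (total > AWKWARD_SILENCE_MS) {
  // Already over budget -- a 200ms regression in any single layer makes it much worse.
  console.warn(`over the ${AWKWARD_SILENCE_MS}ms threshold by ${total - AWKWARD_SILENCE_MS}ms`);
}
```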
---
> ## Choosing your observability strategy
Composer shipped last week and should appear here as the recommended entry point for non-technical users. It can set up instrumentation, debug calls, and suggest fixes via natural language. This changes the "start simple" path significantly; instead of "configure Structured Outputs manually," it could be "tell Composer what you want to track."
> - You need queryable metrics for dashboards → use Structured Outputs
> - You need complex, domain-specific analysis → use Structured Outputs
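
For the "queryable metrics" case above, a dashboard-friendly structured output might be shaped roughly like this. The schema is a sketch only: the property names are made up for illustration, and the exact Structured Outputs configuration should be confirmed against the feature docs.

```ts
// Hypothetical JSON-Schema-style shape for dashboard-friendly call metrics.
// Property names are illustrative examples, not fields Vapi requires or reserves.
const callMetricsSchema = {
  type: "object",
  properties: {
    taskCompleted: { type: "boolean" },
    escalatedToHuman: { type: "boolean" },
    customerSentiment: { type: "string", enum: ["positive", "neutral", "negative"] },
    topicsDiscussed: { type: "array", items: { type: "string" } },
  },
  required: ["taskCompleted"],
} as const;
```

Constraining fields to booleans, enums, and short arrays rather than free-form text is what keeps them queryable in dashboards.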
> <span className="vapi-validation">Confirm Call Analysis status: is this truly legacy/deprecated, or actively supported alongside Structured Outputs? This will help position it accordingly.</span>
> <span className="vapi-validation">Legacy Test Suites (voice and chat testing) have been replaced by Evals and Simulations. If you're using Test Suites, migrate to Evals (text-based) or Simulations (voice testing). Confirm deprecation status and migration path.</span>
---
> ## Testing pyramid for voice AI
We should fit Composer in here. Composer can pull call logs and explain what went wrong, which makes it a debugging companion alongside Evals. Maybe add a "when tests fail" workflow: run Evals → a test fails → use Composer to analyze the call and suggest a fix → update the assistant → re-run the Eval.
> Your instrumentation choices affect downstream observability:
> **Dashboard Native** (Boards-only monitoring):
I like the pattern taxonomy here of Dashboard Native / Hybrid / Webhook-to-External. It gives readers a clear mental model for choosing their approach. How would you suggest introducing new terms to Vapi's vocabulary and getting alignment?
> **Insights API is currently undocumented**. If you need flexible querying or programmatic alerting, contact Vapi support for guidance.
> <span className="internal-note">Should Insights API be formally documented? What's the relationship between Insights API and Analytics API? Is Insights API the primary alerting mechanism, or are built-in alerts planned?</span>
> ## Production readiness gates
> Use these **gates** to decide if you're ready to progress:
**evaz** left a comment:
Awesome mental model for devs built out here. The core thing is that we shipped Composer since you started this work, and it can set up instrumentation, debug calls, and close the optimization loop via natural language. It will need to be woven into INSTRUMENT, TEST, and OPTIMIZE at a minimum. Other comments in-line!
## Summary
This PR adds a new Observability Guides section to the Vapi documentation — seven new pages that give developers a systematic, framework-level view of how to instrument, test, extract, monitor, and optimize voice AI assistants in production.
These guides complement the existing deep-dive pages (Evals, Scorecards, Boards, Call Analysis) with the strategic context developers need to know which tool to use, when, and why.
## What's included
**7 new MDX pages** under `fern/observability/`:
- `observability-framework.mdx`
- `instrumentation.mdx`
- `testing-strategies.mdx`
- `extraction-patterns.mdx`
- `monitoring.mdx`
- `optimization-workflows.mdx`
- `production-readiness.mdx`

**Navigation** (`fern/docs.yml`): New "Guides" sub-section under Observability linking all seven pages.

**CSS** (`fern/assets/styles.css`): Two editorial annotation styles:
- `internal-note` (purple): process notes and TODOs visible in the Fern preview
- `vapi-validation` (yellow-green): questions requiring VAPI input to verify accuracy before publish

## The 5-phase framework
The framework introduces a phased maturity model: INSTRUMENT → TEST → EXTRACT → MONITOR → OPTIMIZE.

Each phase guide explains what the phase accomplishes, which Vapi tools map to it, and links forward to the relevant deep-dive feature docs.

## Draft stages explained

Pages are marked at one of two draft stages: Rough Draft (content present, needs refinement) or Skeleton Draft (structure only, detailed content pending).
The yellow-green `[VAPI VALIDATION NEEDED]` annotations call out specific factual questions embedded throughout the pages: things like API capability scope, feature status (active vs. legacy), and whether the framework framing aligns with how the VAPI team thinks about observability.

## Key questions for VAPI review

These are representative of the embedded validation questions; answers unblock content accuracy in the next iteration.
## What's intentionally out of scope

## Files changed

## Testing Steps
Run `fern docs dev` or navigate to the preview deployment.