
Workingmans/add observability framework guides #940

Open

workingmans-ai wants to merge 8 commits into main from workingmans/add-observability-framework-guides

Conversation

@workingmans-ai (Collaborator)

Summary

This PR adds a new Observability Guides section to the Vapi documentation — seven new pages that give developers a systematic, framework-level view of how to instrument, test, extract, monitor, and optimize voice AI assistants in production.

These guides complement the existing deep-dive pages (Evals, Scorecards, Boards, Call Analysis) with the strategic context developers need to know which tool to use, when, and why.


What's included

7 new MDX pages under fern/observability/:

| Page | Status | Description |
| --- | --- | --- |
| observability-framework.mdx | Rough Draft | Framework overview with the 5-phase maturity model |
| instrumentation.mdx | Rough Draft | INSTRUMENT phase — built-in hooks, Structured Outputs, Call Analysis |
| testing-strategies.mdx | Rough Draft | TEST phase — testing pyramid, Evals vs Simulations decision guide |
| extraction-patterns.mdx | Rough Draft | EXTRACT phase — three extraction architecture patterns and tradeoffs |
| monitoring.mdx | Skeleton Draft | MONITOR phase — Boards, Analytics API, Insights API, Langfuse, alerting |
| optimization-workflows.mdx | Skeleton Draft | OPTIMIZE phase — optimization loop, common scenarios |
| production-readiness.mdx | Skeleton Draft | Cross-phase launch checklist with readiness gates |

Navigation (fern/docs.yml): New "Guides" sub-section under Observability linking all seven pages.

CSS (fern/assets/styles.css): Two editorial annotation styles:

  • internal-note (purple) — process notes and TODOs visible in the Fern preview
  • vapi-validation (yellow-green) — questions requiring VAPI input to verify accuracy before publish

The 5-phase framework

The framework introduces a staged maturity model:

INSTRUMENT → TEST → EXTRACT → MONITOR → OPTIMIZE

Each stage guide explains what the phase accomplishes, which Vapi tools map to it, and links forward to the relevant deep-dive feature docs.


Draft stages explained

Pages are marked at one of two stages:

  • Rough Draft — Full prose written; structure and claims need VAPI validation
  • Skeleton Draft — Structure and scope established; detailed content follows VAPI feedback in iteration 2

The yellow-green [VAPI VALIDATION NEEDED] annotations call out specific factual questions embedded throughout the pages — things like API capability scope, feature status (active vs. legacy), and whether the framework framing aligns with how the VAPI team thinks about observability.


Key questions for VAPI review

These are representative of the embedded validation questions — answers unblock content accuracy in the next iteration:

  1. Call Analysis status — Is Call Analysis actively supported alongside Structured Outputs, or effectively legacy? This affects positioning in the Instrumentation guide.
  2. Analytics API vs Insights API — What are the distinct use cases for each? When should a developer choose one over the other? (Monitoring guide)
  3. Framework framing — Does the INSTRUMENT → TEST → EXTRACT → MONITOR → OPTIMIZE model match how Vapi thinks about observability? Are there phases missing or named differently internally?
  4. Extraction patterns — Are the three extraction patterns (Structured Outputs, Call Analysis, external integrations) accurate and complete?
  5. Production readiness gates — Does Vapi have internal launch criteria or customer-facing readiness standards we should align the checklist to?

What's intentionally out of scope

  • Deeper feature guides (Evals, Scorecards, Boards, Simulations) — existing or planned in separate work
  • API reference content — these are conceptual/workflow guides only
  • Finalized content on Skeleton Draft pages — that iteration follows VAPI feedback on this draft

Files changed

fern/observability/observability-framework.mdx   (new, 224 lines)
fern/observability/instrumentation.mdx           (new, 303 lines)
fern/observability/testing-strategies.mdx        (new, 181 lines)
fern/observability/extraction-patterns.mdx       (new, 366 lines)
fern/observability/monitoring.mdx                (new, 151 lines)
fern/observability/optimization-workflows.mdx    (new, 177 lines)
fern/observability/production-readiness.mdx      (new, 408 lines)
fern/docs.yml                                    (navigation additions)
fern/assets/styles.css                           (editorial annotation CSS)

Testing Steps

  • Run the docs locally with `fern docs dev`, or open the preview deployment
  • Verify that the changed pages render correctly and that code snippets work

Commits

Add three new CSS classes for inline documentation annotations:
- .internal-note: Purple styling for internal development notes
- .vapi-validation: Orange styling for questions requiring VAPI validation
- .claude-note: Green styling for implementation guidance notes

Each class includes automatic label injection via ::before pseudo-elements
and dark mode variants for readability.
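As a rough illustration, the classes described above could look something like the following. This is a hypothetical sketch, not the actual contents of fern/assets/styles.css: the exact colors, label text (apart from the [VAPI VALIDATION NEEDED] label mentioned in the summary), and the `.dark` root-class convention for dark mode are all assumptions.

```css
/* Hypothetical sketch of the editorial annotation classes; styles.css may differ. */
.internal-note,
.vapi-validation,
.claude-note {
  display: inline-block;
  padding: 0.2rem 0.5rem;
  border-radius: 4px;
  font-size: 0.875rem;
}

/* Labels are injected automatically, so authors write only the note text */
.internal-note::before {
  content: "[INTERNAL NOTE] ";
  font-weight: 600;
}
.vapi-validation::before {
  content: "[VAPI VALIDATION NEEDED] ";
  font-weight: 600;
}
.claude-note::before {
  content: "[IMPLEMENTATION NOTE] ";
  font-weight: 600;
}

.internal-note   { background: #ede0ff; color: #4b2a85; } /* purple */
.vapi-validation { background: #ffe8cc; color: #7a4a00; } /* orange */
.claude-note     { background: #ddf5dd; color: #1d5c2a; } /* green */

/* Dark mode variants, assuming the site toggles a .dark class on a root element */
.dark .internal-note   { background: #3a2a5e; color: #d7c4ff; }
.dark .vapi-validation { background: #5c3d0f; color: #ffd9a0; }
.dark .claude-note     { background: #1f4424; color: #bfe8c5; }
```

In MDX, an author would then write `<span className="vapi-validation">…</span>` (as in the annotated pages quoted below in the review) and the label renders automatically via the `::before` rule.
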

Add new "Guides" section under Observability with 7 pages:
- Framework (observability-framework.mdx, renamed from overview)
- Instrumentation
- Testing strategies
- Extraction patterns
- Monitoring & Operating
- Optimization workflows
- Production readiness

Removed Integration Limitations page from navigation.

Add top-level framework guide introducing the observability maturity model
for voice AI assistants (renamed from overview.mdx).

Key sections:
- What is observability for voice AI
- Five-stage maturity model (INSTRUMENT → TEST → EXTRACT → MONITOR → OPTIMIZE)
- Stage descriptions with tool mapping
- Progressive adoption guidance
- Cross-stage workflow examples

Includes VAPI validation questions for framing and terminology.

Add two guides covering the "build & validate" stages:

**Instrumentation guide:**
- Built-in vs custom instrumentation concepts
- Purpose and intended outcomes for each type
- Tools at a glance (Built-in, Structured Outputs, Call Analysis)
- When to use each instrumentation approach

**Testing strategies guide:**
- Voice AI testing challenges
- Tools comparison (Evals vs Simulations vs Test Suites)
- Testing pyramid for voice AI
- Recommended hybrid testing strategy

Both pages use skeleton format with full prose intros and placeholder
sections for detailed content, pending VAPI validation.

Add two guides covering data extraction architecture and deployment validation:

**Extraction patterns guide:**
- Three extraction patterns at a glance (Dashboard Native, Webhook-to-External, Hybrid)
- Pattern descriptions with architectural trade-offs
- Feature mapping to patterns (Structured Outputs, Scorecards, APIs, Langfuse)
- When to use each pattern
- Schema design implications
- Migration paths between patterns

**Production readiness guide:**
- Progressive validation approach (INSTRUMENT+TEST → EXTRACT+MONITOR → OPTIMIZE)
- Stage-by-stage checklist with required and recommended items
- Production readiness gates (first deploy, scaled deploy, mature observability)
- Common readiness mistakes and fixes
- Deployment workflow timeline

Includes VAPI validation questions throughout for pattern accuracy and naming consistency.

Add two guides covering the "monitor & improve" stages (skeleton format):

**Monitoring & Operating guide:**
- Reframed title from "Monitoring" to "Monitoring & Operating"
- Operating voice AI systems introduction (real-time performance, cost, quality)
- Tools at a glance (Boards, Insights API, Analytics API, Langfuse, Webhook-to-External)
- Placeholder sections for tool details, alerting strategies, best practices
- Focus on operational reliability and continuous visibility

**Optimization workflows guide:**
- Optimization as continuous improvement loop (not a dedicated tool)
- 7-step workflow (Detect → Extract → Hypothesize → Change → Test → Deploy → Verify)
- Optimization mindset and why it matters
- Placeholder sections for detailed steps, common scenarios, best practices
- Cross-functional workflow using tools from all previous stages

Both pages use skeleton format with complete intros and VAPI validation questions,
awaiting tool clarification and detailed content development in iteration 2.

Add internal-note banners indicating completion status for VAPI reviewers:

Rough Draft (3 pages - content present, needs refinement):
- observability-framework.mdx
- instrumentation.mdx
- testing-strategies.mdx

Skeleton Draft (3 pages - structure only, detailed content pending):
- production-readiness.mdx (iteration 3)
- monitoring.mdx (iteration 2)
- optimization-workflows.mdx (iteration 2)

This helps reviewers calibrate expectations for which pages are ready
for content review vs. structural/architectural review only.

Changed "stage" to "phase" throughout observability framework to better
reflect the non-linear, iterative nature of the model. Phases can be
revisited and worked on concurrently, unlike sequential stages.

Changes:
- Framework page: Updated all section headings from "Stage X:" to "Phase"
  format, removed numbering from navigation cards, updated prose
- All 5 phase guides: Added phase context to subtitle frontmatter
  (e.g., "This is the INSTRUMENT phase of the observability framework")
- Removed numbered stage references throughout

Also includes from earlier consistency review:
- Framework: Added Test Suites deprecated label, Simulations pre-release
- Instrumentation: Removed Call Analysis recommendation, reordered nav cards,
  added back-link, added inter-stage bridge, removed decorative emoji
- Testing strategies: Added prerequisite reference to instrumentation
- Extraction patterns: Removed decorative emojis from comparison table

@workingmans-ai requested a review from evaz on February 17, 2026 at 23:55

Observability for voice AI means **instrumenting your assistants to capture data**, **testing them before production**, **extracting insights from calls**, **monitoring operational health**, and **using that data to continuously improve**.

Unlike traditional software observability (logs, metrics, traces), voice AI observability must account for:

Framing is solid. Few things worth considering adding:

  • Latency needs more callout: Voice has a ~500ms awkward silence threshold that text AI doesn't -- and STT + LLM + TTS + telephony latency all compound. A 200ms regression in one layer can tank the whole conversation.
  • Interruptions/turn-taking: sometimes an assistant might cut off the user or get steamrolled mid-sentence. These are voice-specific failure modes with no chat equivalent, worth calling out explicitly.
  • Cost attribution: enterprises running thousands of concurrent calls need to know which assistant version or use case is driving spend.
  • Maybe reframe "Quality is subjective" to "Quality requires new metrics" with interruption rate, silence duration, task completion, etc. Frames it as a solvable problem for the dev reading this.


---

## Choosing your observability strategy

Composer shipped last week and should appear here as the recommended entry point for non-technical users. It can set up instrumentation, debug calls, and suggest fixes via natural language. This changes the "start simple" path significantly; instead of "configure Structured Outputs manually," it could be "tell Composer what you want to track."

- You need queryable metrics for dashboards → use Structured Outputs
- You need complex, domain-specific analysis → use Structured Outputs

<span className="vapi-validation">Confirm Call Analysis status - is this truly legacy/deprecated, or actively supported alongside Structured Outputs? This will help us position it accordingly.</span>

@sahilsuman933 can provide insights here

timeline.
</Warning>

<span className="vapi-validation">Legacy Test Suites (voice and chat testing) have been replaced by Evals and Simulations. If you're using Test Suites, migrate to Evals (text-based) or Simulations (voice testing). Confirm deprecation status and migration path.</span>

@sahilsuman933 for migration path


---

## Testing pyramid for voice AI

We should fit Composer in here. Composer can pull call logs and explain what went wrong, which makes it a debugging companion alongside Evals. Maybe add as a "when tests fail" workflow: run Evals → test fails → use Composer to analyze the call and suggest a fix → update assistant → re-run Eval.


Your instrumentation choices affect downstream observability:

**Dashboard Native** (Boards-only monitoring):

I like the pattern taxonomy here of Dashboard Native / Hybrid / Webhook-to-External. It gives readers a clear mental model for choosing their approach. How would you suggest introducing new terms to Vapi's vocabulary and getting alignment?

**Insights API is currently undocumented**. If you need flexible querying or programmatic alerting, contact Vapi support for guidance.
</Warning>

<span className="internal-note">Should Insights API be formally documented? What's the relationship between Insights API and Analytics API? Is Insights API the primary alerting mechanism, or are built-in alerts planned?</span>


## Production readiness gates

Use these **gates** to decide if you're ready to progress:

these are awesome


@evaz left a comment:

Awesome mental model for devs built out here. The core thing is that we shipped Composer since you started this work, and it can set up instrumentation, debug calls, and close the optimization loop via natural language. It will need to be woven into INSTRUMENT, TEST, and OPTIMIZE at a minimum. Other comments are in-line!
