
Workingmans/add observability framework guides #940

Open

workingmans-ai wants to merge 8 commits into main from workingmans/add-observability-framework-guides

Conversation

@workingmans-ai (Collaborator)

Summary

This PR adds a new Observability Guides section to the Vapi documentation — seven new pages that give developers a systematic, framework-level view of how to instrument, test, extract, monitor, and optimize voice AI assistants in production.

These guides complement the existing deep-dive pages (Evals, Scorecards, Boards, Call Analysis) with the strategic context developers need to know which tool to use, when, and why.


What's included

7 new MDX pages under fern/observability/:

| Page | Status | Description |
| --- | --- | --- |
| observability-framework.mdx | Rough Draft | Framework overview with the 5-phase maturity model |
| instrumentation.mdx | Rough Draft | INSTRUMENT phase — built-in hooks, Structured Outputs, Call Analysis |
| testing-strategies.mdx | Rough Draft | TEST phase — testing pyramid, Evals vs Simulations decision guide |
| extraction-patterns.mdx | Rough Draft | EXTRACT phase — three extraction architecture patterns and tradeoffs |
| monitoring.mdx | Skeleton Draft | MONITOR phase — Boards, Analytics API, Insights API, Langfuse, alerting |
| optimization-workflows.mdx | Skeleton Draft | OPTIMIZE phase — optimization loop, common scenarios |
| production-readiness.mdx | Skeleton Draft | Cross-phase launch checklist with readiness gates |

Navigation (fern/docs.yml): New "Guides" sub-section under Observability linking all seven pages.

CSS (fern/assets/styles.css): Two editorial annotation styles:

  • internal-note (purple) — process notes and TODOs visible in the Fern preview
  • vapi-validation (yellow-green) — questions requiring VAPI input to verify accuracy before publish

The 5-phase framework

The framework introduces a staged maturity model:

INSTRUMENT → TEST → EXTRACT → MONITOR → OPTIMIZE

Each stage guide explains what the phase accomplishes, which Vapi tools map to it, and links forward to the relevant deep-dive feature docs.


Draft stages explained

Pages are marked at one of two stages:

  • Rough Draft — Full prose written; structure and claims need VAPI validation
  • Skeleton Draft — Structure and scope established; detailed content follows VAPI feedback in iteration 2

The yellow-green [VAPI VALIDATION NEEDED] annotations call out specific factual questions embedded throughout the pages — things like API capability scope, feature status (active vs. legacy), and whether the framework framing aligns with how the VAPI team thinks about observability.


Key questions for VAPI review

These are representative of the embedded validation questions — answers unblock content accuracy in the next iteration:

  1. Call Analysis status — Is Call Analysis actively supported alongside Structured Outputs, or effectively legacy? This affects positioning in the Instrumentation guide.
  2. Analytics API vs Insights API — What are the distinct use cases for each? When should a developer choose one over the other? (Monitoring guide)
  3. Framework framing — Does the INSTRUMENT → TEST → EXTRACT → MONITOR → OPTIMIZE model match how Vapi thinks about observability? Are there phases missing or named differently internally?
  4. Extraction patterns — Are the three extraction patterns (Structured Outputs, Call Analysis, external integrations) accurate and complete?
  5. Production readiness gates — Does Vapi have internal launch criteria or customer-facing readiness standards we should align the checklist to?

What's intentionally out of scope

  • Deeper feature guides (Evals, Scorecards, Boards, Simulations) — existing or planned in separate work
  • API reference content — these are conceptual/workflow guides only
  • Finalized content on Skeleton Draft pages — that iteration follows VAPI feedback on this draft

Files changed

fern/observability/observability-framework.mdx   (new, 224 lines)
fern/observability/instrumentation.mdx           (new, 303 lines)
fern/observability/testing-strategies.mdx        (new, 181 lines)
fern/observability/extraction-patterns.mdx       (new, 366 lines)
fern/observability/monitoring.mdx                (new, 151 lines)
fern/observability/optimization-workflows.mdx    (new, 177 lines)
fern/observability/production-readiness.mdx      (new, 408 lines)
fern/docs.yml                                    (navigation additions)
fern/assets/styles.css                           (editorial annotation CSS)

Testing Steps

  • Run the docs locally with `fern docs dev`, or open the preview deployment
  • Verify that the changed pages render correctly and that code snippets work

Commits

Add three new CSS classes for inline documentation annotations:
- .internal-note: Purple styling for internal development notes
- .vapi-validation: Orange styling for questions requiring VAPI validation
- .claude-note: Green styling for implementation guidance notes

Each class includes automatic label injection via ::before pseudo-elements
and dark mode variants for readability.
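As a rough illustration, the classes described above could look something like the following. This is a hypothetical sketch, not the actual contents of fern/assets/styles.css: the exact colors, label text (apart from the [VAPI VALIDATION NEEDED] label mentioned in the summary), and the `.dark` root-class convention for dark mode are all assumptions.

```css
/* Hypothetical sketch of the editorial annotation classes; styles.css may differ. */
.internal-note,
.vapi-validation,
.claude-note {
  display: inline-block;
  padding: 0.2rem 0.5rem;
  border-radius: 4px;
  font-size: 0.875rem;
}

/* Labels are injected automatically, so authors write only the note text */
.internal-note::before {
  content: "[INTERNAL NOTE] ";
  font-weight: 600;
}
.vapi-validation::before {
  content: "[VAPI VALIDATION NEEDED] ";
  font-weight: 600;
}
.claude-note::before {
  content: "[IMPLEMENTATION NOTE] ";
  font-weight: 600;
}

.internal-note   { background: #ede0ff; color: #4b2a85; } /* purple */
.vapi-validation { background: #ffe8cc; color: #7a4a00; } /* orange */
.claude-note     { background: #ddf5dd; color: #1d5c2a; } /* green */

/* Dark mode variants, assuming the site toggles a .dark class on a root element */
.dark .internal-note   { background: #3a2a5e; color: #d7c4ff; }
.dark .vapi-validation { background: #5c3d0f; color: #ffd9a0; }
.dark .claude-note     { background: #1f4424; color: #bfe8c5; }
```

In MDX, an author would then write `<span className="vapi-validation">…</span>` (as in the annotated pages quoted below in the review) and the label renders automatically via the `::before` rule.
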

Add new "Guides" section under Observability with 7 pages:
- Framework (observability-framework.mdx, renamed from overview)
- Instrumentation
- Testing strategies
- Extraction patterns
- Monitoring & Operating
- Optimization workflows
- Production readiness

Removed Integration Limitations page from navigation.

Add top-level framework guide introducing the observability maturity model
for voice AI assistants (renamed from overview.mdx).

Key sections:
- What is observability for voice AI
- Five-stage maturity model (INSTRUMENT → TEST → EXTRACT → MONITOR → OPTIMIZE)
- Stage descriptions with tool mapping
- Progressive adoption guidance
- Cross-stage workflow examples

Includes VAPI validation questions for framing and terminology.

Add two guides covering the "build & validate" stages:

**Instrumentation guide:**
- Built-in vs custom instrumentation concepts
- Purpose and intended outcomes for each type
- Tools at a glance (Built-in, Structured Outputs, Call Analysis)
- When to use each instrumentation approach

**Testing strategies guide:**
- Voice AI testing challenges
- Tools comparison (Evals vs Simulations vs Test Suites)
- Testing pyramid for voice AI
- Recommended hybrid testing strategy

Both pages use skeleton format with full prose intros and placeholder
sections for detailed content, pending VAPI validation.

Add two guides covering data extraction architecture and deployment validation:

**Extraction patterns guide:**
- Three extraction patterns at a glance (Dashboard Native, Webhook-to-External, Hybrid)
- Pattern descriptions with architectural trade-offs
- Feature mapping to patterns (Structured Outputs, Scorecards, APIs, Langfuse)
- When to use each pattern
- Schema design implications
- Migration paths between patterns

**Production readiness guide:**
- Progressive validation approach (INSTRUMENT+TEST → EXTRACT+MONITOR → OPTIMIZE)
- Stage-by-stage checklist with required and recommended items
- Production readiness gates (first deploy, scaled deploy, mature observability)
- Common readiness mistakes and fixes
- Deployment workflow timeline

Includes VAPI validation questions throughout for pattern accuracy and naming consistency.

Add two guides covering the "monitor & improve" stages (skeleton format):

**Monitoring & Operating guide:**
- Reframed title from "Monitoring" to "Monitoring & Operating"
- Operating voice AI systems introduction (real-time performance, cost, quality)
- Tools at a glance (Boards, Insights API, Analytics API, Langfuse, Webhook-to-External)
- Placeholder sections for tool details, alerting strategies, best practices
- Focus on operational reliability and continuous visibility

**Optimization workflows guide:**
- Optimization as continuous improvement loop (not a dedicated tool)
- 7-step workflow (Detect → Extract → Hypothesize → Change → Test → Deploy → Verify)
- Optimization mindset and why it matters
- Placeholder sections for detailed steps, common scenarios, best practices
- Cross-functional workflow using tools from all previous stages

Both pages use skeleton format with complete intros and VAPI validation questions,
awaiting tool clarification and detailed content development in iteration 2.

Add internal-note banners indicating completion status for VAPI reviewers:

Rough Draft (3 pages - content present, needs refinement):
- observability-framework.mdx
- instrumentation.mdx
- testing-strategies.mdx

Skeleton Draft (3 pages - structure only, detailed content pending):
- production-readiness.mdx (iteration 3)
- monitoring.mdx (iteration 2)
- optimization-workflows.mdx (iteration 2)

This helps reviewers calibrate expectations for which pages are ready
for content review vs. structural/architectural review only.

Changed "stage" to "phase" throughout observability framework to better
reflect the non-linear, iterative nature of the model. Phases can be
revisited and worked on concurrently, unlike sequential stages.

Changes:
- Framework page: Updated all section headings from "Stage X:" to "Phase"
  format, removed numbering from navigation cards, updated prose
- All 5 phase guides: Added phase context to subtitle frontmatter
  (e.g., "This is the INSTRUMENT phase of the observability framework")
- Removed numbered stage references throughout

Also includes from earlier consistency review:
- Framework: Added Test Suites deprecated label, Simulations pre-release
- Instrumentation: Removed Call Analysis recommendation, reordered nav cards,
  added back-link, added inter-stage bridge, removed decorative emoji
- Testing strategies: Added prerequisite reference to instrumentation
- Extraction patterns: Removed decorative emojis from comparison table

@workingmans-ai requested a review from evaz on February 17, 2026 at 23:55

Observability for voice AI means **instrumenting your assistants to capture data**, **testing them before production**, **extracting insights from calls**, **monitoring operational health**, and **using that data to continuously improve**.

Unlike traditional software observability (logs, metrics, traces), voice AI observability must account for:

Framing is solid. Few things worth considering adding:

  • Latency needs more callout: Voice has a ~500ms awkward silence threshold that text AI doesn't -- and STT + LLM + TTS + telephony latency all compound. A 200ms regression in one layer can tank the whole conversation.
  • Interruptions/turn-taking: sometimes an assistant might cut off the user or get steamrolled mid-sentence. These are voice-specific failure modes with no chat equivalent, worth calling out explicitly.
  • Cost attribution: enterprises running thousands of concurrent calls need to know which assistant version or use case is driving spend.
  • Maybe reframe "Quality is subjective" to "Quality requires new metrics" with interruption rate, silence duration, task completion, etc. Frames it as a solvable problem for the dev reading this.


---

## Choosing your observability strategy

Composer shipped last week and should appear here as the recommended entry point for non-technical users. It can set up instrumentation, debug calls, and suggest fixes via natural language. This changes the "start simple" path significantly; instead of "configure Structured Outputs manually," it could be "tell Composer what you want to track."

- You need queryable metrics for dashboards → use Structured Outputs
- You need complex, domain-specific analysis → use Structured Outputs

<span className="vapi-validation">Confirm Call Analysis status - is this truly legacy/deprecated, or actively supported alongside Structured Outputs? This will help us position it accordingly.</span>

@sahilsuman933 can provide insights here

timeline.
</Warning>

<span className="vapi-validation">Legacy Test Suites (voice and chat testing) have been replaced by Evals and Simulations. If you're using Test Suites, migrate to Evals (text-based) or Simulations (voice testing). Confirm deprecation status and migration path.</span>

@sahilsuman933 for migration path


---

## Testing pyramid for voice AI

We should fit Composer in here. Composer can pull call logs and explain what went wrong, which makes it a debugging companion alongside Evals. Maybe add as a "when tests fail" workflow: run Evals → test fails → use Composer to analyze the call and suggest a fix → update assistant → re-run Eval.


Your instrumentation choices affect downstream observability:

**Dashboard Native** (Boards-only monitoring):

I like the pattern taxonomy here of Dashboard Native / Hybrid / Webhook-to-External. It gives readers a clear mental model for choosing their approach. How would you suggest introducing new terms to Vapi's vocabulary and getting alignment?

**Insights API is currently undocumented**. If you need flexible querying or programmatic alerting, contact Vapi support for guidance.
</Warning>

<span className="internal-note">Should Insights API be formally documented? What's the relationship between Insights API and Analytics API? Is Insights API the primary alerting mechanism, or are built-in alerts planned?</span>


## Production readiness gates

Use these **gates** to decide if you're ready to progress:

these are awesome


@evaz left a comment:

Awesome mental model for devs built out here. The core thing is that we shipped Composer since you started this work, and it can set up instrumentation, debug calls, and close the optimization loop via natural language. It will need to be woven into INSTRUMENT, TEST, and OPTIMIZE at a minimum. Other comments are in-line!
