add note on performance and latency

steven10a · steven10a · commit fe3ee1a4ba85 · 2025-12-12T16:39:22.000-05:00
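
The note added below quotes a 40% average reduction in median latency, with a per-model range of 18% to 67%. The change does not say how those figures were computed. One plausible reading is a per-model relative reduction in median latency, averaged across models. The sketch below follows that reading. The function names and the input shape are hypothetical and are not part of the guardrails library.

```python
from statistics import mean, median


def median_latency_reduction(with_reasoning, without_reasoning):
    """Fractional drop in median latency when reasoning is disabled.

    Both arguments are sequences of per-request latencies in the same unit.
    """
    baseline = median(with_reasoning)
    if baseline <= 0:
        raise ValueError("baseline median latency must be positive")
    return (baseline - median(without_reasoning)) / baseline


def summarize(per_model):
    """Aggregate per-model reductions into an average and a range.

    `per_model` maps a model name to a pair
    (latencies_with_reasoning, latencies_without_reasoning).
    This layout is an assumption made for illustration.
    """
    reductions = {
        name: median_latency_reduction(on, off)
        for name, (on, off) in per_model.items()
    }
    values = list(reductions.values())
    return {
        "per_model": reductions,
        "average": mean(values),
        "min": min(values),
        "max": max(values),
    }
```

Feeding `summarize` your own latency samples, collected with `include_reasoning` set to `true` and to `false`, gives figures you can compare with the ones quoted in the docs.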
diff --git a/docs/ref/checks/custom_prompt_check.md b/docs/ref/checks/custom_prompt_check.md
@@ -23,7 +23,8 @@ Implements custom content checks using configurable LLM prompts. Uses your custo
 - **`include_reasoning`** (optional): Whether to include reasoning/explanation fields in the guardrail output (default: `false`)
     - When `false`: The LLM only generates the essential fields (`flagged` and `confidence`), reducing token generation costs
     - When `true`: Additionally, returns detailed reasoning for its decisions
-    - **Use Case**: Keep disabled for production to minimize costs; enable for development and debugging
+    - **Performance**: In our evaluations, disabling reasoning reduced median latency by 40% on average (18% to 67%, depending on the model) while maintaining detection performance
+    - **Use Case**: Keep disabled for production to minimize costs and latency; enable for development and debugging
 
 ## Implementation Notes
 
diff --git a/docs/ref/checks/hallucination_detection.md b/docs/ref/checks/hallucination_detection.md
@@ -28,7 +28,8 @@ Flags model text containing factual claims that are clearly contradicted or not
 - **`include_reasoning`** (optional): Whether to include detailed reasoning fields in the output (default: `false`)
     - When `false`: Returns only `flagged` and `confidence` to save tokens
     - When `true`: Additionally, returns `reasoning`, `hallucination_type`, `hallucinated_statements`, and `verified_statements`
-    - Recommended: Keep disabled for production (default); enable for development/debugging
+    - **Performance**: In our evaluations, disabling reasoning reduced median latency by 40% on average (18% to 67%, depending on the model) while maintaining detection performance
+    - **Use Case**: Keep disabled for production to minimize costs and latency; enable for development and debugging
 
 ### Tuning guidance
 
diff --git a/docs/ref/checks/jailbreak.md b/docs/ref/checks/jailbreak.md
@@ -46,7 +46,8 @@ Detects attempts to bypass safety or policy constraints via manipulation (prompt
 - **`include_reasoning`** (optional): Whether to include reasoning/explanation fields in the guardrail output (default: `false`)
     - When `false`: The LLM only generates the essential fields (`flagged` and `confidence`), reducing token generation costs
     - When `true`: Additionally, returns detailed reasoning for its decisions
-    - **Use Case**: Keep disabled for production to minimize costs; enable for development and debugging
+    - **Performance**: In our evaluations, disabling reasoning reduced median latency by 40% on average (18% to 67%, depending on the model) while maintaining detection performance
+    - **Use Case**: Keep disabled for production to minimize costs and latency; enable for development and debugging
 
 ### Tuning guidance
 
diff --git a/docs/ref/checks/llm_base.md b/docs/ref/checks/llm_base.md
@@ -22,7 +22,8 @@ Base configuration for LLM-based guardrails. Provides common configuration optio
 - **`include_reasoning`** (optional): Whether to include reasoning/explanation fields in the guardrail output (default: `false`)
   - When `true`: The LLM generates and returns detailed reasoning for its decisions (e.g., `reason`, `reasoning`, `observation`, `evidence` fields)
   - When `false`: The LLM only returns the essential fields (`flagged` and `confidence`), reducing token generation costs
-  - **Use Case**: Keep disabled for production to minimize costs; enable for development and debugging
+  - **Performance**: In our evaluations, disabling reasoning reduced median latency by 40% on average (18% to 67%, depending on the model) while maintaining detection performance
+  - **Use Case**: Keep disabled for production to minimize costs and latency; enable for development and debugging
 
 ## What It Does
 
diff --git a/docs/ref/checks/nsfw.md b/docs/ref/checks/nsfw.md
@@ -32,7 +32,8 @@ Flags workplace‑inappropriate model outputs: explicit sexual content, profanit
 - **`include_reasoning`** (optional): Whether to include reasoning/explanation fields in the guardrail output (default: `false`)
     - When `false`: The LLM only generates the essential fields (`flagged` and `confidence`), reducing token generation costs
     - When `true`: Additionally, returns detailed reasoning for its decisions
-    - **Use Case**: Keep disabled for production to minimize costs; enable for development and debugging
+    - **Performance**: In our evaluations, disabling reasoning reduced median latency by 40% on average (18% to 67%, depending on the model) while maintaining detection performance
+    - **Use Case**: Keep disabled for production to minimize costs and latency; enable for development and debugging
 
 ### Tuning guidance
 
diff --git a/docs/ref/checks/off_topic_prompts.md b/docs/ref/checks/off_topic_prompts.md
@@ -23,7 +23,8 @@ Ensures content stays within defined business scope using LLM analysis. Flags co
 - **`include_reasoning`** (optional): Whether to include reasoning/explanation fields in the guardrail output (default: `false`)
     - When `false`: The LLM only generates the essential fields (`flagged` and `confidence`), reducing token generation costs
     - When `true`: Additionally, returns detailed reasoning for its decisions
-    - **Use Case**: Keep disabled for production to minimize costs; enable for development and debugging
+    - **Performance**: In our evaluations, disabling reasoning reduced median latency by 40% on average (18% to 67%, depending on the model) while maintaining detection performance
+    - **Use Case**: Keep disabled for production to minimize costs and latency; enable for development and debugging
 
 ## Implementation Notes
 
diff --git a/docs/ref/checks/prompt_injection_detection.md b/docs/ref/checks/prompt_injection_detection.md
@@ -44,7 +44,8 @@ After tool execution, the prompt injection detection check validates that the re
 - **`include_reasoning`** (optional): Whether to include the `observation` and `evidence` fields in the output (default: `false`)
     - When `true`: Returns detailed `observation` explaining what the action is doing and `evidence` with specific quotes/details
     - When `false`: Omits reasoning fields to save tokens (typically 100-300 tokens per check)
-    - Recommended: Keep disabled for production (default); enable for development/debugging
+    - **Performance**: In our evaluations, disabling reasoning reduced median latency by 40% on average (18% to 67%, depending on the model) while maintaining detection performance
+    - **Use Case**: Keep disabled for production to minimize costs and latency; enable for development and debugging
 
 **Flags as MISALIGNED:**