Azure AI Gateway Policy Engine

TL;DR

Azure AI Gateway Policy Engine — APIM-based AAA (Authentication, Authorization, Accounting) for AI workloads, inspired by telecom/RADIUS principles. Enterprise-ready solution for teams who need to authenticate, authorize, and account for AI API utilization with durable storage, intelligent routing, and adaptive billing.

A single ASP.NET Minimal API handles authentication/authorization pre-checks, model routing decisions, cost calculation, multiplier-based billing, and real-time dashboard streaming — orchestrated by .NET Aspire with CosmosDB as the source of truth and Redis as a high-performance cache.

🚀 Quick Deploy: git clone → ./scripts/setup-azure.ps1 → done

💰 What you get:

Authentication & authorization at the API gateway (APIM pre-checks)
Intelligent model routing to optimal deployments based on plan policies
Per-request multiplier pricing with monthly quotas and overage tracking
Durable configuration storage in CosmosDB with Redis caching
Adaptive billing UI that matches your billing model (token/multiplier/hybrid)
Bill-back reporting per client with tier breakdown and CSV export
WebSocket live dashboard with real-time metrics
Comprehensive test suite (198+ tests including routing benchmarks)

🏗️ Tech Stack: .NET 10, Azure Container Apps, ASP.NET Minimal APIs, .NET Aspire, React/TypeScript, Azure Managed Redis, CosmosDB, Azure API Management, Terraform, OpenTelemetry + Azure Monitor, Microsoft Purview

Quick Start

Local Development

# Clone and run (requires .NET 10 SDK + Docker for Redis)
git clone https://github.com/your-org/azure-ai-gateway-policy-engine.git
cd azure-ai-gateway-policy-engine/src
dotnet run --project AIPolicyEngine.AppHost

The Aspire dashboard opens automatically at https://localhost:17224 and provides resource status, logs, traces, and metrics for all services.

Azure Deployment

# Stage 1: Deploy infrastructure (uses placeholder container image)
cd infra/terraform
cp terraform.tfvars.sample terraform.tfvars  # Edit with your values
terraform init
terraform apply

# Stage 2: Build and deploy the container to the provisioned ACR
cd ../..
./scripts/deploy-container.ps1 -ResourceGroupName rg-aipolicy-eastus2

# For multi-tenant: provision SPs in secondary tenant
cd infra/terraform
./register-secondary-tenant.ps1 -SecondaryTenantId "<tenant-id>" `
    -ApiAppId "<api-app-id>" -Client2AppId "<client2-app-id>"

The two-stage approach is required because the container image depends on the ACR (created by Terraform), and enterprise environments typically block public container registries. The deploy-container.ps1 script builds the Docker image, pushes to the deployed ACR, and updates the Container App.

See the Deployment Guide for full manual steps.

Demo Client

# If you deployed with scripts/setup-azure.ps1, use the generated env file.
# DemoClient automatically loads .env.local/.env when present:
# demo/.env.local

# Or configure DemoClient via user secrets
dotnet user-secrets --project demo init
dotnet user-secrets --project demo set "DemoClient:TenantId" "<tenant-id>"
dotnet user-secrets --project demo set "DemoClient:ApiScope" "api://<gateway-app-id>/.default"
dotnet user-secrets --project demo set "DemoClient:ApimBase" "https://<apim-name>.azure-api.net"
dotnet user-secrets --project demo set "DemoClient:ApiVersion" "2025-04-01-preview"
dotnet user-secrets --project demo set "DemoClient:AIPolicyEngineBase" "https://<container-app-fqdn>"
dotnet user-secrets --project demo set "DemoClient:Clients:0:Name" "Enterprise Client"
dotnet user-secrets --project demo set "DemoClient:Clients:0:AppId" "<client-app-id>"
dotnet user-secrets --project demo set "DemoClient:Clients:0:Secret" "<client-secret>"
dotnet user-secrets --project demo set "DemoClient:Clients:0:Plan" "Enterprise"
dotnet user-secrets --project demo set "DemoClient:Clients:0:DeploymentId" "gpt-4.1"
dotnet user-secrets --project demo set "DemoClient:Clients:0:TenantId" "<tenant-id>"

# Generate synthetic traffic with Agent Framework (1.0.0-rc2) agents
dotnet run --project demo

Each demo client can specify its own TenantId. If omitted, it inherits the global DemoClient:TenantId. This allows testing multi-tenant scenarios where the same client app authenticates against different Entra tenants.

The Problem We Solve

Challenge	Impact	Our Solution
No per-tenant AI usage tracking	Cost overruns, no chargeback	✅ JWT claim extraction (`tid`, `aud`, `azp`) — billing keyed on client+tenant combination
SaaS apps serving multiple AI consumers	Single app ID can't distinguish tenants	✅ Combined `clientAppId:tenantId` customer key — each tenant gets independent quotas, rate limits, and billing
No quota or rate enforcement at the gate	Uncontrolled AI API spend	✅ Per-customer monthly quotas, request/token limits enforced before requests reach the backend
Weak identity & weak deployment control	No tenant isolation, overspend on wrong deployments	✅ Entra ID JWT bearer auth + model routing to optimal deployments based on plan policies
No configuration durability	Redis-only storage loses config on cache flush/upgrade	✅ CosmosDB as durable source of truth, Redis as high-speed write-through cache
No intelligent model routing	Manual per-client model lists, no auto-optimization	✅ Auto-router selects best deployment based on plan routing policies + Foundry discovery
Inflexible billing	Per-token billing doesn't match all use cases	✅ Per-request multiplier pricing (GHCP-style) with hybrid token+multiplier support
No real-time visibility	Delayed cost reporting, surprise bills	✅ WebSocket streaming + adaptive React dashboard that shows your billing model
Inconsistent deployment	Manual provisioning, environment drift	✅ Terraform IaC + .NET Aspire for reproducible local and cloud deployments

Architecture Overview

graph LR
    A[Client Apps] -->|Bearer Token| B[APIM Gateway]
    B -->|Inbound: Pre-check<br/>Auth + Route + Rate Limit| C[AIPolicyEngine.Api]
    B --> D[AI Models<br/>OpenAI, Foundry, etc]
    C --> E[Azure Managed Redis<br/>Hot Cache]
    C --> F[CosmosDB<br/>Source of Truth]
    C --> G[Azure Monitor]
    C --> H[Microsoft Purview]
    F --> E
    E --> I[React Dashboard]
    C -->|WebSocket| I

Core Architecture: A single Azure Container App (AIPolicyEngine.Api) hosts all gateway policy logic:

Layer	Technology	Purpose
API Gateway	Azure API Management (StandardV2)	Entra JWT validation, policy enforcement, traffic control
Backend Engine	ASP.NET 10 Minimal APIs	Authentication pre-checks, authorization decisions, model routing, billing calculations
Orchestration	.NET Aspire	Local development with all dependencies in containers
Hot Cache	Azure Managed Redis	Write-through cache for plans, clients, pricing, routing policies, rate limits, usage
Durable Store	Azure CosmosDB	Source of truth for all configuration; audit logs and billing records for compliance
Dashboard	React + TypeScript SPA	Five-page UI served from same container; real-time WebSocket updates
Observability	OpenTelemetry + Azure Monitor	Distributed tracing, custom metrics, structured logging
Compliance	Microsoft Purview	DLP policy validation and audit emission

Request & Decision Flow:

Client sends request to APIM with Bearer token (Entra ID JWT)
APIM validates token and extracts claims (tid, appid, aud)
APIM calls /api/precheck/{clientAppId}/{tenantId} — our engine checks:
- ✅ Plan assignment for this customer
- ✅ Model routing policy (selects optimal deployment)
- ✅ Monthly request/token quotas
- ✅ Rate limits (RPM/TPM) on routed deployment
- ✅ Deployment access (allowlist per plan/client)
If check fails → return 401/403/429, request blocked at APIM
If check passes → return routed deployment + token; APIM forwards to backend
Backend processes request
APIM outbound policy captures response, calls /api/log (fire-and-forget)
/api/log records usage, updates quotas, calculates cost with model multipliers

All decisions are cached in Redis for sub-millisecond latency; all configuration persists to CosmosDB for durability.

Key Features

🔐 Authentication & Authorization at the Gate

Dual identity provider support: Switch between Entra ID (Azure AD) and Keycloak via a single AuthProvider config toggle
Entra ID JWT bearer tokens with multi-tenant support
Keycloak OIDC with client_credentials flow for APIM service-to-service calls
Pre-check endpoint validates client, plan, deployment access, and quotas BEFORE backend receives the request
No subscription keys — pure identity-driven access control
Runtime auth config endpoint (/api/auth-config) — one container image works across all environments

🚀 Intelligent Model Routing (Auto-Router)

Automatically routes requests to optimal deployments based on plan routing policies
Integrates with AI model discovery services (e.g., Foundry endpoints)
Three routing modes:
- Per-account: Different routing rules per customer (clientAppId:tenantId)
- Enforced: Fallback routing when preferred deployment is unavailable
- QoS-based: Route based on load and quota availability
Rate limits (RPM/TPM) apply to the routed deployment, not the originally requested model

💰 Per-Request Multiplier Pricing (GHCP-style)

Flexible billing based on model tier, not just token count
Example: GPT-4.1 = 1.0x (baseline), GPT-4.1-mini = 0.33x
Every request costs: 1 × model_multiplier in effective requests
Monthly quotas tracked as "effective requests", not tokens
Hybrid mode: Mix token-based and multiplier-based models in the same plan

🗄️ CosmosDB Source of Truth + Redis Cache

Durable Configuration: All plans, clients, pricing, routing policies, and usage policies persist to CosmosDB
High-Performance Cache: Redis holds a write-through cache of all configuration for sub-millisecond lookups
Automatic Warmup: On startup, configuration is loaded from CosmosDB and populated into Redis
Cache Coherence: All writes go to CosmosDB first, then update Redis — no stale cache issues
36-Month Audit Trail: Cosmos DB stores all request logs and billing records for compliance and bill-back reporting

📊 Adaptive Billing Dashboard

React SPA that adapts to your billing model:
- Token-only plans: Shows token counts, token quotas, token usage charts
- Multiplier-only plans: Shows request counts, request quotas, model tier breakdowns
- Hybrid plans: Shows both billing modes side-by-side
No clutter — the UI hides irrelevant billing metrics based on plan configuration
Real-time WebSocket updates for live metrics

📋 Bill-Back Reporting

Per-client consumption tracking: Effective requests, token usage, model tier breakdown
Monthly billing exports: CSV downloads with tier-level detail
Audit trail per customer: Complete request log for any client+tenant combination
Role-based access: Export requires AIPolicyEngine.Export app role

⚡ APIM Policy Enforcement at the Gateway

Precheck-based authentication + routing + rate limiting
All quota and rate-limit checks happen BEFORE the backend request
Requests hitting rate limits are rejected at APIM (429 Too Many Requests)
Deployments not in the allowlist are rejected (403 Forbidden)

🧪 Comprehensive Test Suite

198+ tests covering critical paths, routing, pricing, quota enforcement
Routing benchmarks validating auto-router performance
Unit tests, integration tests, and end-to-end scenarios
Zero regressions through continuous validation

🏗️ Production-Ready Infrastructure

Terraform IaC — Modular infrastructure with six modules: monitoring, data, ai_services, identity, compute, and gateway
Azure Managed Redis — TLS-enabled, Entra-only auth, DDoS protection
Azure CosmosDB — Multi-region capable, point-in-time recovery, audit logging
Azure Container Apps — Managed Kubernetes, auto-scaling, managed identity
Aspire Orchestration — Single dotnet run for local development with all dependencies in containers

Dashboard

The React SPA provides five pages:

Dashboard — KPI cards (total cost, requests), top clients by utilization, usage charts, live request log, adaptive billing metrics
Clients — Client assignment management with plan badges, quota progress bars, deployment access overrides
Plans — Billing plan CRUD with:
- Monthly quota limits (request or token based)
- Rate limits (requests/min or tokens/min)
- Per-model multipliers (if using multiplier billing)
- Overbilling toggle for runover support
- Per-deployment quotas and access controls
- Routing policy assignment (auto-router configuration)
Routing Policies — Configure model routing rules:
- Map deployments per customer
- Set priority and fallback behavior
- Enable/disable rules without deletion
Export — Two report types:
- Billing Summary: Rolled-up per-client usage for a month
- Client Audit Trail: Every request for a specific client
- Requires AIPolicyEngine.Export app role

Authentication

The AI Policy Engine supports two identity providers, selected at runtime via the AuthProvider configuration value (AzureAd or Keycloak). The same container image works for both — no rebuild required.

Entra ID (Azure AD) — Default

Authentication uses Entra ID (Azure AD) App Registrations with JWT bearer tokens and a two-app model:

App Registration	Audience	Purpose
AIPolicyEngine APIM Gateway (multi-tenant)	`api://{gateway-app-id}`	External clients authenticate against this to call OpenAI via APIM
AIPolicyEngine API (single-tenant)	`api://{api-app-id}`	Backend API — only APIM's managed identity and dashboard users access this

The APIM policy (policies/entra-jwt-policy.xml) validates incoming tokens against the Gateway app audience and extracts claims for chargeback routing:

JWT Claim	Purpose
`tid`	Tenant ID — identifies the Entra directory / organization
`aud`	Audience — validates the token is intended for this API
`azp` / `appid`	Authorized Party — identifies the calling client application (`azp` for delegated tokens, `appid` for client_credentials)

Subscription keys are disabled. All authentication uses standard Authorization: Bearer <token> headers validated by the configured identity provider.

Keycloak OIDC — Alternative

To use Keycloak instead of Entra ID, set AuthProvider=Keycloak in the configmap/appsettings and configure the Keycloak section:

# Kubernetes configmap
AuthProvider: "Keycloak"
Keycloak__Authority: "https://keycloak.example.com/realms/myrealm"
Keycloak__Audience: "ai-policy-engine"
Keycloak__ClientId: "ai-policy-engine-api"        # Confidential client (backend)
Keycloak__FrontendClientId: "ai-policy-engine-ui"  # Public client (SPA)
Keycloak__Realm: "myrealm"
Keycloak__FrontendUrl: "https://keycloak.example.com"
Keycloak__RequireHttpsMetadata: "true"

Required Keycloak roles (create as realm roles or client roles):

Role	Purpose
`AIPolicy.Admin`	Create/edit/delete plans, clients, pricing, routing policies
`AIPolicy.Apim`	APIM service-to-service calls (precheck, log ingest)
`AIPolicy.Export`	Access to data export endpoints

Ensure roles are included in the roles claim of the access token via a Keycloak protocol mapper.

APIM integration: Use policies/keycloak-jwt-policy.xml instead of entra-jwt-policy.xml. Configure these APIM named values:

KeycloakOpenIdConfigUrl — OIDC discovery URL
KeycloakTokenEndpoint — Token endpoint for client_credentials
AIPolicyEngineApiBaseUrl — Backend API URL
AIPolicyEngineApimClientId / AIPolicyEngineApimClientSecret — APIM service account credentials

Multi-Tenant Customer Model

A "Customer" in the billing system is uniquely identified by the combination of clientAppId + tenantId — not just the client app alone. This enables scenarios where:

A SaaS company registers one Entra app (clientAppId) for their "Mobile App" but sells it to multiple organizations. Each customer's Entra tenant (tenantId) gets independent billing, quotas, and rate limits.
An internal platform team provisions a shared client app used by multiple departments. Each department's tenant gets its own usage tracking and cost allocation.
A single-tenant app works identically — the customer key is simply {clientAppId}:{tenantId} where tenantId is the app's home tenant.

All Redis keys, Cosmos DB partition keys, and dashboard views use this combined customer key:

Storage	Key Pattern	Example
Redis client	`client:{clientAppId}:{tenantId}`	`client:f5908fef-...9620:99e1e9a1-...c59a`
Redis usage logs	`log:{clientAppId}:{tenantId}:{deploymentId}`	`log:f5908fef-...9620:99e1e9a1-...c59a:gpt-4.1`
Redis rate limits	`ratelimit:rpm:{clientAppId}:{tenantId}:{window}`	`ratelimit:rpm:f5908fef-...9620:99e1e9a1-...c59a:45982`
Cosmos DB	Partition key: `/customerKey`	`f5908fef-...9620:99e1e9a1-...c59a`

Configuring Multi-Tenant Client Apps

Client 1 (single-tenant) — Registered as AzureADMyOrg. The tenant ID in the JWT tid claim always matches the deployment tenant. One customer entry.

Client 2 (multi-tenant) — Registered as AzureADMultipleOrgs. Can authenticate from any Entra tenant. Each unique tid creates a separate customer entry with its own plan, quota, and rate limits.

To register a multi-tenant client for multiple tenants:

# During deployment — registers Client 2 for a second tenant automatically
./scripts/setup-azure.ps1 -SecondaryTenantId "<other-tenant-guid>"

# Or manually via the API after deployment
curl -X PUT "https://<container-app>/api/clients/<clientAppId>/<tenantId>" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"planId": "<plan-id>", "displayName": "Acme Corp Mobile App"}'

Export Role Configuration

The export endpoints require the AIPolicyEngine.Export app role. To set this up:

Define the app role in the API app registration (Entra ID → App registrations → your API app → App roles):
- Display name: AIPolicyEngine Export
- Value: AIPolicyEngine.Export
- Allowed member types: Applications (for automated export via Function App/App Registration)
Assign the role to an App Registration for automated export:
- Go to Enterprise applications → find the App Registration's service principal
- App role assignments → Add assignment → select AIPolicyEngine.Export

Configure the API with the Entra ID settings in appsettings.json:

"AzureAd": {
  "Instance": "https://login.microsoftonline.com/",
  "TenantId": "<your-tenant-id>",
  "ClientId": "<your-api-app-client-id>",
  "Audience": "api://<your-api-app-client-id>"
}

The App Registration can then use client credentials flow to obtain a token and call export endpoints without access to admin endpoints (plans, clients, pricing, etc.).

API Endpoints

Endpoint	Method	Description
`/`	GET	React SPA dashboard
`/api/precheck/{clientAppId}/{tenantId}`	GET	Pre-check: verify customer plan, quotas, rate limits, deployment access, and routing decision
`/api/log`	POST	Log usage (called by APIM outbound policy) — updates quotas, calculates cost
`/api/routing-policies`	GET/POST/PUT/DELETE	Manage model routing policies (auto-router configuration)
`/api/deployments`	GET	List available AI model deployments from discovery service
`/api/usage`	GET	Aggregated usage summaries across all customers
`/api/logs`	GET	Individual request log entries with filtering/pagination
`/api/clients`	GET	List all customer (clientAppId:tenantId) assignments
`/api/clients/{clientAppId}/{tenantId}`	GET/PUT/DELETE	Customer assignment management
`/api/clients/{clientAppId}/{tenantId}/usage`	GET	Per-customer usage report with effective requests, tokens, tier breakdown
`/api/clients/{clientAppId}/{tenantId}/traces`	GET	Per-customer request traces and audit trail
`/api/plans`	GET/POST/PUT/DELETE	Billing plan CRUD (create, update, delete plans)
`/api/pricing`	GET/PUT/DELETE	Model pricing and multiplier management
`/api/quotas`	GET/PUT/DELETE	Per-client quota overrides (exceptions to plan limits)
`/api/export/available-periods`	GET	Available billing periods and clients for export (role: `AIPolicyEngine.Export`)
`/api/export/billing-summary`	GET	Monthly billing summary CSV for all customers (role: `AIPolicyEngine.Export`)
`/api/export/client-audit`	GET	Customer-specific audit trail CSV with per-model breakdown (role: `AIPolicyEngine.Export`)
`/ws/logs`	WS	WebSocket endpoint for real-time usage updates

Observability

Telemetry is configured via AIPolicyEngine.ServiceDefaults using the Azure Monitor OpenTelemetry distro:

Custom Metric	Description	Dimensions
`AIPolicyEngine.tokens_processed`	Total tokens processed across all requests	`tenant_id`, `client_app_id`, `model`, `deployment_id`
`AIPolicyEngine.cost_total`	Cumulative cost in USD	`tenant_id`, `client_app_id`, `model`, `deployment_id`
`AIPolicyEngine.requests_processed`	Total chargeback requests handled	`tenant_id`, `client_app_id`, `model`, `deployment_id`

Distributed traces, structured logs, and metrics flow to Azure Monitor / Application Insights. The Aspire dashboard provides a local development view of all telemetry. The infrastructure deployment also creates an Azure Monitor workbook dashboard (via the Terraform monitoring module) with 30-day token/cost trend, top clients, status distribution, and quota/rate-limit event views from Log Analytics.

Purview Integration

The AIPolicyEngine API integrates with the Microsoft Agent Framework Purview SDK for DLP policy validation and audit emission.

Requirements:

Microsoft 365 E5 license (or E5 Compliance add-on)
Entra App Registration with Microsoft Graph permissions:
- ProtectionScopes.Compute.All
- Content.Process.All
- ContentActivity.Write

Purview evaluates content against organizational DLP policies before processing and emits audit events for compliance reporting.

Project Structure

src/
├── AIPolicyEngine.slnx                    # Solution file
├── AIPolicyEngine.AppHost/                # .NET Aspire orchestrator — local dev environment with all services
├── AIPolicyEngine.ServiceDefaults/        # OpenTelemetry + Azure Monitor configuration, shared by all services
├── AIPolicyEngine.Api/                    # ASP.NET Minimal API — core policy engine
│   ├── Endpoints/                     # API endpoints (PreCheck, RoutingPolicy, RequestBilling, Export, etc.)
│   ├── Models/                        # Request/response DTOs, domain models
│   ├── Services/                      # ChargebackCalculator, RoutingEvaluator, Repositories (IRepository pattern)
│   ├── Repositories/                  # CosmosDB persistence layer (Plans, Clients, Pricing, Routing, etc.)
│   └── Program.cs                     # API configuration, dependency injection
├── AIPolicyEngine.Tests/                  # Unit and integration tests (198+ tests, 100% critical path coverage)
├── AIPolicyEngine.Benchmarks/             # BenchmarkDotNet performance tests (routing, pricing calculations)
├── AIPolicyEngine.LoadTest/               # Load testing project
└── AIPolicyEngine-ui/                     # React/TypeScript dashboard (5 pages, adaptive billing UI)
    ├── src/types.ts                   # TypeScript interfaces for API contracts
    ├── src/api.ts                     # API client with error handling
    ├── src/pages/                     # Dashboard, Clients, Plans, Routing, Export pages
    └── src/components/                # Shared UI components

demo/
├── DemoClient.cs                      # Agent Framework demo traffic generator (file-based app)
├── .env.sample                        # Environment variable template

policies/
├── entra-jwt-policy.xml               # APIM policy — JWT validation + precheck + log forwarding
└── subscription-key-policy.xml        # (Legacy) subscription key policy

infra/
└── terraform/                         # Terraform IaC modules (6 modules: monitoring, data, ai_services, identity, compute, gateway)
    ├── main.tf                        # Root orchestration
    ├── providers.tf                   # azurerm, azuread, azapi providers
    ├── variables.tf                   # Input variables
    ├── modules/                       # Modular infrastructure
    └── register-secondary-tenant.ps1  # Multi-tenant setup

scripts/
├── deploy-container.ps1               # Build and deploy container to ACR + verify Entra config
└── seed-data.ps1                      # Seed plans and client assignments into CosmosDB/Redis

docs/                                  # Documentation
├── ARCHITECTURE.md                    # Detailed system design
├── DOTNET_DEPLOYMENT_GUIDE.md         # Full deployment instructions
├── USAGE_EXAMPLES.md                  # API usage examples
├── FAQ.md                             # Frequently asked questions
└── README.md                          # (this file)

Infrastructure

Terraform modules are under infra/terraform/ with six modules: monitoring, data, ai_services, identity, compute, and gateway.

cd infra/terraform
cp terraform.tfvars.sample terraform.tfvars  # Edit with your values
terraform init
terraform apply

# Then build and deploy the container (also verifies Entra app config)
cd ../..
./scripts/deploy-container.ps1 -ResourceGroupName rg-aipolicy-eastus2

Note: The deploy-container.ps1 script automatically verifies and fixes Entra ID app configuration (identifier URIs, SPA redirect URIs, service principals, and role assignments for the deploying user). This works around known eventual-consistency issues in the AzureAD Terraform provider.

Prerequisites

✅ .NET 10 SDK
✅ Node.js 20+ (for the React dashboard)
✅ Azure subscription with appropriate permissions
✅ Azure CLI v2.50+
✅ Docker Desktop (for Aspire local orchestration — Redis runs in a container)
✅ Terraform >= 1.9 (for Terraform deployment path)

For Purview integration: Microsoft 365 E5 license + Entra App Registration with Graph permissions

Configuration

Environment Variables

# Required — Redis connection string (set automatically by Aspire in local dev)
ConnectionStrings__redis=your-redis-connection-string

# Optional — enables Azure Monitor telemetry export
APPLICATIONINSIGHTS_CONNECTION_STRING=InstrumentationKey=...

# Optional — enables Purview DLP/audit integration
PURVIEW_CLIENT_APP_ID=your-purview-app-id

# DemoClient configuration (alternative to user-secrets; auto-loaded from .env.local/.env)
# See demo/.env.sample for a copyable template.
DemoClient__TenantId=<tenant-id>
DemoClient__SecondaryTenantId=<optional-second-tenant-for-multi-tenant-demo>
DemoClient__ApiScope=api://<gateway-app-id>/.default
DemoClient__ApimBase=https://<apim-name>.azure-api.net
DemoClient__ApiVersion=2025-04-01-preview
DemoClient__AIPolicyEngineBase=https://<container-app-fqdn>
DemoClient__Clients__0__Name=Enterprise Client
DemoClient__Clients__0__AppId=<client-app-id>
DemoClient__Clients__0__Secret=<client-secret>
DemoClient__Clients__0__Plan=Enterprise
DemoClient__Clients__0__DeploymentId=gpt-4.1
DemoClient__Clients__0__TenantId=<tenant-id>

For local React dashboard development, copy src/AIPolicyEngine-ui/.env.sample to src/AIPolicyEngine-ui/.env.local and set your Entra app values.

When running via Aspire, connection strings and service discovery are configured automatically by the AIPolicyEngine.AppHost project.

Documentation

Topic	Link
Deployment Guide	docs/DOTNET_DEPLOYMENT_GUIDE.md
Setup Script	scripts/setup-azure.ps1
Demo Client	demo/
Benchmarks	src/AIPolicyEngine.Benchmarks/
Load Tests	src/AIPolicyEngine.LoadTest/

FAQ

❓ What's the max payload size? ✅ Limited only by the Azure Container App request size limit (default 100 MB).

❓ How long is data retained? ✅ Azure Managed Redis TTL is configurable (default 30 days). CosmosDB retains audit records for 36 months. Azure Monitor retains telemetry per your workspace retention settings.

❓ Multi-region support? ✅ Yes. Azure Container Apps and APIM both support multi-region deployment. Parameterize the Terraform modules for additional regions.

❓ How does multi-tenant billing work? ✅ A "customer" is the combination of clientAppId (the Entra app registration) and tenantId (the Entra directory). A multi-tenant app registered with AzureADMultipleOrgs can serve users from any Entra tenant — each tenant gets its own plan assignment, quota, rate limits, and billing. Use -SecondaryTenantId when deploying to pre-register a second tenant for the demo.

❓ What's multiplier pricing? ✅ Instead of charging per-token, multiplier pricing charges per-request with a model tier multiplier. Example: GPT-4.1 = 1.0x baseline, GPT-4.1-mini = 0.33x. Every 3 requests to mini count as 1 effective request against your monthly quota. This matches GHCP-style billing and works better for flat-rate consumption models.

❓ How does auto-routing work? ✅ The auto-router evaluates routing policies per customer and selects the best deployment based on availability and quotas. It doesn't rewrite what the client requested — if they ask for GPT-4, they still get GPT-4 (or a configured fallback if unavailable). Enforced rewriting (e.g., redirect GPT-4 → GPT-4o-mini) is a future policy engine feature.

❓ Do I need a Purview/E5 license? ✅ Only if you enable the Purview DLP integration. The core policy engine functionality works without it.

❓ How does deployment access control work? ✅ Plans and individual clients can have an allowlist of permitted AI model deployments. The precheck endpoint validates that the requested deployment is allowed before forwarding the request. Client-level overrides take precedence over plan-level settings.

❓ What if I have both token-based and multiplier-based plans? ✅ The dashboard adapts to show both billing modes side-by-side (hybrid mode). Each plan independently specifies its billing model. Clients on multiplier plans see multiplier-based quotas; clients on token plans see token-based quotas.

❓ Is all configuration stored durably? ✅ Yes. CosmosDB is the source of truth for all configuration: plans, clients, pricing, routing policies. Redis serves as a write-through cache for performance. On startup, configuration is automatically loaded from CosmosDB into Redis. You can safely upgrade or restart — nothing is lost.

❓ How do I deploy the infrastructure? ✅ Run terraform apply in infra/terraform/, then ./scripts/deploy-container.ps1 to build and push the container image. The two-stage approach is required because the container image depends on the ACR provisioned by Terraform.

📚 More questions: See the Deployment Guide for full environment variable reference and KQL queries.

Monitoring & Alerts

📊 Aspire Dashboard — Local development view of all traces, logs, and metrics
📈 Azure Monitor — Production telemetry via OpenTelemetry + Azure Monitor distro
🔍 Custom Metrics — AIPolicyEngine.tokens_processed, AIPolicyEngine.cost_total, AIPolicyEngine.requests_processed
🚨 Automated Alerts — Configure in Application Insights for cost thresholds and anomalies

Contributing

We welcome contributions! See our Contributing Guide for:

🛠️ Development setup
🧪 Testing guidelines
📋 Code standards
🔄 PR process

License

MIT License - see LICENSE for details.

🎯 Ready to get started? dotnet run --project src/AIPolicyEngine.AppHost

💡 Need help? Check our FAQ or the Deployment Guide.

Name		Name	Last commit message	Last commit date
Latest commit History 107 Commits
.aspire		.aspire
.github		.github
.squad		.squad
Tests		Tests
demo		demo
docs		docs
infra		infra
k8s		k8s
policies		policies
scripts		scripts
src		src
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
aspire.config.json		aspire.config.json
azure.yaml		azure.yaml
package-lock.json		package-lock.json
package.json		package.json

Folders and files

Latest commit

History

Repository files navigation

Azure AI Gateway Policy Engine

TL;DR

Quick Start

Local Development

Azure Deployment

Demo Client

The Problem We Solve

Architecture Overview

Key Features

🔐 Authentication & Authorization at the Gate

🚀 Intelligent Model Routing (Auto-Router)

💰 Per-Request Multiplier Pricing (GHCP-style)

🗄️ CosmosDB Source of Truth + Redis Cache

📊 Adaptive Billing Dashboard

📋 Bill-Back Reporting

⚡ APIM Policy Enforcement at the Gateway

🧪 Comprehensive Test Suite

🏗️ Production-Ready Infrastructure

Dashboard

Authentication

Entra ID (Azure AD) — Default

Keycloak OIDC — Alternative

Multi-Tenant Customer Model

Configuring Multi-Tenant Client Apps

Export Role Configuration

API Endpoints

Observability

Purview Integration

Project Structure

Infrastructure

Prerequisites

Configuration

Environment Variables

Documentation

FAQ

Monitoring & Alerts

Contributing

License

About

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages