Support multi-turn for all LLM based guardrails #55

steven10a · 2025-12-12T22:30:50Z

Extending llm_base.py to always use the conversation_history from ctx to provide the conversation history to all LLM based guardrails. Previously we had the Jailbreak guardrail as a custom multi-turn guardrail.

Conversation history will allow for more robust detection
User can set max_turns in the config to control how much of the conversation is passed to the guardrail, balancing token cost with context
Updated documentation
Updated and added tests

Copilot

Pull request overview

This PR extends multi-turn conversation support from the Jailbreak guardrail to all LLM-based guardrails. It centralizes conversation history handling in llm-base.ts, introduces configurable max_turns and include_reasoning parameters, and refactors existing guardrails to use this unified infrastructure. The changes enable more robust detection across all LLM guardrails while giving users control over token costs through configurable reasoning fields and conversation history limits.

Key Changes:

Unified multi-turn support infrastructure in llm-base.ts with automatic conversation history extraction and configurable turn limits
Added include_reasoning config to control whether detailed explanation fields are included in outputs (defaults to false for production cost savings)
Added max_turns config to limit conversation history size (defaults to 10 turns)

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
`src/checks/llm-base.ts`	Core infrastructure: added `extractConversationHistory`, `buildAnalysisPayload`, `LLMReasoningOutput` schema, `include_reasoning` and `max_turns` config fields; updated `runLLM` and `createLLMCheckFn` to support multi-turn analysis
`src/checks/jailbreak.ts`	Refactored to use `createLLMCheckFn` instead of custom implementation, removing duplicate conversation history logic
`src/checks/prompt_injection_detection.ts`	Added conditional reasoning field inclusion based on `include_reasoning`; integrated `max_turns` parameter for conversation slicing
`src/checks/hallucination-detection.ts`	Added conditional reasoning field inclusion with explicit field listing pattern
`src/checks/user-defined-llm.ts`	Updated to use automatic reasoning handling via `createLLMCheckFn`
`src/checks/topical-alignment.ts`	Updated to use automatic reasoning handling via `createLLMCheckFn`
`src/checks/nsfw.ts`	Updated to use automatic reasoning handling via `createLLMCheckFn`
`src/checks/moderation.ts`	Minor cleanup: removed `checked_text` field from one error path
`src/__tests__/unit/llm-base.test.ts`	Comprehensive tests for new helper functions, reasoning control, and multi-turn behavior
`src/__tests__/unit/prompt_injection_detection.test.ts`	Tests for `include_reasoning` and `max_turns` configuration options
`src/__tests__/unit/checks/jailbreak.test.ts`	Updated tests to reflect refactored implementation using `createLLMCheckFn`
`src/__tests__/unit/checks/hallucination-detection.test.ts`	New comprehensive test file for reasoning control and error handling
`src/__tests__/unit/checks/user-defined-llm.test.ts`	Updated test to verify `include_reasoning` functionality
`examples/basic/hello_world.ts`	Demonstrates `include_reasoning: true` in example configuration
`docs/ref/checks/*.md`	Comprehensive documentation updates across all LLM guardrail docs explaining new `include_reasoning` and `max_turns` parameters with consistent performance claims
`.gitignore`	Added `PR_READINESS_CHECKLIST.md` to ignored files

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/checks/moderation.ts

steven10a · 2025-12-12T22:45:03Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

src/checks/prompt_injection_detection.ts

steven10a · 2025-12-12T22:55:21Z

@codex review

Copilot

Pull request overview

Copilot reviewed 21 out of 22 changed files in this pull request and generated no new comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

chatgpt-codex-connector · 2025-12-12T23:00:11Z

Codex Review: Didn't find any major issues. More of your lovely PRs please.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

…v/steven/multi_turn

steven10a added 5 commits December 10, 2025 17:12

parameterize LLM returning reasoning

628aa82

Preserve reason field in error fallback message

715df1c

Making new param optional

4bda9b6

Fix prompt injection reporting errors

b1a75bc

Adding multi-turn support to all LLM based guardrails

0238099

Copilot AI review requested due to automatic review settings December 12, 2025 22:30

steven10a changed the title ~~Dev/steven/multi turn~~ Support multi-turn for all LLM based guardrails Dec 12, 2025

Copilot started reviewing on behalf of steven10a December 12, 2025 22:31 View session

Copilot AI reviewed Dec 12, 2025

View reviewed changes

src/checks/moderation.ts Show resolved Hide resolved

steven10a added 2 commits December 12, 2025 17:42

Update tests

4d21c17

Remove unneeded field

82d69b8

steven10a requested a review from gabor-openai December 12, 2025 22:45

chatgpt-codex-connector bot reviewed Dec 12, 2025

View reviewed changes

src/checks/prompt_injection_detection.ts Show resolved Hide resolved

better error handling for prompt injection

ad9f398

steven10a requested a review from Copilot December 12, 2025 22:55

Copilot started reviewing on behalf of steven10a December 12, 2025 22:55 View session

Copilot AI reviewed Dec 12, 2025

View reviewed changes

Merge branch 'main' of github.com:openai/openai-guardrails-js into de…

920c717

…v/steven/multi_turn

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Support multi-turn for all LLM based guardrails #55

Support multi-turn for all LLM based guardrails #55

Uh oh!

steven10a commented Dec 12, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

steven10a commented Dec 12, 2025

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

Uh oh!

steven10a commented Dec 12, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

chatgpt-codex-connector bot commented Dec 12, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Support multi-turn for all LLM based guardrails #55

Are you sure you want to change the base?

Support multi-turn for all LLM based guardrails #55

Uh oh!

Conversation

steven10a commented Dec 12, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

steven10a commented Dec 12, 2025

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

steven10a commented Dec 12, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

chatgpt-codex-connector bot commented Dec 12, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants