.Net: [codex] Fix TextChunker orphan merge token count by pragnyanramtha · Pull Request #14020 · microsoft/semantic-kernel

pragnyanramtha · 2026-05-17T00:35:49Z

Summary

Fixes #13713. TextChunker.SplitPlainTextParagraphs now checks the configured token counter before merging a short final paragraph into the previous paragraph. This prevents the orphan-paragraph balancing step from creating a chunk that exceeds the requested token limit when a custom token counter is used.

Root Cause

The orphan merge logic compared the number of whitespace-delimited words in the last two paragraphs against adjustedMaxTokensPerParagraph. That word count can be lower than the actual token count reported by the provided TokenCounter, so the merge could produce an oversized chunk.
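The following is a minimal C# sketch of the mismatch. It assumes a hypothetical length-based counter (`lengthCounter`) that charges one token per character. The PR's regression test also uses a length-based counter, but its exact definition is not reproduced here. The variable names and the limit of 10 are illustrative only.

```csharp
using System;

// Hypothetical length-based token counter: one "token" per character.
// It stands in for any custom counter that reports more tokens than words.
Func<string, int> lengthCounter = s => s.Length;

string secondLast = "alpha beta gamma";
string last = "delta";
int maxTokens = 10;

// Word-based check: sum of whitespace-delimited words in both paragraphs.
int wordCount =
    secondLast.Split(' ', StringSplitOptions.RemoveEmptyEntries).Length +
    last.Split(' ', StringSplitOptions.RemoveEmptyEntries).Length;

// What the configured counter actually reports for the merged text.
string merged = secondLast + " " + last;
int tokenCount = lengthCounter(merged);

Console.WriteLine($"words={wordCount}, tokens={tokenCount}, limit={maxTokens}");
Console.WriteLine($"word check allows merge: {wordCount <= maxTokens}");   // True
Console.WriteLine($"token check allows merge: {tokenCount <= maxTokens}"); // False
```

Here the word-based check sees 4 words against a limit of 10 and would merge. The counter reports 22 tokens for the merged string, so the resulting chunk would exceed the limit.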

Change

- Build the candidate merged paragraph from the existing paragraph strings.
- Call GetTokenCount(mergedParagraph, tokenCounter) before merging.
- Add a regression test that uses a length-based token counter to cover the oversized-merge case.
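The steps above can be sketched as a small, self-contained C# helper. This is a hypothetical illustration, not the actual `TextChunker` code. `MergeOrphanIfFits` operates on a `List<string>` with a `Func<string, int>` counter, and the `maxTokens / 4` "short final paragraph" threshold is an assumption. The idea it shows is the one the PR describes: measure the exact merged string with the configured counter before committing the merge.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical length-based counter (one token per character), standing in
// for a custom token counter like the one in the regression test.
Func<string, int> lengthCounter = s => s.Length;

// Case 1: merging would produce 23 tokens against a limit of 20.
var tooBig = new List<string> { "alpha beta gamma xyz", "ab" };
MergeOrphanIfFits(tooBig, 20, lengthCounter);
Console.WriteLine(string.Join(" | ", tooBig)); // prints "alpha beta gamma xyz | ab"

// Case 2: merging produces 13 tokens, which fits.
var fits = new List<string> { "alpha beta", "ab" };
MergeOrphanIfFits(fits, 20, lengthCounter);
Console.WriteLine(string.Join(" | ", fits)); // prints "alpha beta ab"

// Hypothetical helper sketching the guarded orphan merge.
static void MergeOrphanIfFits(List<string> paragraphs, int maxTokens, Func<string, int> countTokens)
{
    if (paragraphs.Count < 2) return;

    string secondLast = paragraphs[paragraphs.Count - 2];
    string last = paragraphs[paragraphs.Count - 1];

    // Only a short trailing paragraph is a merge candidate (threshold is illustrative).
    if (countTokens(last) >= maxTokens / 4) return;

    // Build the candidate exactly as it would be stored, then measure it
    // with the configured counter instead of counting words.
    string merged = $"{secondLast} {last}".Trim();
    if (countTokens(merged) > maxTokens) return; // merge would exceed the limit

    paragraphs[paragraphs.Count - 2] = merged;
    paragraphs.RemoveAt(paragraphs.Count - 1);
}
```

The helper checks the merged string itself rather than summing per-paragraph counts. This also covers counters where the joining whitespace costs tokens, or where tokenization across the join does not add up linearly.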

Validation

PATH=/tmp/dotnet:$PATH dotnet test dotnet/src/SemanticKernel.UnitTests/SemanticKernel.UnitTests.csproj --filter FullyQualifiedName~TextChunkerTests
- Passed: 40, Failed: 0, Skipped: 0

The full repository test suite was not run; the focused TextChunker unit tests cover the changed behavior.

github-actions

Automated Code Review

Reviewers: 4 | Confidence: 93% | Result: All clear

Reviewed: Correctness, Security, Reliability, Test Coverage, Design Approach

Automated review by pragnyanramtha's agents

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Fix TextChunker orphan merge token count

a4135c7

moonbox3 added the .NET and kernel labels May 17, 2026

github-actions Bot changed the title ~~[codex] Fix TextChunker orphan merge token count~~ .Net: [codex] Fix TextChunker orphan merge token count May 17, 2026

pragnyanramtha marked this pull request as ready for review May 17, 2026 00:37

pragnyanramtha requested a review from a team as a code owner May 17, 2026 00:37

Copilot AI review requested due to automatic review settings May 17, 2026 00:37

Copilot started reviewing on behalf of pragnyanramtha May 17, 2026 00:38

github-actions Bot reviewed May 17, 2026

Copilot AI reviewed May 17, 2026

pragnyanramtha wants to merge 1 commit into microsoft:main from pragnyanramtha:codex/dotnet-textchunker-token-merge

3 participants