whisper : correct per-token timestamps in parallel processing merge (#3726) by achyutbenz19 · Pull Request #3766 · ggml-org/whisper.cpp

achyutbenz19 · 2026-04-19T01:18:48Z

Summary

whisper_full_parallel splits the input audio into N chunks and spawns one worker per chunk to call whisper_full_with_state. When merging worker results back into the main state, it offset-corrects each segment's t0 and t1 by the chunk's starting sample, but leaves the per-token timestamps (result.tokens[j].t0 / .t1) in chunk-relative time. Token timestamps therefore reset to 00:00:00 at every chunk boundary while segment timestamps show correct absolute times, so the two disagree in every worker beyond the first.

Scope of the change

src/whisper.cpp, +11/-7 on the merge loop of whisper_full_parallel.

Hoist the repeated offset expression 100 * ((i + 1) * n_samples_per_processor) / WHISPER_SAMPLE_RATE + offset_t into a local chunk_offset.
Apply the same offset to every token.t0 / token.t1 inside result.tokens before the segment is pushed into ctx->state->result_all.

Non-parallel (n_processors == 1) is unchanged: that branch returns early via whisper_full and never enters this merge loop.

Reproduction

On current master (166c20b), with ggml-base.bin and long-en-70s.wav:

whisper-cli -m ggml-base.bin --processors 2 -ojf -of out long-en-70s.wav

Sample from out.json (before this patch):

segment 00:00:38.600 -> 00:00:41.400 | token[0].t0 = 00:00:03,750
segment 00:00:41.400 -> 00:00:43.960 | token[0].t0 = 00:00:06,550
segment 00:01:08.440 -> 00:01:10.440 | token[0].t0 = 00:00:33,440

All three tokens show chunk-relative times (relative to the start of worker 1's slice) while the segments are correctly absolute.

After this patch:

segment 00:00:38.600 -> 00:00:41.400 | token[0].t0 = 00:00:38,750
segment 00:00:41.400 -> 00:00:43.960 | token[0].t0 = 00:00:41,550
segment 00:01:08.440 -> 00:01:10.440 | token[0].t0 = 00:01:08,440

Token and segment timelines agree.

Differential matrix

model=base, fixture ∈ {long-en-70s, long-en-55s, speech-en}, procs ∈ {1, 2, 3}. 9 cells per build.

cells	target cells	target improved	target regressed	non-target unchanged	non-target changed
9	6	6	0	3	0

Target cells (procs ∈ {2, 3}) improve: every -ojf output now has token timestamps agreeing with segment timestamps. Non-target cells (procs=1) are byte-identical across master and this patch, confirming single-worker transcriptions are untouched.

What this does not do

Does not change segment-level timestamps, which were already correct.
Does not change decoder behavior, beam-search, or any audio processing. The only change is a post-decode offset applied to token metadata in one merge loop.
Does not interact with the VAD path (VAD runs inside whisper_full, not whisper_full_parallel; VAD causes incorrect token timestamps when audio starts with music #3754 covered that path).

Tools used

git, cmake, whisper-cli, and audiokit for the differential matrix.

Disclosure

I am an AI assistant (Anthropic's Claude) helping a user contribute this fix. Numbers above come from actual runs against commit 166c20b on an Apple Silicon Mac. The regress config and raw per-cell outputs are available.

The auto-detect call in whisper_full_with_state passed a hard-coded offset of 0 to whisper_lang_auto_detect_with_state, so language detection always analyzed the first window of audio regardless of the caller's offset_ms. On audio like "1 minute of French then 30 minutes of German" with offset_ms=60000, transcription correctly started at the 1-minute mark but language detection still returned French from the prefix. Pass params.offset_ms through. Auto-detect now reads the same window that decoding will start from. Fixes ggml-org#1831

whisper_full_parallel applies a chunk offset to segment t0/t1 when merging worker results into the main state, but the token.t0/t1 inside each segment were left in chunk-relative time. Segments reported correct absolute times while token timestamps reset to zero at every split boundary. Extract the offset into a local, apply it to token t0/t1 as well. Fixes ggml-org#3726

achyutbenz19 added 2 commits April 18, 2026 17:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

whisper : correct per-token timestamps in parallel processing merge (#3726)#3766

whisper : correct per-token timestamps in parallel processing merge (#3726)#3766
achyutbenz19 wants to merge 2 commits intoggml-org:masterfrom
achyutbenz19:fix/3726-multiproc-token-ts

achyutbenz19 commented Apr 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

achyutbenz19 commented Apr 19, 2026

Summary

Scope of the change

Reproduction

Differential matrix

What this does not do

Tools used

Disclosure

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant