
@Ludobaka commented Jan 23, 2026

This PR fixes an issue where end-of-turn (EOT) detection fails when using ElevenLabs' Scribe V2 STT model with the MultilingualModel turn detector.

Problem

When running an agent with the scribe_v2 model and turn detection enabled via MultilingualModel, EOT detection does not work. The turn detector rejects the language code returned by the STT service:

12:08:15.241 DEBUG  livekit.agents     received user transcript
                                       {"user_transcript": "Hi, I'm just testing things.", "language": "eng", ...}
12:08:15.245 INFO   livekit.agents     Turn detector does not support language eng {"room": "console"}

Root Cause

The EOT model expects the ISO 639-1 language code en, but ElevenLabs Scribe V2 returns the ISO 639-3 code eng (see ElevenLabs API reference).

Solution

Normalize the language code in livekit-agents/livekit/agents/voice/audio_recognition.py when setting self._last_language.
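A minimal sketch of what this normalization amounts to, assuming a turn detector that only accepts ISO 639-1 codes. `SUPPORTED_LANGUAGES`, `supports_language`, `normalize_language`, and `RecognitionSketch` are hypothetical stand-ins, not the real MultilingualModel or `audio_recognition.py` API; only the `"eng"` to `"en"` mapping comes from this PR.

```python
from typing import Optional

# Hypothetical stand-in for the turn detector's supported languages; the real
# MultilingualModel list and lookup are not reproduced here.
SUPPORTED_LANGUAGES = {"en", "es", "fr", "de"}


def supports_language(language: Optional[str]) -> bool:
    """Mimics a turn detector that only accepts ISO 639-1 codes."""
    return language is not None and language in SUPPORTED_LANGUAGES


def normalize_language(language: Optional[str]) -> Optional[str]:
    """Maps ISO 639-3 "eng" to ISO 639-1 "en"; other values pass through unchanged."""
    return "en" if language == "eng" else language


class RecognitionSketch:
    """Hypothetical, simplified holder for the recognizer's last language."""

    def __init__(self) -> None:
        self._last_language: Optional[str] = None

    def on_final_transcript(self, language: Optional[str]) -> None:
        # Normalize at the single point where _last_language is stored, so every
        # STT provider's output reaches the turn detector in the same form.
        self._last_language = normalize_language(language)


state = RecognitionSketch()
state.on_final_transcript("eng")  # the code Scribe V2 reports
print(supports_language("eng"), supports_language(state._last_language))  # prints: False True
```

Because the mapping sits where `_last_language` is assigned rather than inside one plugin, any provider that reports `"eng"` would be covered by the same line.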

Why this approach?

Handling normalization at the audio_recognition.py level (rather than in the ElevenLabs plugin) ensures consistent behavior across all STT providers that may return non-standard language codes.

Of course, any recommendations are welcome, and I'll be glad to improve this proposal.

Summary by CodeRabbit

  • Bug Fixes
    • Normalize language codes in speech-to-text event handling (e.g., convert "eng" to "en") to ensure consistent formatting across transcript processing.



CLAassistant commented Jan 23, 2026

CLA assistant check
All committers have signed the CLA.

coderabbitai bot commented Jan 23, 2026

📝 Walkthrough


Language code normalization was introduced: in two STT event handlers ("FINAL_TRANSCRIPT" and "PREFLIGHT_TRANSCRIPT"), the language value "eng" is normalized to "en" before being stored in _last_language.

Changes

Cohort / File(s): Language Code Normalization (livekit-agents/livekit/agents/voice/audio_recognition.py)
Summary: Normalize "eng" → "en" when assigning _last_language in the FINAL_TRANSCRIPT and PREFLIGHT_TRANSCRIPT handlers; no other control-flow changes.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐇 I hop through code with tiny feet,
I change "eng" to "en" — neat and sweet.
A little norm, a tidy mend,
So transcripts speak in code that’s friend. 🎧

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Description Check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
Title check: ✅ Passed. The title accurately describes the main fix: normalizing language codes for ElevenLabs Scribe V2 STT model compatibility with EOT detection. The change directly addresses the core issue of ISO 639-3 to ISO 639-1 code conversion.


📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3d38116 and 7bd4ac9.

📒 Files selected for processing (1)
  • livekit-agents/livekit/agents/voice/audio_recognition.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • livekit-agents/livekit/agents/voice/audio_recognition.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: type-check (3.9)
  • GitHub Check: type-check (3.13)
  • GitHub Check: unit-tests


coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@livekit-agents/livekit/agents/voice/audio_recognition.py`:
- Around lines 413-414: normalized_language is computed but never used: after
computing normalized_language ("en" if language == "eng" else language) the code
assigns self._last_language = language, so normalization isn't applied for
PREFLIGHT_TRANSCRIPT events; change the assignment to use normalized_language
instead (assign self._last_language = normalized_language) in the same block
where normalized_language is computed so downstream logic uses the normalized
code (refer to normalized_language and self._last_language in the audio
recognition handling code).
🧹 Nitpick comments (1)
livekit-agents/livekit/agents/voice/audio_recognition.py (1)

353-354: Consider a more robust language code normalization approach.

The current fix only handles "eng" → "en", but other STT providers (or ElevenLabs itself, for other languages) may return additional ISO 639-3 codes (e.g., "spa", "fra", "deu"). A utility function, or a library such as langcodes or pycountry, would provide more comprehensive normalization.

♻️ Suggested helper function
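# Could be added to a utils module
# Optional[str] instead of `str | None`: the PEP 604 form fails at import time on Python 3.9
from typing import Optional

# Common ISO 639-3 to ISO 639-1 mappings (add more as needed); defined once at module level
_ISO_639_3_TO_1 = {
    "eng": "en",
    "spa": "es",
    "fra": "fr",
    "deu": "de",
    "ita": "it",
    "por": "pt",
    "rus": "ru",
    "zho": "zh",
    "jpn": "ja",
    "kor": "ko",
}


def normalize_language_code(language: Optional[str]) -> Optional[str]:
    """Normalize ISO 639-3 codes to ISO 639-1 where applicable.

    Args:
        language: Language code reported by the STT provider, or None.

    Returns:
        The ISO 639-1 code if a mapping exists, otherwise the input unchanged.
    """
    if language is None:
        return None
    return _ISO_639_3_TO_1.get(language, language)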

Then usage becomes:

-                normalized_language = "en" if language == "eng" else language
-                self._last_language = normalized_language
+                self._last_language = normalize_language_code(language)
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7fe642d and 3d38116.

📒 Files selected for processing (1)
  • livekit-agents/livekit/agents/voice/audio_recognition.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings

Files:

  • livekit-agents/livekit/agents/voice/audio_recognition.py
🧠 Learnings (1)
📚 Learning: 2026-01-22T03:28:16.289Z
Learnt from: longcw
Repo: livekit/agents PR: 4563
File: livekit-agents/livekit/agents/beta/tools/end_call.py:65-65
Timestamp: 2026-01-22T03:28:16.289Z
Learning: In code paths that check capabilities or behavior of the LLM processing the current interaction, prefer using the activity's LLM obtained via ctx.session.current_agent._get_activity_or_raise().llm instead of ctx.session.llm. The session-level LLM may be a fallback and not reflect the actual agent handling the interaction. Use the activity LLM to determine capabilities and to make capability checks or feature toggles relevant to the current processing agent.

Applied to files:

  • livekit-agents/livekit/agents/voice/audio_recognition.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: unit-tests
  • GitHub Check: type-check (3.13)
  • GitHub Check: type-check (3.9)


    language and len(transcript) > MIN_LANGUAGE_DETECTION_LENGTH
):
    self._last_language = language
    normalized_language = "en" if language == "eng" else language
Contributor

Perhaps move this to the ElevenLabs (11labs) plugin?

Author

@Ludobaka Jan 23, 2026

That was my initial thought as well, but I went with this approach for resilience: any other STT service that returns eng instead of en is handled automatically. It also leaves room for a general normalization function later if other languages turn out to have the same issue.

That said, I'm happy to move the normalization to the ElevenLabs plugin if you think that's the more appropriate place for it.
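For comparison, a plugin-scoped version of the same fix might map the code while parsing the provider's response. This is a hypothetical sketch: `parse_scribe_result`, `TranscriptSketch`, and `_SCRIBE_TO_ISO_639_1` are illustrative names, not the actual ElevenLabs plugin or livekit.agents types, and `"language_code"` is assumed as the response field that carries the language.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Hypothetical plugin-local mapping, applied only to ElevenLabs responses.
_SCRIBE_TO_ISO_639_1 = {"eng": "en"}


@dataclass
class TranscriptSketch:
    """Simplified transcript record; not the livekit.agents SpeechData type."""

    text: str
    language: Optional[str]


def parse_scribe_result(payload: Dict[str, Any]) -> TranscriptSketch:
    # "language_code" is an assumed field name, used here for illustration.
    code = payload.get("language_code")
    language = _SCRIBE_TO_ISO_639_1.get(code, code) if isinstance(code, str) else None
    return TranscriptSketch(text=payload.get("text", ""), language=language)


result = parse_scribe_result({"text": "Hi, I'm just testing things.", "language_code": "eng"})
print(result.language)  # prints: en
```

Scoping the mapping to the plugin keeps provider quirks out of the core, at the cost of repeating the fix in any other plugin that reports ISO 639-3 codes; the core-level approach in this PR makes the opposite trade.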
