fix: 11Labs Scribe v2 model not working with EOT prediction model #4601
base: main
📝 Walkthrough
Language code normalization was introduced: in two STT event handlers ("FINAL_TRANSCRIPT" and "PREFLIGHT_TRANSCRIPT"), the language value "eng" is normalized to "en" before being stored.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
📜 Recent review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@livekit-agents/livekit/agents/voice/audio_recognition.py`:
- Around lines 413-414: normalized_language is computed but never used: after
computing normalized_language ("en" if language == "eng" else language) the code
assigns self._last_language = language, so normalization isn't applied for
PREFLIGHT_TRANSCRIPT events; change the assignment to use normalized_language
instead (assign self._last_language = normalized_language) in the same block
where normalized_language is computed so downstream logic uses the normalized
code (refer to normalized_language and self._last_language in the audio
recognition handling code).
🧹 Nitpick comments (1)
livekit-agents/livekit/agents/voice/audio_recognition.py (1)
353-354: Consider a more robust language code normalization approach.

The current fix only handles `"eng"` → `"en"`, but other STT providers (or ElevenLabs for other languages) may return additional ISO 639-3 codes (e.g., `"spa"`, `"fra"`, `"deu"`). A utility function or a library like `langcodes` or `pycountry` would provide comprehensive normalization.

♻️ Suggested helper function
```python
# Could be added to a utils module
from __future__ import annotations  # allows `str | None` annotations on Python 3.9

# Common ISO 639-3 to ISO 639-1 mappings (add more as needed)
ISO_639_3_TO_1 = {
    "eng": "en",
    "spa": "es",
    "fra": "fr",
    "deu": "de",
    "ita": "it",
    "por": "pt",
    "rus": "ru",
    "zho": "zh",
    "jpn": "ja",
    "kor": "ko",
}


def normalize_language_code(language: str | None) -> str | None:
    """Normalize ISO 639-3 codes to ISO 639-1 where applicable.

    Args:
        language: Language code reported by the STT provider, or None.

    Returns:
        The ISO 639-1 code if a mapping is known, otherwise the input unchanged.
    """
    if language is None:
        return None
    return ISO_639_3_TO_1.get(language, language)
```

Then usage becomes:

```diff
- normalized_language = "en" if language == "eng" else language
- self._last_language = normalized_language
+ self._last_language = normalize_language_code(language)
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
livekit-agents/livekit/agents/voice/audio_recognition.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings
Files:
livekit-agents/livekit/agents/voice/audio_recognition.py
🧠 Learnings (1)
📚 Learning: 2026-01-22T03:28:16.289Z
Learnt from: longcw
Repo: livekit/agents PR: 4563
File: livekit-agents/livekit/agents/beta/tools/end_call.py:65-65
Timestamp: 2026-01-22T03:28:16.289Z
Learning: In code paths that check capabilities or behavior of the LLM processing the current interaction, prefer using the activity's LLM obtained via ctx.session.current_agent._get_activity_or_raise().llm instead of ctx.session.llm. The session-level LLM may be a fallback and not reflect the actual agent handling the interaction. Use the activity LLM to determine capabilities and to make capability checks or feature toggles relevant to the current processing agent.
Applied to files:
livekit-agents/livekit/agents/voice/audio_recognition.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: unit-tests
- GitHub Check: type-check (3.13)
- GitHub Check: type-check (3.9)
```diff
      language and len(transcript) > MIN_LANGUAGE_DETECTION_LENGTH
  ):
-     self._last_language = language
+     normalized_language = "en" if language == "eng" else language
```
Perhaps move this to the ElevenLabs plugin?
That was my initial thought as well, but I went with this approach for resilience: any other STT service that returns `eng` instead of `en` would be handled automatically. It also paves the way for a normalization function later, in case other languages have the same issue.
That said, I'm happy to move the normalization to the ElevenLabs plugin if you think that's the more appropriate place for it.
This PR fixes an issue where end-of-turn (EOT) detection fails when using ElevenLabs' Scribe V2 STT model with the `MultilingualModel` turn detector.

Problem
When running an agent with the `scribe_v2` model and turn detection enabled via `MultilingualModel`, EOT detection does not work. The turn detector rejects the language code returned by the STT service:

Root Cause
The EOT model expects the ISO 639-1 language code `en`, but ElevenLabs Scribe V2 returns the ISO 639-3 code `eng` (see ElevenLabs API reference).

Solution
Normalize the language code in `livekit-agents/livekit/agents/voice/audio_recognition.py` when setting `self._last_language`.

Why this approach?
Handling normalization at the `audio_recognition.py` level (rather than in the ElevenLabs plugin) ensures consistent behavior across all STT providers that may return non-standard language codes.

Of course, any recommendations are welcome, and I'll be glad to improve this proposal.
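To illustrate the root cause described above, here is a small sketch assuming a turn detector that accepts only ISO 639-1 codes. `SUPPORTED_LANGUAGES`, `supports_language`, and `normalize` are hypothetical names; they do not reflect the actual `MultilingualModel` API or its real list of supported languages.

```python
from typing import Optional

# Hypothetical subset of ISO 639-1 codes an EOT model might accept.
SUPPORTED_LANGUAGES = frozenset({"en", "es", "fr", "de"})


def supports_language(language: Optional[str]) -> bool:
    """Return True if the (hypothetical) turn detector accepts this code."""
    return language is not None and language in SUPPORTED_LANGUAGES


def normalize(language: Optional[str]) -> Optional[str]:
    """Apply this PR's fix: map the ISO 639-3 code "eng" to ISO 639-1 "en"."""
    return "en" if language == "eng" else language


raw = "eng"  # code reported by Scribe V2, per the PR description
print(supports_language(raw))  # False: the language check rejects it
print(supports_language(normalize(raw)))  # True after normalization
```

Any code outside the mapping passes through unchanged, so providers that already return ISO 639-1 codes are unaffected.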