Fonada TTS plugin #5171

Open
Pragyanshu-Fonada wants to merge 17 commits into livekit:main from Pragyanshu-Fonada:feature/fonadalabs_livekit_agent

@Pragyanshu-Fonada Pragyanshu-Fonada commented Mar 20, 2026

Summary

Adds a new TTS plugin livekit-plugins-fonadalabs for FonadaLabs API —
a high-quality text-to-speech service specializing in Indian languages.

Features

  • WebSocket-based streaming TTS
  • Dynamically fetched language/voice catalog
  • Supports Hindi (70 voices), Tamil (16 voices), Telugu (60 voices), English (70 voices)
  • Language can be specified by code ("hi") or display name ("Hindi")
  • Graceful fallback if catalog is unavailable
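
A minimal sketch of how the code-or-display-name lookup and the catalog fallback listed above could fit together. This is not the plugin's code: `resolve_language` and `FALLBACK_LANGUAGES` are hypothetical names, and the codes other than "hi" are assumed rather than taken from the PR.

```python
from __future__ import annotations

# Hypothetical fallback table, used only when no remote catalog is available.
# Only "hi" / "Hindi" appears in the PR; the other codes are assumptions.
FALLBACK_LANGUAGES: dict[str, str] = {
    "hi": "Hindi",
    "ta": "Tamil",
    "te": "Telugu",
    "en": "English",
}


def resolve_language(language: str, catalog: dict[str, str] | None = None) -> str:
    """Return the display name for a language given as a code or a display name.

    `catalog` maps lowercase codes to display names; when it is missing or
    empty, the static fallback table is used instead.
    """
    table = catalog or FALLBACK_LANGUAGES
    key = language.strip().lower()

    # First try the input as a language code.
    if key in table:
        return table[key]

    # Then try it as a display name, ignoring case.
    for name in table.values():
        if name.lower() == key:
            return name

    raise ValueError(f"Unsupported language: {language!r}")
```

With this shape, `resolve_language("hi")` and `resolve_language("Hindi")` both yield the display name that the WebSocket payload expects.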

Environment Variable

  • FONADALABS_API_KEY — FonadaLabs API key

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 2 new potential issues.

View 8 additional findings in Devin Review.

"voice_id": self._resolved_voice,
"language": self._resolved_lang_name, # display name, e.g. "Hindi"
}
await ws.send_str(json.dumps(payload))

@devin-ai-integration devin-ai-integration bot Mar 20, 2026

🟡 Missing _mark_started() call prevents TTS metrics from ever being emitted

The fonadalabs SynthesizeStream._run() never calls self._mark_started(), which is required by the base class SynthesizeStream to record the time when TTS synthesis begins. Every other SynthesizeStream plugin in the codebase (Cartesia, ElevenLabs, Deepgram, xai, Google, asyncai, Gradium, Murf, Sarvam, Minimax, Nvidia, Resemble, UpliftAI, FishAudio, Neuphonic, Telnyx — 16 total) calls self._mark_started() before or when sending data to the TTS service.

Without this call, self._started_time remains 0. In the base class _metrics_monitor_task at livekit-agents/livekit/agents/tts/tts.py:539, the _emit_metrics() function checks if not self._started_time — since not 0 is True, it returns early without emitting any metrics. This means no TTSMetrics are ever collected or emitted via the "metrics_collected" event for the fonadalabs plugin, breaking telemetry and monitoring.
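
To illustrate the early return described above, here is a simplified stand-in. `ToyStream` is a hypothetical class written for this note, not the livekit-agents `SynthesizeStream`; it only imitates the `_started_time` check.

```python
import time


class ToyStream:
    """Simplified stand-in for a synthesize stream (not the real base class)."""

    def __init__(self) -> None:
        self._started_time = 0.0
        self.emitted: list[float] = []

    def _mark_started(self) -> None:
        # Record the start time once, on the first call only.
        if self._started_time == 0.0:
            self._started_time = time.perf_counter()

    def _emit_metrics(self) -> None:
        # Same shape as the check in the review: no start time, no metrics.
        if not self._started_time:
            return
        self.emitted.append(time.perf_counter() - self._started_time)
        self._started_time = 0.0


silent = ToyStream()
silent._emit_metrics()  # never marked as started, so nothing is recorded

marked = ToyStream()
marked._mark_started()  # what the plugin would call before sending text
marked._emit_metrics()  # one duration is recorded
```

In the plugin, the equivalent change would presumably be a `self._mark_started()` call in `_run()` just before the payload is sent over the WebSocket.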

if not segments:
raise ValueError("No text received from input channel.")

text = " ".join(segments)

@devin-ai-integration devin-ai-integration bot Mar 20, 2026

🔴 " ".join(segments) introduces spurious spaces between LLM tokens

LLM tokens pushed via push_text() already contain their own whitespace (e.g. "Hello", " world", "!"). The base class itself concatenates them without spaces at livekit-agents/livekit/agents/tts/tts.py:593 (self._pushed_text += token). Using " ".join(segments) on line 288 inserts an extra space between every token, producing text like "Hello  world !" (with a doubled space) instead of "Hello world!". This corrupts the text sent to the TTS API, degrading speech quality. The fix is to use "".join(segments) instead.
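
The difference is easy to reproduce in isolation. The token list below is the example from this comment; the variable names are only for illustration.

```python
# Tokens as an LLM might stream them: each carries its own leading whitespace.
segments = ["Hello", " world", "!"]

# Joining with a space adds a separator on top of the whitespace already present.
spaced = " ".join(segments)

# Joining with the empty string reproduces the original text exactly.
joined = "".join(segments)

print(repr(spaced))  # 'Hello  world !'
print(repr(joined))  # 'Hello world!'
```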

devin-ai-integration[bot]

This comment was marked as resolved.

@Pragyanshu-Fonada Pragyanshu-Fonada changed the title from "Feature/fonadalabs livekit agent" to "Fonada TTS plugin" Mar 20, 2026

@tinalenguyen tinalenguyen left a comment

hi, thanks for the PR! few notes, could you:



@dataclass
class _Catalog:
Member

could we create a file models.py and add a list of voices and languages there? it helps for users to see available ones, and TTS could accept FonadaVoices | str

example
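
One possible shape for such a models.py, as a sketch only. The voice names are placeholders because the real FonadaLabs voice IDs do not appear in this thread; the language names come from the PR summary, and `is_known_voice` is a hypothetical helper.

```python
from typing import Literal, Union, get_args

# Placeholder voice names: hypothetical, not taken from the FonadaLabs catalog.
FonadaVoices = Literal["voice-a", "voice-b"]

# Display names of the languages listed in the PR summary.
FonadaLanguages = Literal["Hindi", "Tamil", "Telugu", "English"]


def is_known_voice(voice: Union[FonadaVoices, str]) -> bool:
    """Return True if the voice is one of the statically listed names.

    Accepting `FonadaVoices | str` keeps editor autocompletion for known
    voices while still allowing any voice the service adds later.
    """
    return voice in get_args(FonadaVoices)
```

The TTS constructor could then annotate its voice parameter as `FonadaVoices | str`, so unknown voices still pass through to the API.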

Author

Yes @tinalenguyen, we can do that, thanks for the suggestion.
