Add first-pass Windows Voice Mode by NichUK · Pull Request #120 · openclaw/openclaw-windows-node

NichUK · 2026-03-30T18:29:40Z

Summary

This PR adds the first-pass Windows Voice Mode implementation to the tray app. It's by no means finished, but the first feature-set is working. I apologise for the hugeness... Also there was quite a lot of experimentation and reversion so it's not quite as bad as it looks...

What works now

Talk Mode on Windows (via Windows STT)
direct voice send into the main chat session
assistant reply playback (not yet streamed) - currently supports
- Windows TTS
- Minimax
- Eleven Labs
compact repeater window with:
- transcript/reply display
- pause/resume
- response skip
- settings access
- repeater position/size persistence
tray icon state reflecting real listening readiness
configurable provider catalog for TTS/STT rather than hard-coded
default device use

What didn't work

I tried to fully integrate with the WebChat UI, but couldn't achieve it without nasty local DOM writes, which is very hacky. Also, Windows STT (Windows.Media.SpeechRecognition.SpeechRecognizer) works pretty well, but it has to control the entire pipeline, so we can't select an input device without changing the system default devices.

Coming Next

Voice Wake implementation (WakeWord)
Push To Talk implementation
true streaming first-chunk TTS playback
true streaming STT using an AudioGraph pipeline
- via cloud providers (OpenAI Whisper/Eleven Labs)
- via local model (hosted in sherpa-onnx)
selected non-default microphone/speaker support for actual STT capture across all providers
voice control record parsing
central pronunciation dictionary support

Notes

I kept the architecture intentionally close to the existing tray/node model, and documented the architecture along with the current and planned states in docs/VOICE-MODE.md. I also made as few touch points to the existing app as possible to minimise change risk.

Happy to receive notes/change requests before merging, etc., and attempt to deal with issues if anyone actually uses it! :)

Copilot

Pull request overview

Adds a first-pass Windows “Voice Mode” feature set to the WinUI tray app, introducing voice runtime/configuration plumbing, UI surfaces (status + repeater), and provider-catalog driven STT/TTS support, plus associated schema/capability work in OpenClaw.Shared.

Changes:

Add WinUI voice UI: settings panel integration, voice status window, and repeater window (with persisted placement/options).
Introduce provider catalog/config store + cloud TTS client scaffolding; add Windows.Media STT route and capture service.
Extend shared schema/capability surface for voice commands and enhance gateway chat handling for session-key normalization + preview-based final assistant message recovery.

Reviewed changes

Copilot reviewed 50 out of 52 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tests/OpenClaw.Tray.Tests/WebChatWindowDomBridgeTests.cs | Adds tests for WebChat DOM voice bridge scripting (currently mismatched vs implementation). |
| tests/OpenClaw.Tray.Tests/VoiceServiceTransportTests.cs | Adds unit tests covering internal voice transport/decision helpers. |
| tests/OpenClaw.Tray.Tests/VoiceProviderCatalogServiceTests.cs | Adds tests for voice provider catalog loading + icon generation. |
| tests/OpenClaw.Tray.Tests/VoiceCloudTextToSpeechClientTests.cs | Adds tests for cloud TTS client cancellation + decoding helpers. |
| tests/OpenClaw.Tray.Tests/VoiceChatCoordinatorTests.cs | Adds tests for coordinating draft/turn mirroring across attached windows. |
| tests/OpenClaw.Tray.Tests/SettingsRoundTripTests.cs | Extends settings round-trip/back-compat tests for voice settings + provider config migration. |
| tests/OpenClaw.Tray.Tests/OpenClaw.Tray.Tests.csproj | Updates test TFM to Windows and references WinUI project. |
| tests/OpenClaw.Shared.Tests/VoiceModeSchemaTests.cs | Adds schema/enum/default tests for the shared voice model. |
| tests/OpenClaw.Shared.Tests/OpenClawGatewayClientTests.cs | Adds tests for new gateway chat session normalization + preview handling. |
| tests/OpenClaw.Shared.Tests/CapabilityTests.cs | Adds VoiceCapability behavior tests. |
| src/OpenClaw.Tray.WinUI/Windows/WebChatWindow.xaml.cs | Implements `IVoiceChatWindow` draft injection into WebChat via WebView2 script. |
| src/OpenClaw.Tray.WinUI/Windows/WebChatVoiceDomState.cs | Adds minimal state container for pending draft mirroring into WebChat DOM. |
| src/OpenClaw.Tray.WinUI/Windows/WebChatVoiceDomBridge.cs | Adds injected DOM bridge script + helper to build draft-setting JS. |
| src/OpenClaw.Tray.WinUI/Windows/VoiceRepeaterWindow.xaml.cs | Adds compact repeater window for transcript/replies + controls + persistence. |
| src/OpenClaw.Tray.WinUI/Windows/VoiceRepeaterWindow.xaml | Adds repeater window layout and settings flyout UI. |
| src/OpenClaw.Tray.WinUI/Windows/VoiceModeWindow.xaml.cs | Adds “Voice Mode” status/config summary window. |
| src/OpenClaw.Tray.WinUI/Windows/VoiceModeWindow.xaml | Adds layout for the voice status/config window. |
| src/OpenClaw.Tray.WinUI/Windows/SettingsWindow.xaml.cs | Integrates `VoiceSettingsPanel` and makes save flow async to apply voice config. |
| src/OpenClaw.Tray.WinUI/Windows/SettingsWindow.xaml | Adds `VoiceSettingsPanel` control to Settings UI. |
| src/OpenClaw.Tray.WinUI/Services/Voice/WindowsMediaSpeechToTextRoute.cs | Adds Windows.Media dictation recognizer route. |
| src/OpenClaw.Tray.WinUI/Services/Voice/VoiceSpeechToTextRouteResources.cs | Adds resource container for STT route assets (recognizer/capture). |
| src/OpenClaw.Tray.WinUI/Services/Voice/VoiceSpeechToTextRouteKind.cs | Adds route-kind enum for selecting STT pipeline. |
| src/OpenClaw.Tray.WinUI/Services/Voice/VoiceSpeechToTextRouteFactory.cs | Adds factory to select STT route kind based on provider/runtime. |
| src/OpenClaw.Tray.WinUI/Services/Voice/VoiceProviderCatalogService.cs | Adds catalog loader/normalizer and provider runtime support checks. |
| src/OpenClaw.Tray.WinUI/Services/Voice/VoiceDisplayHelper.cs | Adds UI-friendly labels for voice mode/state/runtime. |
| src/OpenClaw.Tray.WinUI/Services/Voice/VoiceCloudTextToSpeechClient.cs | Adds HTTP/WebSocket cloud TTS client using provider contracts + templating. |
| src/OpenClaw.Tray.WinUI/Services/Voice/VoiceChatCoordinator.cs | Adds dispatcher-based coordination of draft/turn updates across windows. |
| src/OpenClaw.Tray.WinUI/Services/Voice/VoiceChatContracts.cs | Adds interfaces for voice runtime/config/control + window mirroring abstractions. |
| src/OpenClaw.Tray.WinUI/Services/Voice/VoiceCaptureService.cs | Adds AudioGraph-based capture service with peak/signal helpers. |
| src/OpenClaw.Tray.WinUI/Services/Voice/SherpaOnnxSpeechToTextRoute.cs | Adds scaffold route for sherpa-onnx STT (not implemented). |
| src/OpenClaw.Tray.WinUI/Services/Voice/IVoiceSpeechToTextRoute.cs | Adds STT route interface. |
| src/OpenClaw.Tray.WinUI/Services/Voice/AudioGraphStreamingSpeechToTextRoute.cs | Adds scaffold route for streaming STT (not implemented). |
| src/OpenClaw.Tray.WinUI/Services/SettingsManager.cs | Persists voice settings, repeater window prefs, and provider configuration store. |
| src/OpenClaw.Tray.WinUI/Services/NodeService.cs | Wires VoiceCapability into node and starts/stops voice with node connect lifecycle. |
| src/OpenClaw.Tray.WinUI/Services/GlobalHotkeyService.cs | Adds second global hotkey (Ctrl+Alt+Shift+V) for voice pause/resume. |
| src/OpenClaw.Tray.WinUI/Properties/AssemblyInfo.cs | Adds InternalsVisibleTo for OpenClaw.Tray.Tests. |
| src/OpenClaw.Tray.WinUI/Helpers/IconHelper.cs | Adds voice tray icon state variants and generated icon caching. |
| src/OpenClaw.Tray.WinUI/Controls/VoiceSettingsPanel.xaml.cs | Adds voice settings UI logic (mode/providers/devices/provider settings draft/apply). |
| src/OpenClaw.Tray.WinUI/Controls/VoiceSettingsPanel.xaml | Adds voice settings UI layout and provider settings editor controls. |
| src/OpenClaw.Tray.WinUI/Assets/voice-providers.json | Adds provider catalog describing STT/TTS options and contracts. |
| src/OpenClaw.Tray.WinUI/Assets/voice-mode-feature.png | Adds README feature icon asset for Voice Mode. |
| src/OpenClaw.Shared/VoiceProviderConfigurationStoreExtensions.cs | Adds config-store helpers + clone + legacy credential migration. |
| src/OpenClaw.Shared/VoiceModeSchema.cs | Adds shared voice command/schema models + JSON converters and provider contract types. |
| src/OpenClaw.Shared/SettingsData.cs | Adds voice settings + repeater prefs + provider config + legacy JSON migration shim. |
| src/OpenClaw.Shared/OpenClawGatewayClient.cs | Extends gateway client for chat session defaults/normalization and preview-based final assistant message capture. |
| src/OpenClaw.Shared/Models.cs | Adds `ChatMessageEventArgs` event payload type. |
| src/OpenClaw.Shared/Capabilities/VoiceCapability.cs | Adds node capability implementation for voice commands. |
| README.md | Documents Voice Mode feature and adds to feature list/parity table. |
| .gitignore | Ignores `.env` and repo-local tool/workspace cache directories. |

Comments suppressed due to low confidence (2)

tests/OpenClaw.Tray.Tests/WebChatWindowDomBridgeTests.cs:28

This test uses WebChatWindow.BuildTurnsScript(...) and asserts for setTurns/direction/text serialization, but there is no BuildTurnsScript (and the injected setTurns() in WebChatVoiceDomBridge.DocumentCreatedScript currently just clears a legacy host). Either implement turns rendering/serialization or update/remove this test so it matches the current DOM bridge behavior.

tests/OpenClaw.Tray.Tests/WebChatWindowDomBridgeTests.cs:35

This assertion checks WebChatWindow.TrayVoiceIntegrationScript for DOM anchor logic, but no such script exists in the current implementation (the injected script is WebChatVoiceDomBridge.DocumentCreatedScript and it does not contain getTurnsAnchor/insertBefore). Update the test to assert against the actual injected script, or add the missing integration script if it's still intended.

Move voice-mode test-targeted logic out of the WinUI app and into a dedicated shared project so tray tests no longer need to reference OpenClaw.Tray.WinUI directly. This restores the original CI assumption that the tray test project can be built on its own without transitively building a Windows App SDK application with an implicit architecture. It also keeps the voice/chat extraction scoped away from the broader OpenClaw.Shared library, which remains general-purpose and non-tray-specific.

The new OpenClaw.Tray.Shared project now contains the shared voice/chat surface used by both the tray app and tray tests, including voice transport helpers, provider catalog loading, cloud TTS support, chat coordination, and the web chat DOM bridge. The WinUI app retains the UI shell pieces, including DispatcherQueueAdapter and the app-level icon path helper.

As a follow-up cleanup during the extraction, split the previous IconHelper into AppIconHelper in the WinUI project and VoiceTrayIconHelper in the shared tray project so the new shared library stays focused on voice-related behavior rather than wider tray infrastructure.

Refactor tray voice code into OpenClaw.Tray.Shared

NichUK · 2026-04-03T15:51:02Z

@shanselman, @copilot

I introduced a new OpenClaw.Tray.Shared project to separate tray voice/chat logic that needs unit coverage from the WinUI app shell.

Before this change, the tray test project had to reference OpenClaw.Tray.WinUI directly in order to test the new voice-mode code. That made the existing CI test job transitively build the WinUI app, which is what caused the ARM64 architecture-specific Windows App SDK failure. Moving the shared voice/chat logic into its own project let the tests exercise that code without having to build the WinUI application itself.

I kept OpenClaw.Shared unchanged because it is the broader repo-wide shared library (and built against net10), while this extracted code is tray-specific and still Windows-oriented (so net10-windows). The new project is intentionally narrower: it holds the tray voice/chat logic shared by the WinUI app and tray tests, while WinUI-only pieces such as dispatcher plumbing and app-level icon handling remain in OpenClaw.Tray.WinUI.

If this doesn't work for you, and you'd prefer that code to go into OpenClaw.Shared with the TargetFramework changed instead, let me know and I'll refactor.

I'm also addressing the points raised by Repo Assist above. This stuff is f**kin' magic! :)

Cover the pure shared logic in VoiceProviderConfigurationStoreExtensions with focused unit tests for case-insensitive provider lookup, case-insensitive setting lookup, SetValue creation/update behavior, and removal of blank or null values.
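
The behaviours named in this commit message can be illustrated with a small sketch. The PR's real helper is the C# class VoiceProviderConfigurationStoreExtensions, whose code is not shown in this thread, so the Python below is only an assumption about what "case-insensitive lookup", "SetValue creation/update" and "removal of blank or null values" could mean in practice. Every name in it (`ProviderConfig`, `ProviderConfigStore`, `find_provider`, `get_value`, `set_value`) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProviderConfig:
    """Hypothetical per-provider settings bag (not the PR's actual type)."""
    provider_id: str
    values: dict[str, str] = field(default_factory=dict)


@dataclass
class ProviderConfigStore:
    """Hypothetical store holding one ProviderConfig per provider."""
    providers: list[ProviderConfig] = field(default_factory=list)

    def find_provider(self, provider_id: str) -> Optional[ProviderConfig]:
        # Case-insensitive provider lookup.
        wanted = provider_id.casefold()
        for provider in self.providers:
            if provider.provider_id.casefold() == wanted:
                return provider
        return None

    def get_value(self, provider_id: str, key: str) -> Optional[str]:
        # Case-insensitive setting lookup within a provider.
        provider = self.find_provider(provider_id)
        if provider is None:
            return None
        wanted = key.casefold()
        for existing_key, value in provider.values.items():
            if existing_key.casefold() == wanted:
                return value
        return None

    def set_value(self, provider_id: str, key: str, value: Optional[str]) -> None:
        # Blank or None values remove the setting instead of storing it.
        blank = value is None or not value.strip()
        provider = self.find_provider(provider_id)
        if provider is None:
            if blank:
                return  # nothing to remove, and nothing worth creating
            provider = ProviderConfig(provider_id)
            self.providers.append(provider)
        # Reuse the stored key's original casing if the setting already exists.
        wanted = key.casefold()
        existing_key = next(
            (k for k in provider.values if k.casefold() == wanted), None
        )
        if blank:
            if existing_key is not None:
                del provider.values[existing_key]
            return
        provider.values[existing_key if existing_key is not None else key] = value
```

Under these assumptions, calling `set_value("MiniMax", "ApiKey", "abc")` followed by `get_value("minimax", "apikey")` returns the stored value, a second `set_value` with different casing updates the same entry rather than adding a new one, and passing a blank string or `None` removes the entry. Whether the real helpers preserve the original key casing or drop empty providers is not stated in the thread.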

NichUK · 2026-04-04T17:28:11Z

Increased test coverage as suggested

Add tests for voice provider configuration helpers

github-actions · 2026-04-05T01:18:40Z

🤖 This is an automated response from Repo Assist.

@NichUK — this is a really thorough response to the test coverage feedback! Looking at the new additions:

What landed:

VoiceCapabilityTests (8 tests) — covers all 6 voice commands (list-devices, get-settings, set-settings, get-status, start, stop) including error paths and missing handler cases
VoiceModeSchemaTests — covers VoiceCommands.All, enum values, and schema contracts
VoiceProviderConfigurationStoreExtensionsTests — exactly what was suggested: SetValue/RemoveValue/case-insensitive key lookups
VoiceChatCoordinatorTests, VoiceCloudTextToSpeechClientTests, VoiceProviderCatalogServiceTests, VoiceServiceTransportTests — impressive; these cover the new OpenClaw.Tray.Shared project
WebChatWindowDomBridgeTests and SettingsRoundTripTests extensions — solid rounding out

Overall: 61 new [Fact] methods across 9 test files — this is exactly the kind of test coverage that makes a feature safe to merge. The PR is in excellent shape from a testing standpoint.

The one remaining open question for the maintainer is the VoiceService.cs size (~2 500 lines), but as noted in the previous review, that's a follow-up concern, not a blocker. The architecture is sound and the tests document the behaviour thoroughly.

Looking forward to seeing this land! 👍

NichUK added 30 commits March 23, 2026 01:34

Add Windows voice mode foundation and AlwaysOn runtime

be624fe

Fix voice chat transport and reply routing

f40ffc3

Add configurable voice mode settings and setup UI

a81d31e

Integrate always-on voice mode with tray chat workflow

197a89b

Fix tray voice startup and chat window submission

1340bde

Remove stale always-on autosubmit setting

aed8cb8

Add focused coordinator coverage for tray voice chat

25dd06b

Address voice mode review findings and harden runtime

1336472

Document required Minimax and ElevenLabs provider support

0f1028a

Harden tray chat voice message handling

2c8a46d

Fix voice transport connection task reuse

fdbf48e

Group voice runtime services under Services/Voice

b556c64

Implement MiniMax TTS for voice mode

7f31c12

Add editable TTS provider settings to voice mode

c64f168

Move voice settings into main settings window

907a1a0

Extract hosted voice settings panel from settings window

6dba89b

Generalize cloud TTS providers through catalog contracts

ded41a2

Rename voice modes to VoiceWake and TalkMode

199e534

Move voice settings below node mode toggle

47efc3e

Make cloud TTS voice settings fully catalog-driven

85d7b90

Ship voice provider catalog with the tray app

c1cc0ff

Instrument voice output latency and reduce TTS buffering

83f05ee

Tighten talk mode speech recognition filtering

d137409

Use MiniMax api-uw endpoint for lower TTS latency

05d7bae

Add catalog-driven MiniMax WebSocket TTS

5efcebf

Fix voice restart after settings save

45ff8f8

Fix MiniMax websocket voice playback routing

71d0de4

Add dynamic tray icons for voice states

91ccec3

Add pre-response voice latency timing logs

2ff57fc

Keep talk mode alive after input failures

ffa3fa2

NichUK added 3 commits March 30, 2026 18:16

Tweak voice mode README credit wording

4756563

Merge branch 'codex/webchat-direct-send-restore' into feature/voice-mode

c9dc8e8

Merge remote feature branch into feature/voice-mode

fc66745

NichUK marked this pull request as draft March 30, 2026 18:30

NichUK marked this pull request as ready for review March 30, 2026 19:16

This was referenced Mar 31, 2026

[Repo Assist] Monthly Activity 2026-03 #35

Closed

[Repo Assist] Monthly Activity 2026-04 #127

Open

shanselman requested a review from Copilot April 1, 2026 06:49

Copilot started reviewing on behalf of shanselman April 1, 2026 06:50

Copilot AI reviewed Apr 1, 2026

tests/OpenClaw.Tray.Tests/WebChatWindowDomBridgeTests.cs (outdated)

tests/OpenClaw.Tray.Tests/OpenClaw.Tray.Tests.csproj (outdated)

NichUK and others added 8 commits April 2, 2026 21:19

Merge origin/master into feature/voice-mode

db108f4

Fix tests after master merge

6c9680f

Update src/OpenClaw.Tray.WinUI/Services/Voice/VoiceCloudTextToSpeechC…

777088c

…lient.cs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Fix SupportsSpeechToTextRuntime test assertions to match streaming pr…

78d0a3d

…ovider route kinds Agent-Logs-Url: https://github.com/NichUK/openclaw-windows-node/sessions/f2ae3d04-4f08-49c2-8095-9e801a4ccf6d Co-authored-by: NichUK <346792+NichUK@users.noreply.github.com>

Fix incorrect test project binding to x64

3d5b4e7

Merge branch 'feature/voice-mode' of github.com:NichUK/openclaw-windo…

a6592e8

…ws-node into feature/voice-mode

Revert "Fix SupportsSpeechToTextRuntime test assertions to match stre…

83c7983

…aming provider route kinds" Reverts CoPilot fix This reverts commit 78d0a3d.

Remove incorrect project binding to x64

1820901

NichUK and others added 5 commits April 3, 2026 16:06

Update src/OpenClaw.Tray.WinUI/App.xaml.cs

fd6c89f

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update src/OpenClaw.Tray.WinUI/App.xaml.cs

4f3c8c6

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update src/OpenClaw.Tray.WinUI/Helpers/AppIconHelper.cs

7e7bb32

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Merge pull request #5 from NichUK/codex/fix-pr120-ci

88caa1a

Refactor tray voice code into OpenClaw.Tray.Shared

Merge pull request #6 from NichUK/codex/fix-pr120-ci

57ae532

Add tests for voice provider configuration helpers

NichUK wants to merge 104 commits into openclaw:master from NichUK:feature/voice-mode

3 participants
