feat(audio): Add waveform visualization for PTT voice messages #2346
base: main
Conversation
Reviewer's Guide

Implements WhatsApp PTT audio waveform support by decoding audio buffers to derive duration and a 64-point Uint8Array waveform, wiring these into Baileys message payloads, lowering the audio bitrate to match WhatsApp requirements, and adding patch-package infrastructure (including Docker build integration) to prevent Baileys from overwriting custom waveforms.

Sequence diagram for sending PTT audio with generated waveform

```mermaid
sequenceDiagram
  actor Client
  participant API as EvolutionAPI
  participant Service as BaileysStartupService
  participant FFmpeg as FFmpegProcess
  participant Decoder as AudioDecoder
  participant Baileys as BaileysClient
  participant WhatsApp as WhatsAppServer
  Client->>API: HTTP request SendAudioDto
  API->>Service: audioWhatsapp(data, file, isIntegration)
  alt File upload path
    Service->>FFmpeg: processAudio(mediaData.audio)
    FFmpeg-->>Service: Buffer convertedAudio
    Service->>Decoder: getAudioDuration(convertedAudio)
    Decoder-->>Service: seconds
    Service->>Decoder: getAudioWaveform(convertedAudio)
    Decoder-->>Service: waveform Uint8Array
    Service->>Baileys: sendMessageWithTyping(number, messageContent, options, isIntegration)
  else URL or base64 path
    Service->>Service: audioBuffer from URL or base64
    alt audioBuffer is Buffer
      Service->>Decoder: getAudioDuration(audioBuffer)
      Decoder-->>Service: seconds
      Service->>Decoder: getAudioWaveform(audioBuffer)
      Decoder-->>Service: waveform Uint8Array
    end
    Service->>Baileys: sendMessageWithTyping(number, message, options, isIntegration)
  end
  Baileys-->>WhatsApp: PTT message with seconds and waveform
  WhatsApp-->>Client: PTT voice note with waveform visualization
```
Class diagram for BaileysStartupService audio waveform enhancements

```mermaid
classDiagram
  class BaileysStartupService {
    - logger
    - processAudio(input)
    + audioWhatsapp(data, file, isIntegration) Promise~any~
    - getAudioDuration(audioBuffer) Promise~number~
    - getAudioWaveform(audioBuffer) Promise~Uint8Array~
    + sendMessageWithTyping(number, messageContent, options, isIntegration) Promise~any~
  }
  class AudioDecoderLibrary {
    + audioDecode(audioBuffer) Promise~DecodedAudio~
  }
  class DecodedAudio {
    + duration number
    + getChannelData(channelIndex) Float32Array
  }
  BaileysStartupService --> AudioDecoderLibrary : uses
  AudioDecoderLibrary --> DecodedAudio : returns
```
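The bucketing step the diagrams describe can be sketched as a pure function. This is an illustrative sketch, not the PR's exact code: `computeWaveform` is a hypothetical name, and it assumes the audio has already been decoded into a single `Float32Array` channel (as `audio-decode`'s `getChannelData(0)` returns). It averages absolute amplitudes into 64 buckets, normalized to the 0-100 range the PR targets.

```typescript
// Hypothetical helper; not the PR's actual implementation.
const WAVEFORM_LENGTH = 64;

function computeWaveform(samples: Float32Array): Uint8Array {
  const waveform = new Uint8Array(WAVEFORM_LENGTH);
  if (samples.length === 0) return waveform; // all zeros for empty input

  // Guard against a zero bucket size for clips shorter than 64 samples.
  const samplesPerBucket = Math.max(
    1,
    Math.floor(samples.length / WAVEFORM_LENGTH),
  );

  for (let i = 0; i < WAVEFORM_LENGTH; i++) {
    const start = i * samplesPerBucket;
    const end = Math.min(start + samplesPerBucket, samples.length);
    let sum = 0;
    for (let j = start; j < end; j++) sum += Math.abs(samples[j]);
    // Samples are in [-1, 1], so the average of |sample| maps to 0-100.
    waveform[i] = end > start ? Math.round((sum / (end - start)) * 100) : 0;
  }
  return waveform;
}
```

Buckets that fall past the end of a short clip stay at 0, so the output is always exactly 64 values regardless of input length.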
Hey - I've found 2 issues, and left some high level feedback:
- In `getAudioWaveform`, `samplesPerWaveform` can become 0 for very short audio (`Math.floor(samples.length / waveformLength)`), which will cause division-by-zero and `NaN` averages; consider guarding with a minimum of 1 or short-circuiting to a default waveform for very small inputs.
- `getAudioDuration` and `getAudioWaveform` each call `audioDecode` on the same buffer and are invoked back-to-back in multiple paths; consider decoding once and passing the decoded audio data into both routines to avoid repeated heavy work on large audio files.
- The new logging around waveform generation (including printing the first 10 values and type info in both the helper and caller) is quite verbose for `info` level; consider downgrading to `debug` or trimming to avoid log noise in production.
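The first point can be seen with concrete numbers. In this hypothetical standalone sketch, any clip shorter than 64 samples makes `Math.floor` return 0, and the suggested `Math.max(1, ...)` guard avoids it:

```typescript
// Demonstrates the division-by-zero hazard flagged above (illustrative only).
const waveformLength = 64;

function bucketSize(sampleCount: number): number {
  // For sampleCount < 64 this floors to 0, so averaging divides by zero.
  return Math.floor(sampleCount / waveformLength);
}

function guardedBucketSize(sampleCount: number): number {
  // Never 0: each sample becomes its own bucket for very short clips.
  return Math.max(1, bucketSize(sampleCount));
}

console.log(bucketSize(40)); // 0
console.log(guardedBucketSize(40)); // 1
console.log(guardedBucketSize(6400)); // 100
```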
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `getAudioWaveform`, `samplesPerWaveform` can become 0 for very short audio (`Math.floor(samples.length / waveformLength)`), which will cause division-by-zero and `NaN` averages; consider guarding with a minimum of 1 or short-circuiting to a default waveform for very small inputs.
- `getAudioDuration` and `getAudioWaveform` each call `audioDecode` on the same buffer and are invoked back-to-back in multiple paths; consider decoding once and passing the decoded audio data into both routines to avoid repeated heavy work on large audio files.
- The new logging around waveform generation (including printing the first 10 values and type info in both the helper and caller) is quite verbose for `info` level; consider downgrading to `debug` or trimming to avoid log noise in production.
## Individual Comments
### Comment 1
<location> `src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts:3042-3051` </location>
<code_context>
}
}
+ private async getAudioDuration(audioBuffer: Buffer): Promise<number> {
+ try {
+ this.logger.info('Getting audio duration...');
+ const audioData = await audioDecode(audioBuffer);
+ const duration = Math.ceil(audioData.duration);
+ this.logger.info(`Audio duration: ${duration} seconds`);
+ return duration;
+ } catch (error) {
+ this.logger.warn(`Failed to get audio duration: ${error.message}, using default 1 second`);
+ return 1;
+ }
+ }
+
+ private async getAudioWaveform(audioBuffer: Buffer): Promise<Uint8Array> {
+ try {
+ this.logger.info('Generating audio waveform...');
</code_context>
<issue_to_address>
**suggestion (performance):** Avoid decoding the same audio buffer twice for duration and waveform to reduce overhead.
`getAudioDuration` and `getAudioWaveform` each call `audioDecode(audioBuffer)` and are used sequentially in `audioWhatsapp`, effectively decoding the same buffer twice per message. Consider refactoring to decode once and share the result (e.g., via a helper that returns both duration and waveform, or by passing pre-decoded audio data into one of these methods).
Suggested implementation:
```typescript
.audioFrequency(48000)
.audioChannels(1)
.outputOptions([
}
}
private lastDecodedAudio?: {
buffer: Buffer;
duration: number;
waveform: Uint8Array;
};
private async getOrDecodeAudio(audioBuffer: Buffer): Promise<{
duration: number;
waveform: Uint8Array;
}> {
// Reuse cached analysis when the same buffer instance is passed
if (this.lastDecodedAudio && this.lastDecodedAudio.buffer === audioBuffer) {
return {
duration: this.lastDecodedAudio.duration,
waveform: this.lastDecodedAudio.waveform,
};
}
try {
this.logger.info('Decoding audio buffer for analysis...');
const audioData = await audioDecode(audioBuffer);
// Duration (in seconds, rounded up)
const duration = Math.ceil(audioData.duration);
// Waveform generation (64 buckets, normalized 0-255)
const samples = audioData.getChannelData(0); // Get first channel
const waveformLength = 64;
const samplesPerWaveform = Math.floor(samples.length / waveformLength) || 1;
const rawValues: number[] = [];
for (let i = 0; i < waveformLength; i++) {
const start = i * samplesPerWaveform;
const end = Math.min(start + samplesPerWaveform, samples.length);
let sum = 0;
for (let j = start; j < end; j++) {
sum += Math.abs(samples[j]);
}
const avg = end > start ? sum / (end - start) : 0;
rawValues.push(avg);
}
const max = Math.max(...rawValues, 1); // avoid division by zero
const waveform = new Uint8Array(
rawValues.map((value) => Math.min(255, Math.round((value / max) * 255))),
);
this.lastDecodedAudio = {
buffer: audioBuffer,
duration,
waveform,
};
return { duration, waveform };
} catch (error: any) {
this.logger.warn(
`Failed to analyze audio buffer: ${error?.message ?? error}, using default duration and empty waveform`,
);
const duration = 1;
const waveform = new Uint8Array(64); // all zeros
this.lastDecodedAudio = {
buffer: audioBuffer,
duration,
waveform,
};
return { duration, waveform };
}
}
private async getAudioDuration(audioBuffer: Buffer): Promise<number> {
const { duration } = await this.getOrDecodeAudio(audioBuffer);
this.logger.info(`Audio duration: ${duration} seconds`);
return duration;
}
private async getAudioWaveform(audioBuffer: Buffer): Promise<Uint8Array> {
this.logger.info('Generating audio waveform...');
const { waveform } = await this.getOrDecodeAudio(audioBuffer);
return waveform;
```
This change ensures that when the same `Buffer` instance is passed to `getAudioDuration` and `getAudioWaveform` sequentially (as in `audioWhatsapp`), the audio is decoded only once and the results are cached and reused.
If there are other call sites that construct new `Buffer` instances or copies for each call, and you want to guarantee a single decode per message, you can:
1. Decode the audio at the call site (e.g., in `audioWhatsapp`) using `getOrDecodeAudio` and pass around the resulting `{ duration, waveform }`, or
2. Ensure the same `Buffer` reference is reused for both `getAudioDuration` and `getAudioWaveform`.
</issue_to_address>
### Comment 2
<location> `src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts:3093-3096` </location>
<code_context>
+
+ // Log first 10 values for debugging
+ const firstValues = Array.from(waveform.slice(0, 10));
+ this.logger.info(`Generated waveform with ${waveform.length} values. First 10: [${firstValues.join(', ')}]`);
+ this.logger.info(`Waveform type: ${waveform.constructor.name}, isUint8Array: ${waveform instanceof Uint8Array}`);
+
</code_context>
<issue_to_address>
**suggestion (performance):** Consider reducing log verbosity or moving detailed waveform logs to a debug level.
Per-audio logging of waveform samples and type details will flood production logs and add overhead. Consider gating this behind a debug flag or lower log level and reserving info logs for high-level success/failure information.
```suggestion
// Log first 10 values for debugging (debug-level to avoid flooding production logs)
const firstValues = Array.from(waveform.slice(0, 10));
this.logger.debug(
`Generated waveform with ${waveform.length} values. First 10: [${firstValues.join(', ')}]`,
);
this.logger.debug(
`Waveform type: ${waveform.constructor.name}, isUint8Array: ${waveform instanceof Uint8Array}`,
);
```
</issue_to_address>
- Add audio-decode library for audio buffer analysis
- Implement getAudioDuration() to extract duration from audio
- Implement getAudioWaveform() to generate 64-value waveform array
- Normalize waveform values to 0-100 range for WhatsApp compatibility
- Change audio bitrate from 128k to 48k per WhatsApp PTT requirements
- Add Baileys patch to prevent waveform overwrite
- Increase Node.js heap size for build to prevent OOM

Fixes EvolutionAPI#1086
Force-pushed from fac3cff to cf8f0b3
Summary
This PR adds proper waveform visualization for PTT (Push-to-Talk) voice messages sent via the API. Currently, audio messages sent through Evolution API display without the visual waveform in WhatsApp, making them look less authentic compared to messages sent directly from the app.
Changes
- Added the `audio-decode` library to analyze the audio buffer and generate a 64-value waveform array representing the audio amplitude
- Added a Baileys patch (applied via `patch-package`) so custom waveforms are not overwritten

Technical Details

- `getAudioDuration()`: Extracts duration in seconds from the audio buffer
- `getAudioWaveform()`: Generates a normalized waveform (0-100 range) with 64 sample points
- The waveform is returned as a `Uint8Array` for Baileys compatibility

Related Issues
Testing
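A minimal shape check for the waveform payload could look like the following. This is a hypothetical snippet, not part of the PR: `isValidPttWaveform` is an invented name that simply encodes the constraints stated above (a `Uint8Array` of 64 points, values within 0-100).

```typescript
// Hypothetical validator for the waveform payload described in this PR:
// must be a Uint8Array with exactly 64 points, each in the 0-100 range.
function isValidPttWaveform(waveform: unknown): boolean {
  return (
    waveform instanceof Uint8Array &&
    waveform.length === 64 &&
    waveform.every((v) => v <= 100)
  );
}

console.log(isValidPttWaveform(new Uint8Array(64))); // true
console.log(isValidPttWaveform(new Uint8Array(32))); // false: wrong length
console.log(isValidPttWaveform(new Uint8Array(64).fill(255))); // false: out of range
```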
Summary by Sourcery
Add waveform-enabled PTT audio sending for WhatsApp and wire in Baileys patching into the build.