feat(audio): Add waveform visualization for PTT voice messages #2346
base: main
Conversation
Reviewer's Guide

Implements WhatsApp PTT audio waveform support by decoding audio buffers to derive duration and a 64-point Uint8Array waveform, wiring these into Baileys message payloads, lowering the audio bitrate to match WhatsApp requirements, and adding patch-package infrastructure (including Docker build integration) to prevent Baileys from overwriting custom waveforms.

Sequence diagram for sending PTT audio with generated waveform

```mermaid
sequenceDiagram
  actor Client
  participant API as EvolutionAPI
  participant Service as BaileysStartupService
  participant FFmpeg as FFmpegProcess
  participant Decoder as AudioDecoder
  participant Baileys as BaileysClient
  participant WhatsApp as WhatsAppServer
  Client->>API: HTTP request SendAudioDto
  API->>Service: audioWhatsapp(data, file, isIntegration)
  alt File upload path
    Service->>FFmpeg: processAudio(mediaData.audio)
    FFmpeg-->>Service: Buffer convertedAudio
    Service->>Decoder: getAudioDuration(convertedAudio)
    Decoder-->>Service: seconds
    Service->>Decoder: getAudioWaveform(convertedAudio)
    Decoder-->>Service: waveform Uint8Array
    Service->>Baileys: sendMessageWithTyping(number, messageContent, options, isIntegration)
  else URL or base64 path
    Service->>Service: audioBuffer from URL or base64
    alt audioBuffer is Buffer
      Service->>Decoder: getAudioDuration(audioBuffer)
      Decoder-->>Service: seconds
      Service->>Decoder: getAudioWaveform(audioBuffer)
      Decoder-->>Service: waveform Uint8Array
    end
    Service->>Baileys: sendMessageWithTyping(number, message, options, isIntegration)
  end
  Baileys-->>WhatsApp: PTT message with seconds and waveform
  WhatsApp-->>Client: PTT voice note with waveform visualization
```
Class diagram for BaileysStartupService audio waveform enhancements

```mermaid
classDiagram
  class BaileysStartupService {
    - logger
    - processAudio(input)
    + audioWhatsapp(data, file, isIntegration) Promise~any~
    - getAudioDuration(audioBuffer) Promise~number~
    - getAudioWaveform(audioBuffer) Promise~Uint8Array~
    + sendMessageWithTyping(number, messageContent, options, isIntegration) Promise~any~
  }
  class AudioDecoderLibrary {
    + audioDecode(audioBuffer) Promise~DecodedAudio~
  }
  class DecodedAudio {
    + duration number
    + getChannelData(channelIndex) Float32Array
  }
  BaileysStartupService --> AudioDecoderLibrary : uses
  AudioDecoderLibrary --> DecodedAudio : returns
```
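The bucketing step the diagrams describe can be sketched as a pure function. This is an illustrative sketch, not the PR's exact code: `computeWaveform` is a hypothetical name, and it assumes the audio has already been decoded into a single `Float32Array` channel (as `audio-decode`'s `getChannelData(0)` returns). It averages absolute amplitudes into 64 buckets, normalized to the 0-100 range the PR targets.

```typescript
// Hypothetical helper; not the PR's actual implementation.
const WAVEFORM_LENGTH = 64;

function computeWaveform(samples: Float32Array): Uint8Array {
  const waveform = new Uint8Array(WAVEFORM_LENGTH);
  if (samples.length === 0) return waveform; // all zeros for empty input

  // Guard against a zero bucket size for clips shorter than 64 samples.
  const samplesPerBucket = Math.max(
    1,
    Math.floor(samples.length / WAVEFORM_LENGTH),
  );

  for (let i = 0; i < WAVEFORM_LENGTH; i++) {
    const start = i * samplesPerBucket;
    const end = Math.min(start + samplesPerBucket, samples.length);
    let sum = 0;
    for (let j = start; j < end; j++) sum += Math.abs(samples[j]);
    // Samples are in [-1, 1], so the average of |sample| maps to 0-100.
    waveform[i] = end > start ? Math.round((sum / (end - start)) * 100) : 0;
  }
  return waveform;
}
```

Buckets that fall past the end of a short clip stay at 0, so the output is always exactly 64 values regardless of input length.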
Hey - I've found 2 issues, and left some high level feedback:
- In `getAudioWaveform`, `samplesPerWaveform` can become 0 for very short audio (`Math.floor(samples.length / waveformLength)`), which will cause division-by-zero and `NaN` averages; consider guarding with a minimum of 1 or short-circuiting to a default waveform for very small inputs.
- `getAudioDuration` and `getAudioWaveform` each call `audioDecode` on the same buffer and are invoked back-to-back in multiple paths; consider decoding once and passing the decoded audio data into both routines to avoid repeated heavy work on large audio files.
- The new logging around waveform generation (including printing the first 10 values and type info in both the helper and caller) is quite verbose for `info` level; consider downgrading to `debug` or trimming to avoid log noise in production.
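The first point can be seen with concrete numbers. In this hypothetical standalone sketch, any clip shorter than 64 samples makes `Math.floor` return 0, and the suggested `Math.max(1, ...)` guard avoids it:

```typescript
// Demonstrates the division-by-zero hazard flagged above (illustrative only).
const waveformLength = 64;

function bucketSize(sampleCount: number): number {
  // For sampleCount < 64 this floors to 0, so averaging divides by zero.
  return Math.floor(sampleCount / waveformLength);
}

function guardedBucketSize(sampleCount: number): number {
  // Never 0: each sample becomes its own bucket for very short clips.
  return Math.max(1, bucketSize(sampleCount));
}

console.log(bucketSize(40)); // 0
console.log(guardedBucketSize(40)); // 1
console.log(guardedBucketSize(6400)); // 100
```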
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `getAudioWaveform`, `samplesPerWaveform` can become 0 for very short audio (`Math.floor(samples.length / waveformLength)`), which will cause division-by-zero and `NaN` averages; consider guarding with a minimum of 1 or short-circuiting to a default waveform for very small inputs.
- `getAudioDuration` and `getAudioWaveform` each call `audioDecode` on the same buffer and are invoked back-to-back in multiple paths; consider decoding once and passing the decoded audio data into both routines to avoid repeated heavy work on large audio files.
- The new logging around waveform generation (including printing the first 10 values and type info in both the helper and caller) is quite verbose for `info` level; consider downgrading to `debug` or trimming to avoid log noise in production.
## Individual Comments
### Comment 1
<location> `src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts:3042-3051` </location>
<code_context>
}
}
+ private async getAudioDuration(audioBuffer: Buffer): Promise<number> {
+ try {
+ this.logger.info('Getting audio duration...');
+ const audioData = await audioDecode(audioBuffer);
+ const duration = Math.ceil(audioData.duration);
+ this.logger.info(`Audio duration: ${duration} seconds`);
+ return duration;
+ } catch (error) {
+ this.logger.warn(`Failed to get audio duration: ${error.message}, using default 1 second`);
+ return 1;
+ }
+ }
+
+ private async getAudioWaveform(audioBuffer: Buffer): Promise<Uint8Array> {
+ try {
+ this.logger.info('Generating audio waveform...');
</code_context>
<issue_to_address>
**suggestion (performance):** Avoid decoding the same audio buffer twice for duration and waveform to reduce overhead.
`getAudioDuration` and `getAudioWaveform` each call `audioDecode(audioBuffer)` and are used sequentially in `audioWhatsapp`, effectively decoding the same buffer twice per message. Consider refactoring to decode once and share the result (e.g., via a helper that returns both duration and waveform, or by passing pre-decoded audio data into one of these methods).
Suggested implementation:
```typescript
.audioFrequency(48000)
.audioChannels(1)
.outputOptions([
}
}
private lastDecodedAudio?: {
buffer: Buffer;
duration: number;
waveform: Uint8Array;
};
private async getOrDecodeAudio(audioBuffer: Buffer): Promise<{
duration: number;
waveform: Uint8Array;
}> {
// Reuse cached analysis when the same buffer instance is passed
if (this.lastDecodedAudio && this.lastDecodedAudio.buffer === audioBuffer) {
return {
duration: this.lastDecodedAudio.duration,
waveform: this.lastDecodedAudio.waveform,
};
}
try {
this.logger.info('Decoding audio buffer for analysis...');
const audioData = await audioDecode(audioBuffer);
// Duration (in seconds, rounded up)
const duration = Math.ceil(audioData.duration);
// Waveform generation (64 buckets, normalized 0-255)
const samples = audioData.getChannelData(0); // Get first channel
const waveformLength = 64;
const samplesPerWaveform = Math.floor(samples.length / waveformLength) || 1;
const rawValues: number[] = [];
for (let i = 0; i < waveformLength; i++) {
const start = i * samplesPerWaveform;
const end = Math.min(start + samplesPerWaveform, samples.length);
let sum = 0;
for (let j = start; j < end; j++) {
sum += Math.abs(samples[j]);
}
const avg = end > start ? sum / (end - start) : 0;
rawValues.push(avg);
}
const max = Math.max(...rawValues, 1); // avoid division by zero
const waveform = new Uint8Array(
rawValues.map((value) => Math.min(255, Math.round((value / max) * 255))),
);
this.lastDecodedAudio = {
buffer: audioBuffer,
duration,
waveform,
};
return { duration, waveform };
} catch (error: any) {
this.logger.warn(
`Failed to analyze audio buffer: ${error?.message ?? error}, using default duration and empty waveform`,
);
const duration = 1;
const waveform = new Uint8Array(64); // all zeros
this.lastDecodedAudio = {
buffer: audioBuffer,
duration,
waveform,
};
return { duration, waveform };
}
}
private async getAudioDuration(audioBuffer: Buffer): Promise<number> {
const { duration } = await this.getOrDecodeAudio(audioBuffer);
this.logger.info(`Audio duration: ${duration} seconds`);
return duration;
}
private async getAudioWaveform(audioBuffer: Buffer): Promise<Uint8Array> {
this.logger.info('Generating audio waveform...');
const { waveform } = await this.getOrDecodeAudio(audioBuffer);
return waveform;
```
This change ensures that when the same `Buffer` instance is passed to `getAudioDuration` and `getAudioWaveform` sequentially (as in `audioWhatsapp`), the audio is decoded only once and the results are cached and reused.
If there are other call sites that construct new `Buffer` instances or copies for each call, and you want to guarantee a single decode per message, you can:
1. Decode the audio at the call site (e.g., in `audioWhatsapp`) using `getOrDecodeAudio` and pass around the resulting `{ duration, waveform }`, or
2. Ensure the same `Buffer` reference is reused for both `getAudioDuration` and `getAudioWaveform`.
</issue_to_address>
### Comment 2
<location> `src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts:3093-3096` </location>
<code_context>
+
+ // Log first 10 values for debugging
+ const firstValues = Array.from(waveform.slice(0, 10));
+ this.logger.info(`Generated waveform with ${waveform.length} values. First 10: [${firstValues.join(', ')}]`);
+ this.logger.info(`Waveform type: ${waveform.constructor.name}, isUint8Array: ${waveform instanceof Uint8Array}`);
+
</code_context>
<issue_to_address>
**suggestion (performance):** Consider reducing log verbosity or moving detailed waveform logs to a debug level.
Per-audio logging of waveform samples and type details will flood production logs and add overhead. Consider gating this behind a debug flag or lower log level and reserving info logs for high-level success/failure information.
```suggestion
// Log first 10 values for debugging (debug-level to avoid flooding production logs)
const firstValues = Array.from(waveform.slice(0, 10));
this.logger.debug(
`Generated waveform with ${waveform.length} values. First 10: [${firstValues.join(', ')}]`,
);
this.logger.debug(
`Waveform type: ${waveform.constructor.name}, isUint8Array: ${waveform instanceof Uint8Array}`,
);
```
</issue_to_address>
- Add audio-decode library for audio buffer analysis
- Implement getAudioDuration() to extract duration from audio
- Implement getAudioWaveform() to generate 64-value waveform array
- Normalize waveform values to 0-100 range for WhatsApp compatibility
- Change audio bitrate from 128k to 48k per WhatsApp PTT requirements
- Add Baileys patch to prevent waveform overwrite
- Increase Node.js heap size for build to prevent OOM

Fixes EvolutionAPI#1086
Force-pushed from fac3cff to cf8f0b3
Summary
This PR adds proper waveform visualization for PTT (Push-to-Talk) voice messages sent via the API. Currently, audio messages sent through Evolution API display without the visual waveform in WhatsApp, making them look less authentic compared to messages sent directly from the app.
Changes
- Added the `audio-decode` library to analyze the audio buffer and generate a 64-value waveform array representing the audio amplitude
- Added a Baileys patch (applied via `patch-package`) so custom waveforms are not overwritten

Technical Details

- `getAudioDuration()`: Extracts duration in seconds from the audio buffer
- `getAudioWaveform()`: Generates a normalized waveform (0-100 range) with 64 sample points
- The waveform is returned as a `Uint8Array` for Baileys compatibility

Related Issues
Testing
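A minimal shape check for the waveform payload could look like the following. This is a hypothetical snippet, not part of the PR: `isValidPttWaveform` is an invented name that simply encodes the constraints stated above (a `Uint8Array` of 64 points, values within 0-100).

```typescript
// Hypothetical validator for the waveform payload described in this PR:
// must be a Uint8Array with exactly 64 points, each in the 0-100 range.
function isValidPttWaveform(waveform: unknown): boolean {
  return (
    waveform instanceof Uint8Array &&
    waveform.length === 64 &&
    waveform.every((v) => v <= 100)
  );
}

console.log(isValidPttWaveform(new Uint8Array(64))); // true
console.log(isValidPttWaveform(new Uint8Array(32))); // false: wrong length
console.log(isValidPttWaveform(new Uint8Array(64).fill(255))); // false: out of range
```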
Summary by Sourcery
Add waveform-enabled PTT audio sending for WhatsApp and wire in Baileys patching into the build.