@ffigueroa commented Jan 1, 2026

Summary

This PR adds proper waveform visualization for PTT (Push-to-Talk) voice messages sent via the API. Currently, audio messages sent through Evolution API appear in WhatsApp without the visual waveform, which makes them look less authentic than voice notes recorded directly in the app.

Changes

  • Waveform generation: Uses the audio-decode library to analyze the audio buffer and generate a 64-value waveform array representing the audio amplitude
  • Duration extraction: Automatically extracts the audio duration from the buffer
  • Bitrate adjustment: Changed the audio bitrate from 128k to 48k to match WhatsApp PTT requirements
  • Baileys patch: Prevents Baileys from overwriting manually generated waveforms (via patch-package)
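For reference, the encoder settings involved in the bitrate change (libopus, 48 kHz, mono, bitrate lowered from 128k to 48k) correspond to the standard ffmpeg CLI flags below. This is only a sketch: the service itself builds its pipeline with chained fluent calls such as `.audioFrequency(48000)` and `.audioChannels(1)`, the `buildPttFfmpegArgs` helper is hypothetical, and the OGG container (`-f ogg`) is an assumption not stated in this PR.

```typescript
// Hypothetical helper: ffmpeg CLI arguments matching the PTT encoding settings
// described in this PR (libopus, 48 kHz, mono, 48k bitrate).
// The OGG container (-f ogg) is an assumption, not taken from the PR.
function buildPttFfmpegArgs(input: string, output: string, bitrate = '48k'): string[] {
  return [
    '-i', input,
    '-c:a', 'libopus', // Opus encoder
    '-b:a', bitrate,   // lowered from 128k to 48k in this PR
    '-ar', '48000',    // 48 kHz sample rate
    '-ac', '1',        // mono
    '-f', 'ogg',       // assumed container for voice notes
    output,
  ];
}

// Usage (Node.js): spawn('ffmpeg', buildPttFfmpegArgs('in.mp3', 'out.ogg'))
console.log(buildPttFfmpegArgs('in.mp3', 'out.ogg').join(' '));
```

Keeping the bitrate as a parameter with a 48k default makes it easy to compare against the previous 128k setting without touching the other flags.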

Technical Details

  • getAudioDuration(): Extracts duration in seconds from audio buffer
  • getAudioWaveform(): Generates normalized waveform (0-100 range) with 64 sample points
  • Waveform values are properly typed as Uint8Array for Baileys compatibility
  • Falls back to defaults (a 1-second duration and a flat waveform) if audio decoding fails
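The waveform step described above can be sketched as a pure function over one channel of decoded PCM samples (floats, normally in [-1, 1]): average the absolute amplitude in each of 64 buckets, then scale so the loudest bucket becomes 100. This is an illustration under assumptions rather than the PR's exact code: `waveformFromSamples` and `flatWaveform` are hypothetical names, and the all-zero fallback is one reading of "flat waveform".

```typescript
// Hypothetical sketch of the 64-point, 0-100 waveform described in this PR.
const WAVEFORM_LENGTH = 64;

function waveformFromSamples(samples: Float32Array, length = WAVEFORM_LENGTH): Uint8Array {
  // At least 1 sample per bucket: very short clips would otherwise give 0
  // and turn every average into a division by zero.
  const perBucket = Math.max(1, Math.floor(samples.length / length));

  const averages: number[] = [];
  for (let i = 0; i < length; i++) {
    const start = i * perBucket;
    const end = Math.min(start + perBucket, samples.length);
    let sum = 0;
    for (let j = start; j < end; j++) sum += Math.abs(samples[j]);
    averages.push(end > start ? sum / (end - start) : 0); // empty bucket -> 0
  }

  // Normalize against the loudest bucket; `|| 1` handles all-silent input.
  const peak = Math.max(...averages) || 1;
  return Uint8Array.from(averages, (v) => Math.round((v / peak) * 100));
}

// Assumed fallback when decoding fails: a flat (all-zero) waveform.
function flatWaveform(length = WAVEFORM_LENGTH): Uint8Array {
  return new Uint8Array(length);
}
```

Because the function works on raw samples, it can be exercised without audio files, for example on a synthetic `Float32Array`; returning a `Uint8Array` directly matches the typing the PR uses for Baileys compatibility.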

Related Issues

Testing

  • Tested with various audio formats (mp3, ogg, wav)
  • Verified waveform displays correctly in WhatsApp iOS and Android
  • Confirmed backwards compatibility with existing audio sending functionality

Summary by Sourcery

Add waveform-enabled PTT audio sending for WhatsApp and wire Baileys patching into the build.

New Features:

  • Generate and attach waveform metadata and duration for PTT audio messages sent via the WhatsApp integration.

Enhancements:

  • Adjust WhatsApp PTT audio encoding parameters to use a 48k bitrate compatible with native voice notes.

Build:

  • Include patch files in the Docker image, run patch-package during the image build and on postinstall, and increase the Node.js memory limit for the build process.

Chores:

  • Introduce patch-package as a development dependency for maintaining local Baileys patches.

sourcery-ai bot commented Jan 1, 2026

Reviewer's Guide

Implements WhatsApp PTT audio waveform support by decoding audio buffers to derive duration and a 64-point Uint8Array waveform, wiring these into Baileys message payloads, lowering the audio bitrate to match WhatsApp requirements, and adding patch-package infrastructure (including Docker build integration) to prevent Baileys from overwriting custom waveforms.

Sequence diagram for sending PTT audio with generated waveform

sequenceDiagram
  actor Client
  participant API as EvolutionAPI
  participant Service as BaileysStartupService
  participant FFmpeg as FFmpegProcess
  participant Decoder as AudioDecoder
  participant Baileys as BaileysClient
  participant WhatsApp as WhatsAppServer

  Client->>API: HTTP request SendAudioDto
  API->>Service: audioWhatsapp(data, file, isIntegration)
  alt File upload path
    Service->>FFmpeg: processAudio(mediaData.audio)
    FFmpeg-->>Service: Buffer convertedAudio
    Service->>Decoder: getAudioDuration(convertedAudio)
    Decoder-->>Service: seconds
    Service->>Decoder: getAudioWaveform(convertedAudio)
    Decoder-->>Service: waveform Uint8Array
    Service->>Baileys: sendMessageWithTyping(number, messageContent, options, isIntegration)
  else URL or base64 path
    Service->>Service: audioBuffer from URL or base64
    alt audioBuffer is Buffer
      Service->>Decoder: getAudioDuration(audioBuffer)
      Decoder-->>Service: seconds
      Service->>Decoder: getAudioWaveform(audioBuffer)
      Decoder-->>Service: waveform Uint8Array
    end
    Service->>Baileys: sendMessageWithTyping(number, message, options, isIntegration)
  end
  Baileys-->>WhatsApp: PTT message with seconds and waveform
  WhatsApp-->>Client: PTT voice note with waveform visualization
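Both branches of the sequence end in the same kind of payload; the only difference is whether `seconds` and `waveform` could be computed (they are skipped for URL-based audio). A minimal sketch of attaching them conditionally follows. `PttContent`, `AudioAnalysis` and `buildPttContent` are hypothetical; the field names follow this PR's description rather than an exact Baileys type definition.

```typescript
// Hypothetical payload shape; field names follow this PR's description,
// not an exact Baileys type. A Node.js Buffer is a Uint8Array subclass,
// so Buffers also fit the `audio` field.
interface PttContent {
  audio: Uint8Array | { url: string };
  ptt: true;
  seconds?: number;
  waveform?: Uint8Array;
}

interface AudioAnalysis {
  seconds: number;
  waveform: Uint8Array;
}

// Attach seconds/waveform only when analysis is available.
function buildPttContent(audio: Uint8Array | { url: string }, analysis?: AudioAnalysis): PttContent {
  const content: PttContent = { audio, ptt: true };
  if (analysis) {
    content.seconds = analysis.seconds;
    content.waveform = analysis.waveform;
  }
  return content;
}

// Usage: URL input carries no analysis, buffer input does.
const fromUrl = buildPttContent({ url: 'https://example.com/note.ogg' });
const fromBuffer = buildPttContent(new Uint8Array([1, 2, 3]), {
  seconds: 4,
  waveform: new Uint8Array(64),
});
console.log(Object.keys(fromUrl), Object.keys(fromBuffer));
```

Leaving the optional fields absent (rather than setting them to `undefined` or defaults) keeps the URL path's payload identical to what was sent before this change.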

Class diagram for BaileysStartupService audio waveform enhancements

classDiagram
  class BaileysStartupService {
    - logger
    - processAudio(input)
    + audioWhatsapp(data, file, isIntegration) Promise~any~
    - getAudioDuration(audioBuffer) Promise~number~
    - getAudioWaveform(audioBuffer) Promise~Uint8Array~
    + sendMessageWithTyping[numberType, messageContentType](number, messageContent, options, isIntegration) Promise~any~
  }

  class AudioDecoderLibrary {
    + audioDecode(audioBuffer) Promise~DecodedAudio~
  }

  class DecodedAudio {
    + duration number
    + getChannelData(channelIndex) Float32Array
  }

  BaileysStartupService --> AudioDecoderLibrary : uses
  AudioDecoderLibrary --> DecodedAudio : returns

File-Level Changes

Generate audio duration and a 64-sample Uint8Array waveform from audio buffers for use in WhatsApp PTT messages.
  • Added private helper getAudioDuration(), which decodes the audio buffer with audioDecode, rounds the duration up to whole seconds with Math.ceil(), and falls back to 1 second (with logging) on failure.
  • Added private helper getAudioWaveform(), which decodes the buffer, samples the first channel into 64 average-amplitude buckets, normalizes them into a 0–100 range as a Uint8Array, and falls back to a flat waveform on failure, with extensive debugging output (currently logged at info level).
  • Integrated duration and waveform generation into audioWhatsapp() for both processed audio buffers and base64 inputs, skipping waveform generation for URL-based audio and adding the seconds/waveform fields to the message payload only when they are available.
  Files: src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts

Adjust audio encoding parameters for WhatsApp PTT compatibility and message authenticity.
  • Lowered the ffmpeg audio bitrate from 128k to 48k in the processAudio pipeline, keeping the libopus, 48 kHz, and mono settings unchanged.
  Files: src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts

Add patch-package infrastructure and ensure patches are applied in both local installs and Docker builds.
  • Copied the patches directory into the Docker image, ran patch-package during the Docker build, and raised the Node.js heap limit for the build process via NODE_OPTIONS.
  • Added a postinstall script that runs patch-package automatically, and added patch-package as a devDependency.
  • Introduced a Baileys patch file that prevents Baileys from overwriting manually generated waveform data.
  Files: Dockerfile, package.json, package-lock.json, patches/baileys+7.0.0-rc.6.patch

Assessment against linked issues

  • #1086: Restore the WhatsApp-style audio waveform visualization for PTT/voice notes sent via the Evolution API, so they look like native voice notes again.
  • #1086: Ensure API-sent voice notes are technically compatible with WhatsApp PTT requirements, so that WhatsApp clients accept and render the waveform.

sourcery-ai bot left a comment

Hey - I've found 2 issues and left some high-level feedback:

  • In getAudioWaveform, samplesPerWaveform can become 0 for very short audio (Math.floor(samples.length / waveformLength)), which will cause division-by-zero and NaN averages; consider guarding with a minimum of 1 or short-circuiting to a default waveform for very small inputs.
  • getAudioDuration and getAudioWaveform each call audioDecode on the same buffer and are invoked back-to-back in multiple paths; consider decoding once and passing the decoded audio data into both routines to avoid repeated heavy work on large audio files.
  • The new logging around waveform generation (including printing the first 10 values and type info in both the helper and caller) is quite verbose for info level; consider downgrading to debug or trimming to avoid log noise in production.
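The decode-once suggestion could be structured as below: decode at the call site, then derive both values from that single result. This is a sketch under assumptions: `DecodedAudioLike`, `analyzeDecoded` and `analyzeBuffer` are hypothetical names, the decoder is injected instead of importing `audio-decode` directly, and the fallback values mirror the PR's stated defaults (1 second, flat 64-value waveform).

```typescript
// Stand-in for the decoded-audio object in the class diagram
// (duration + getChannelData); assumed shape, not audio-decode's exact typings.
interface DecodedAudioLike {
  duration: number;
  getChannelData(channel: number): Float32Array;
}

interface PttAnalysis {
  seconds: number;
  waveform: Uint8Array;
}

// Derive duration AND waveform from one decoded result.
function analyzeDecoded(
  decoded: DecodedAudioLike,
  toWaveform: (samples: Float32Array) => Uint8Array,
): PttAnalysis {
  return {
    seconds: Math.ceil(decoded.duration), // rounded up, as in getAudioDuration()
    waveform: toWaveform(decoded.getChannelData(0)), // first channel only
  };
}

// Call-site wrapper: the (expensive) decode runs exactly once per buffer;
// on failure, fall back to 1 second and a flat waveform.
async function analyzeBuffer(
  buffer: Uint8Array,
  decode: (buf: Uint8Array) => Promise<DecodedAudioLike>,
  toWaveform: (samples: Float32Array) => Uint8Array,
): Promise<PttAnalysis> {
  try {
    return analyzeDecoded(await decode(buffer), toWaveform);
  } catch {
    return { seconds: 1, waveform: new Uint8Array(64) };
  }
}
```

In the service this would amount to something like `analyzeBuffer(audioBuffer, audioDecode, toWaveform)`, assuming `audioDecode` resolves to an object exposing `duration` and `getChannelData`. Compared with caching the last decoded buffer on the instance, passing the result explicitly avoids shared mutable state when several messages are sent concurrently.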
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts:3042-3051` </location>
<code_context>
     }
   }

+  private async getAudioDuration(audioBuffer: Buffer): Promise<number> {
+    try {
+      this.logger.info('Getting audio duration...');
+      const audioData = await audioDecode(audioBuffer);
+      const duration = Math.ceil(audioData.duration);
+      this.logger.info(`Audio duration: ${duration} seconds`);
+      return duration;
+    } catch (error) {
+      this.logger.warn(`Failed to get audio duration: ${error.message}, using default 1 second`);
+      return 1;
+    }
+  }
+
+  private async getAudioWaveform(audioBuffer: Buffer): Promise<Uint8Array> {
+    try {
+      this.logger.info('Generating audio waveform...');
</code_context>

<issue_to_address>
**suggestion (performance):** Avoid decoding the same audio buffer twice for duration and waveform to reduce overhead.

`getAudioDuration` and `getAudioWaveform` each call `audioDecode(audioBuffer)` and are used sequentially in `audioWhatsapp`, effectively decoding the same buffer twice per message. Consider refactoring to decode once and share the result (e.g., via a helper that returns both duration and waveform, or by passing pre-decoded audio data into one of these methods).

Suggested implementation:

```typescript
  // ...end of the existing processAudio() ffmpeg pipeline (unchanged)...

  private lastDecodedAudio?: {
    buffer: Buffer;
    duration: number;
    waveform: Uint8Array;
  };

  private async getOrDecodeAudio(audioBuffer: Buffer): Promise<{
    duration: number;
    waveform: Uint8Array;
  }> {
    // Reuse cached analysis when the same buffer instance is passed
    if (this.lastDecodedAudio && this.lastDecodedAudio.buffer === audioBuffer) {
      return {
        duration: this.lastDecodedAudio.duration,
        waveform: this.lastDecodedAudio.waveform,
      };
    }

    try {
      this.logger.info('Decoding audio buffer for analysis...');
      const audioData = await audioDecode(audioBuffer);

      // Duration (in seconds, rounded up)
      const duration = Math.ceil(audioData.duration);

      // Waveform generation (64 buckets, normalized to 0-100 as elsewhere in this PR)
      const samples = audioData.getChannelData(0); // first channel only
      const waveformLength = 64;
      // At least 1 sample per bucket, so very short clips don't divide by zero
      const samplesPerWaveform = Math.max(1, Math.floor(samples.length / waveformLength));

      const rawValues: number[] = [];
      for (let i = 0; i < waveformLength; i++) {
        const start = i * samplesPerWaveform;
        const end = Math.min(start + samplesPerWaveform, samples.length);

        let sum = 0;
        for (let j = start; j < end; j++) {
          sum += Math.abs(samples[j]);
        }

        const avg = end > start ? sum / (end - start) : 0;
        rawValues.push(avg);
      }

      // Scale so the loudest bucket maps to 100; `|| 1` only guards against
      // all-silent input. (A floor of 1 inside Math.max would stop quiet audio
      // from being scaled up, since float PCM averages normally stay <= 1.)
      const max = Math.max(...rawValues) || 1;
      const waveform = new Uint8Array(rawValues.map((value) => Math.round((value / max) * 100)));

      this.lastDecodedAudio = { buffer: audioBuffer, duration, waveform };

      return { duration, waveform };
    } catch (error: any) {
      this.logger.warn(
        `Failed to analyze audio buffer: ${error?.message ?? error}, using default duration and empty waveform`,
      );

      const duration = 1;
      const waveform = new Uint8Array(64); // all zeros

      this.lastDecodedAudio = { buffer: audioBuffer, duration, waveform };

      return { duration, waveform };
    }
  }

  private async getAudioDuration(audioBuffer: Buffer): Promise<number> {
    const { duration } = await this.getOrDecodeAudio(audioBuffer);
    this.logger.info(`Audio duration: ${duration} seconds`);
    return duration;
  }

  private async getAudioWaveform(audioBuffer: Buffer): Promise<Uint8Array> {
    this.logger.info('Generating audio waveform...');
    const { waveform } = await this.getOrDecodeAudio(audioBuffer);
    return waveform;
  }
```

This change ensures that when the same `Buffer` instance is passed to `getAudioDuration` and `getAudioWaveform` sequentially (as in `audioWhatsapp`), the audio is decoded only once and the results are cached and reused.

If there are other call sites that construct new `Buffer` instances or copies for each call, and you want to guarantee a single decode per message, you can:
1. Decode the audio at the call site (e.g., in `audioWhatsapp`) using `getOrDecodeAudio` and pass around the resulting `{ duration, waveform }`, or
2. Ensure the same `Buffer` reference is reused for both `getAudioDuration` and `getAudioWaveform`.
</issue_to_address>

### Comment 2
<location> `src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts:3093-3096` </location>
<code_context>
+
+      // Log first 10 values for debugging
+      const firstValues = Array.from(waveform.slice(0, 10));
+      this.logger.info(`Generated waveform with ${waveform.length} values. First 10: [${firstValues.join(', ')}]`);
+      this.logger.info(`Waveform type: ${waveform.constructor.name}, isUint8Array: ${waveform instanceof Uint8Array}`);
+
</code_context>

<issue_to_address>
**suggestion (performance):** Consider reducing log verbosity or moving detailed waveform logs to a debug level.

Per-audio logging of waveform samples and type details will flood production logs and add overhead. Consider gating this behind a debug flag or lower log level and reserving info logs for high-level success/failure information.

```suggestion
      // Log first 10 values for debugging (debug-level to avoid flooding production logs)
      const firstValues = Array.from(waveform.slice(0, 10));
      this.logger.debug(
        `Generated waveform with ${waveform.length} values. First 10: [${firstValues.join(', ')}]`,
      );
      this.logger.debug(
        `Waveform type: ${waveform.constructor.name}, isUint8Array: ${waveform instanceof Uint8Array}`,
      );
```
</issue_to_address>


- Add audio-decode library for audio buffer analysis
- Implement getAudioDuration() to extract duration from audio
- Implement getAudioWaveform() to generate 64-value waveform array
- Normalize waveform values to 0-100 range for WhatsApp compatibility
- Change audio bitrate from 128k to 48k per WhatsApp PTT requirements
- Add Baileys patch to prevent waveform overwrite
- Increase Node.js heap size for build to prevent OOM

Fixes EvolutionAPI#1086
@ffigueroa force-pushed the feat/audio-waveform-visualization branch from fac3cff to cf8f0b3 on January 1, 2026 19:38

Development

Successfully merging this pull request may close these issues.

Audio waveform missing when sending voice notes via Evolution API, impacting user experience and perception.
