Skip to content

Audio Package

The audio package handles audio capture and post-processing, including professional-grade normalization via Jivetalking.

Package Location

internal/audio/

Responsibility

  • Capture audio from microphone
  • Process and normalize audio levels
  • Adaptive noise removal and compression
  • Detect audio devices
  • Platform-specific audio handling

Key Files

File Purpose
audio.go Core audio logic and processing
audio_linux.go Linux-specific (PipeWire)
audio_darwin.go macOS-specific (CoreAudio)
audio_windows.go Windows-specific (WASAPI)

Key Types

Processor

Manages audio post-processing:

type Processor struct {
    options models.AudioProcessingOptions
}

AudioProcessingOptions

Configuration for audio processing:

type AudioProcessingOptions struct {
    NormalizeEnabled    bool    // Enable full processing pipeline
    TargetLoudness      float64 // Target LUFS (-18 for podcast standard)
    TruePeak            float64 // Max true peak in dBTP (-1.5 standard)
    LoudnessRange       float64 // Target LRA in LU (11 preserves dynamics)
    NoiseRemovalEnabled bool    // Enable adaptive noise removal
    CompressionEnabled  bool    // Enable LA-2A style compression
    DeesserEnabled      bool    // Enable adaptive de-essing
}

AudioRecorder

Manages audio capture:

type Recorder struct {
    cmd        *exec.Cmd
    outputFile string
    device     string
}

Audio Processing

Jivetalking Integration

When Jivetalking is installed, Kartoza Screencaster uses its professional 4-pass audio processing pipeline:

┌─────────────────────────────────────────────────────────────┐
│                    Jivetalking Pipeline                      │
├─────────────────────────────────────────────────────────────┤
│ Pass 1: Analysis                                             │
│   • Loudness measurement (LUFS, LRA, true peak)             │
│   • Noise floor detection                                    │
│   • Speech characteristics analysis                          │
│   • Spectral analysis                                        │
├─────────────────────────────────────────────────────────────┤
│ Pass 2: Adaptive Processing                                  │
│   • DS201-style high-pass filter (removes rumble)           │
│   • DS201-style low-pass filter (removes ultrasonic noise)  │
│   • anlmdn + compand noise removal                          │
│   • DS201-style soft gate (inter-speech cleanup)            │
│   • LA-2A optical compression (evens dynamics)              │
│   • Adaptive de-esser (reduces sibilance)                   │
├─────────────────────────────────────────────────────────────┤
│ Pass 3: Loudnorm Measurement                                 │
│   • EBU R128 loudness measurement for calibration           │
├─────────────────────────────────────────────────────────────┤
│ Pass 4: Loudnorm Application                                 │
│   • Two-pass linear loudnorm                                 │
│   • Optional pre-limiting (Volumax-inspired)                │
│   • Click/pop repair (adeclick)                             │
│   • Final measurement validation                             │
└─────────────────────────────────────────────────────────────┘

FFmpeg Fallback

When Jivetalking is not installed, the system falls back to basic FFmpeg two-pass loudnorm:

# Pass 1: Analyze
ffmpeg -i input.wav -af loudnorm=I=-18:TP=-1.5:LRA=11:print_format=json -f null -

# Pass 2: Apply with measured values
ffmpeg -i input.wav -af loudnorm=I=-18:TP=-1.5:LRA=11:measured_I=...:linear=true output.wav

Core Functions

Process

Process audio with full pipeline:

func (p *Processor) Process(inputFile string, progressCallback ProgressCallback) (string, error)

Returns the path to the processed audio file.

Start (Recorder)

Begin audio capture:

func (r *Recorder) Start(outputFile string) error

Stop (Recorder)

End audio capture:

func (r *Recorder) Stop() error

Platform Implementation

Linux (PipeWire)

//go:build linux

func NewRecorder(device string) *Recorder {
    return &Recorder{device: device}
}

func (r *Recorder) Start(output string) error {
    r.cmd = exec.Command("pw-record",
        "--target", r.device,
        "--format", "s16",
        "--rate", "48000",
        "--channels", "1",
        output)
    return r.cmd.Start()
}

macOS (CoreAudio)

//go:build darwin

func (r *Recorder) Start(output string) error {
    r.cmd = exec.Command("ffmpeg",
        "-f", "avfoundation",
        "-i", ":"+r.device,
        "-c:a", "pcm_s16le",
        output)
    return r.cmd.Start()
}

Windows (WASAPI)

//go:build windows

func (r *Recorder) Start(output string) error {
    r.cmd = exec.Command("ffmpeg",
        "-f", "dshow",
        "-i", "audio="+r.device,
        "-c:a", "pcm_s16le",
        output)
    return r.cmd.Start()
}

Processing Options

Default Configuration

func DefaultAudioProcessingOptions() AudioProcessingOptions {
    return AudioProcessingOptions{
        NormalizeEnabled:    true,
        TargetLoudness:      -18.0, // Podcast standard (EBU R128)
        TruePeak:            -1.5,  // Prevents clipping
        LoudnessRange:       11.0,  // Preserves dynamic range
        NoiseRemovalEnabled: true,  // Enable noise removal
        CompressionEnabled:  true,  // Enable compression
        DeesserEnabled:      true,  // Enable de-esser
    }
}

Loudness Targets

Target Value Standard
Integrated Loudness -18 LUFS EBU R128 / Podcast
True Peak -1.5 dBTP Prevents inter-sample peaks
Loudness Range 11 LU Preserves natural dynamics

Error Handling

Fallback Strategy

func (p *Processor) Process(inputFile string, callback ProgressCallback) (string, error) {
    // Try Jivetalking first
    jivetalking, err := exec.LookPath("jivetalking")
    if err != nil {
        // Fall back to FFmpeg-based normalization
        return p.processFallback(inputFile, callback)
    }

    // Use Jivetalking...
}

Device Detection

func detectAudioDevices() ([]string, error) {
    // Linux: pw-cli list-objects
    // macOS: ffmpeg -f avfoundation -list_devices true
    // Windows: ffmpeg -f dshow -list_devices true
}

Usage Example

// Create processor with options
opts := models.DefaultAudioProcessingOptions()
processor := audio.NewProcessor(opts)

// Process with progress callback
outputPath, err := processor.Process("/tmp/audio.wav", func(pass int, name string, progress float64) {
    fmt.Printf("Pass %d (%s): %.0f%%\n", pass, name, progress*100)
})

if err != nil {
    log.Printf("Processing failed: %v", err)
}

fmt.Printf("Processed audio: %s\n", outputPath)

Configuration

Sample Rates

Quality Rate Use Case
Standard 44100 Hz General use
Professional 48000 Hz Video production

Channels

  • Mono - Single channel (recommended for speech)
  • Stereo - Two channels (music/ambience)
  • recorder - Uses audio for capture
  • merger - Merges processed audio with video
  • models - Contains AudioProcessingOptions

External Dependencies

  • Jivetalking - Professional audio processing (optional, recommended)
  • FFmpeg - Fallback processing and format conversion
  • PipeWire - Audio capture on Linux