Why Bats? A Scuba Diver's Path to Bioacoustics

Near Da Nang, Vietnam — where this project began

Bats and whales have almost nothing in common. One flies, one swims. One weighs grams, the other tons. Yet bats and toothed whales have both evolved the same extraordinary ability: echolocation, navigating and hunting by sound in complete darkness.

As a technical diver, I've spent time at depths where light doesn't reach. Past 50 meters, it gets dark. Past 300 meters, it's pitch black. Sperm whales dive over a thousand meters hunting giant squid — they need sonar to survive.

So why am I building bat detectors instead of whale call analyzers?

Accessibility

Capturing whale songs requires specialized vessels, hydrophones, and access to the deep ocean. Bats? All I need is an ultrasonic microphone and a cave. Caves in Southeast Asia are everywhere, and ultrasonic mics cost a few hundred dollars.

This project started in Da Nang, Vietnam, along the coast of the South China Sea. The species here are different from British or American bats, but the fundamentals of bat communication are universal.

The Goal

I'm building an open-source platform with these features:

  • Real-time monitoring of bat vocalizations
  • Real-time classification of call types
  • Post-recording analysis for research
  • Species identification using ML

Tech stack: Python with Librosa, NumPy, Pandas, and PyTorch, running on a Raspberry Pi 4 with 4GB RAM.


    Sound Signal Primer

    Wavelength, Amplitude, Frequency

    The wavelength λ is the length of one cycle: the distance between two consecutive peaks.

    The amplitude (sound pressure) A relates to perceived loudness. A signal's power is proportional to the square of its amplitude:

    $P \propto A^2$
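    The proportionality can be checked numerically. For a pure sine of amplitude A, the average power (mean of the squared samples) is A²/2. Doubling the amplitude therefore quadruples the power. The sketch below is a minimal check. The 1 kHz test tone and 200 kHz rate are illustrative assumptions, and `mean_power` is a hypothetical helper.

```python
import numpy as np

def mean_power(x):
    """Average power of a sampled signal: the mean of the squared samples."""
    return float(np.mean(np.square(x)))

sr = 200_000                      # assumed sampling rate (Hz)
t = np.arange(sr) / sr            # 1 second of time points
for amplitude in (0.5, 1.0, 2.0):
    x = amplitude * np.sin(2 * np.pi * 1_000 * t)
    print(amplitude, mean_power(x))   # ~A**2 / 2: about 0.125, 0.5, 2.0
```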

    The frequency F is the number of oscillations per second, measured in hertz (Hz). Human hearing ranges from 20Hz to 20kHz (most people max out around 16-17kHz).
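    Wavelength and frequency are linked through the speed of sound c, so each one determines the other. The worked example below assumes c ≈ 343 m/s, which is air at roughly 20°C:

    $\lambda = \frac{c}{f}$

    $\lambda_{20Hz} = \frac{343}{20} = 17.15\,m \qquad \lambda_{20kHz} = \frac{343}{20000} = 17.15\,mm \qquad \lambda_{80kHz} = \frac{343}{80000} \approx 4.3\,mm$

    An 80kHz bat call therefore has a wavelength of only a few millimeters. The lowest sounds humans can hear have wavelengths measured in meters.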

    Sampling Rate and Nyquist

    The sampling rate $S_r$ is the number of samples taken per second. A higher $S_r$ preserves more information. It can represent higher frequencies, at the cost of larger files.

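    The Nyquist theorem states that to sample a signal without aliasing, $S_r$ must be at least twice the highest frequency present. Aliasing means higher frequencies folding down into false lower ones.

    $S_r \geq 2 \times \max(F)$

    Equivalently, the highest frequency a given sampling rate can capture is the Nyquist frequency $F_n$:

    $F_n = \frac{S_r}{2}$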

    In practice, allow more headroom than the bare minimum:

    $S_r = 2.5 \times \max(F)$

    For bat calls reaching 80kHz:

    $S_r = 2.5 \times 80kHz = 200kHz$
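    The sampling rule and the 2.5x headroom factor can be checked numerically. This minimal NumPy sketch computes the required rate for an 80kHz call. It then shows what happens when the same call is sampled too slowly. The 96kHz rate is an assumption, a typical audio-interface setting, and the helper names are hypothetical. At that rate, the 80kHz tone folds down and shows up in the FFT at 16kHz.

```python
import numpy as np

def min_sample_rate(max_freq_hz, headroom=2.5):
    """Sampling rate with a headroom factor (2.0 would be the bare Nyquist minimum)."""
    return headroom * max_freq_hz

def apparent_frequency(freq_hz, sample_rate_hz):
    """Frequency a pure tone appears at after sampling (it folds around multiples of S_r)."""
    return abs(freq_hz - sample_rate_hz * round(freq_hz / sample_rate_hz))

def measured_peak(freq_hz, sample_rate_hz, duration_s=0.01):
    """Sample a sine at the given rate and return the frequency of the strongest FFT bin."""
    t = np.arange(int(sample_rate_hz * duration_s)) / sample_rate_hz
    x = np.sin(2 * np.pi * freq_hz * t)
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1 / sample_rate_hz)
    return float(freqs[np.argmax(spectrum)])

print(min_sample_rate(80_000))              # 200000.0
print(apparent_frequency(80_000, 96_000))   # 16000: the call aliases down to 16 kHz
print(measured_peak(80_000, 96_000))        # ~16000: the FFT sees the alias, not 80 kHz
print(measured_peak(80_000, 200_000))       # ~80000: sampled correctly
```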

    Bit Depth and File Size

    Bit depth Q is the number of bits stored per sample. It sets how many distinct values ($2^Q$) each sample can take. Common values: 16-bit or 32-bit.

    File size calculation:

    $size(bytes) = \frac{S_r \times Q \times C \times T}{8}$

    Where C is channels and T is duration in seconds.
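    The file-size formula matters for storage planning on the Raspberry Pi. This sketch applies it to 200kHz, 16-bit mono audio, using the sampling rate from the Nyquist example. The 8-hour recording night is a hypothetical figure, and the count covers raw sample data only, excluding the small file header.

```python
def raw_audio_size_bytes(sample_rate, bit_depth, channels, seconds):
    """size = S_r * Q * C * T / 8 (raw sample data only, no file header)."""
    return sample_rate * bit_depth * channels * seconds // 8

minute = raw_audio_size_bytes(200_000, 16, 1, 60)
night = raw_audio_size_bytes(200_000, 16, 1, 8 * 3600)   # hypothetical 8-hour session
print(minute / 1e6, "MB per minute")       # 24.0 MB per minute
print(night / 1e9, "GB per 8-hour night")  # 11.52 GB per 8-hour night
```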

    The Sine Wave

    Any real audio signal can be described as a sum of many sine waves. A single sine wave is:

    $x(t) = A \sin(2\pi f t + \phi)$

    Where A is amplitude, f is frequency, t is time, and φ is phase.
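    The formula translates directly into code, and superposition is just addition of the sampled waves. This minimal sketch builds a hypothetical call from a 40kHz fundamental plus a weaker 80kHz harmonic. The sampling rate, frequencies, and amplitudes are illustrative assumptions, not measurements.

```python
import numpy as np

def sine(amplitude, freq, t, phase=0.0):
    """A * sin(2*pi*f*t + phi), evaluated at every time point in t."""
    return amplitude * np.sin(2 * np.pi * freq * t + phase)

sr = 250_000                          # assumed sampling rate (Hz)
t = np.arange(int(0.002 * sr)) / sr   # 2 ms of time points

# Superposition: a 40 kHz fundamental plus a weaker, phase-shifted 80 kHz harmonic
signal = sine(1.0, 40_000, t) + sine(0.3, 80_000, t, phase=np.pi / 2)
```

    The phase term only shifts the wave in time. With φ = π/2, for example, the sine becomes a cosine.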

    Quantization

    Quantization maps each continuous sample value to the nearest of a fixed set of discrete levels. With Q=8 bits, you get $2^8 = 256$ levels to approximate the wave; with Q=16, 65,536.
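    A uniform quantizer makes the effect of bit depth visible. This sketch uses a simple mid-rise quantizer, which is an illustrative choice and not necessarily what any particular recorder's converter does. It snaps samples in [-1, 1] to 2^Q evenly spaced levels. The rounding error never exceeds half a step, and it shrinks as Q grows.

```python
import numpy as np

def quantize(x, bits):
    """Uniform mid-rise quantizer: snap samples in [-1, 1] to the centers of 2**bits bins."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = np.clip(np.floor((x + 1.0) / step), 0, levels - 1)
    return -1.0 + (index + 0.5) * step

t = np.linspace(0, 1, 10_000, endpoint=False)
x = np.sin(2 * np.pi * 5 * t)
for bits in (3, 8, 16):
    q = quantize(x, bits)
    # distinct levels used never exceed 2**bits; worst-case error is at most step / 2
    print(bits, len(np.unique(q)), np.max(np.abs(x - q)))
```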

    Doppler Shift

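    When source and listener move relative to each other, the received frequency differs from the emitted one. For a source moving directly toward a stationary listener:

    $f' = \frac{c}{c - v} \times f \qquad \Delta f = f' - f = \frac{v}{c - v} \times f$

    Where c is the speed of sound (m/s), v is the source velocity toward the listener (m/s, negative when moving away), f is the emitted frequency, and f' is the received frequency.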
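    The one-way Doppler model for a moving source and stationary listener, f' = c / (c − v) × f, fits in a few lines. This sketch assumes c = 343 m/s (dry air at roughly 20°C) and a hypothetical bat flying at 5 m/s past a fixed microphone. It does not model the two-way shift of an echo returning to the bat.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed: dry air at roughly 20 °C

def received_frequency(f_emitted, v_source, c=SPEED_OF_SOUND):
    """Frequency heard by a stationary listener; v_source > 0 means moving toward the listener."""
    return c / (c - v_source) * f_emitted

def doppler_shift(f_emitted, v_source, c=SPEED_OF_SOUND):
    """Delta f = f' - f for a moving source and a stationary listener."""
    return received_frequency(f_emitted, v_source, c) - f_emitted

print(round(received_frequency(80_000, 5.0)))  # 81183: approaching bat sounds higher
print(round(doppler_shift(80_000, -5.0)))      # -1149: receding bat sounds lower
```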


    Signal Representations

  • Oscillograms: time vs amplitude
  • Power spectra: frequency vs power (how much energy sits at each frequency)
  • Spectrograms: time vs frequency, with amplitude as color intensity

    FFT: Time Domain to Frequency Domain

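    The Fast Fourier Transform (FFT) is an efficient algorithm for computing the discrete Fourier transform, which decomposes a sampled signal into its component frequencies. A waveform in the time domain (x=time, y=amplitude) doesn't directly tell us which frequencies are present.

    The FFT transforms it into the frequency domain (x=frequency, y=amplitude), showing how much of each frequency the signal contains. That view is what makes call parameters such as peak frequency measurable, which is why it is critical for bat call analysis.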
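    A two-tone example shows the time-to-frequency step in practice. This minimal sketch uses NumPy's `np.fft.rfft` and `np.fft.rfftfreq`. It builds a signal from a 40kHz tone and a quieter 80kHz tone, then recovers both frequencies and amplitudes from the spectrum. The sampling rate, tones, and the `amplitude_spectrum` helper are illustrative assumptions.

```python
import numpy as np

def amplitude_spectrum(x, sr):
    """One-sided FFT spectrum, scaled so a sine of amplitude A peaks at about A."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1 / sr)
    amps = 2 * np.abs(spectrum) / len(x)   # DC/Nyquist bins get over-scaled by 2; fine here
    return freqs, amps

sr = 250_000                           # assumed sampling rate (Hz)
t = np.arange(int(0.01 * sr)) / sr     # 10 ms -> 100 Hz frequency resolution
x = 1.0 * np.sin(2 * np.pi * 40_000 * t) + 0.5 * np.sin(2 * np.pi * 80_000 * t)

freqs, amps = amplitude_spectrum(x, sr)
top = np.argsort(amps)[-2:][::-1]      # the two strongest bins, loudest first
for i in top:
    print(f"{freqs[i] / 1000:.1f} kHz, amplitude {amps[i]:.2f}")
# 40.0 kHz, amplitude 1.00
# 80.0 kHz, amplitude 0.50
```

    Choosing a duration that holds a whole number of cycles of each tone puts every tone exactly on an FFT bin. Real calls won't cooperate like this, so their energy spreads across neighboring bins.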


    How Bats Echolocate

    From Jon Russ's British Bat Calls:

    "Bats produce and project ultrasonic sounds from their mouths or noses, then detect echoes from solid objects. A single call provides a snapshot; a series provides a movie — like a strobe light creating staggered images."

    Bats determine size, position, speed, surface texture, and form of objects in 3D space. No single signal is optimal for all purposes, so bats evolved multiple signal types.

    Call Types

    FM (Frequency Modulated): Broadband signals spanning wide frequency ranges. Example: a sweep from 100kHz down to 20kHz. Useful in cluttered environments like forests where precise spatial resolution matters.

    CF (Constant Frequency): Narrowband, long-duration calls. Provide long-range detection in open environments.

    qCF (Quasi-Constant Frequency): Shallow, narrowband sweeps that sit between FM and CF and combine benefits of both. Bats in cluttered environments emphasize FM; those in open spaces emphasize qCF.
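    The FM/CF distinction is easiest to see in a spectrogram. This minimal NumPy sketch synthesizes two idealized calls: a linear sweep from 100kHz down to 20kHz and a flat 80kHz tone. A small short-time FFT then keeps the loudest frequency in each frame. That is the same computation a spectrogram plots, minus the image. The linear sweep is a simplification, since real FM calls are often curved. The 250kHz rate, frame sizes, and function names are assumptions for illustration.

```python
import numpy as np

SR = 250_000  # assumed sampling rate (Hz): 2.5 x the 100 kHz top of the sweep

def fm_sweep(f_start=100_000, f_end=20_000, duration_s=0.005, sr=SR):
    """Linear FM sweep: instantaneous frequency moves from f_start to f_end."""
    t = np.arange(int(sr * duration_s)) / sr
    k = (f_end - f_start) / duration_s                  # sweep rate in Hz per second
    return np.sin(2 * np.pi * (f_start * t + 0.5 * k * t**2))

def cf_tone(freq=80_000, duration_s=0.030, sr=SR):
    """Constant-frequency call: a plain sine at a single frequency."""
    t = np.arange(int(sr * duration_s)) / sr
    return np.sin(2 * np.pi * freq * t)

def dominant_track(x, sr=SR, n_fft=256, hop=128):
    """Tiny spectrogram: the strongest frequency in each Hann-windowed frame."""
    window = np.hanning(n_fft)
    frames = np.array([x[i:i + n_fft] * window
                       for i in range(0, len(x) - n_fft + 1, hop)])
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))    # shape: (n_frames, n_fft // 2 + 1)
    freqs = np.fft.rfftfreq(n_fft, d=1 / sr)
    return freqs[np.argmax(magnitudes, axis=1)]

fm_track = dominant_track(fm_sweep())
cf_track = dominant_track(cf_tone())
print(fm_track[0], fm_track[-1])  # starts near 90 kHz, ends near 35 kHz
print(np.ptp(cf_track))           # ~0: every CF frame peaks in the same bin
```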

    Social calls are more complex than echolocation calls — trills and harmonics comparable to bird song. Used for territory defense, attracting mates, distress signals, and mother-infant communication.

    Key Parameters

  • Peak frequency (FmaxE): Frequency of maximum energy — often the key species identifier
  • Duration: 2.5ms to 70ms
  • Pulse repetition rate: Varies by species (Natterer's bat is fast, noctule is slow)
  • Start/max frequency: Can be difficult to measure depending on background noise
    Species Example: Greater Horseshoe Bat

  • Inter-pulse interval: 90.2ms (range: 24.9–186.6ms)
  • Call duration: 50.5ms (range: 16.3–73.8ms)
  • Peak frequency: 81.3kHz (range: 77.8–83.8kHz)
  • Start frequency: 70.2kHz (range: 62.2–78.5kHz)

    Horseshoe bats use long-duration CF calls, with ears tuned precisely to that frequency.
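    Two of the key parameters, peak frequency and call duration, can be estimated roughly from a clip containing a single call. This sketch takes peak frequency from the strongest FFT bin. It takes duration as the span of short frames whose energy is within 20dB of the loudest frame. The approach is crude and noise-sensitive, and it is not Russ's measurement method or that of any specific analysis tool. The frame length, threshold, and synthetic clip are assumptions.

```python
import numpy as np

def measure_call(x, sr, frame_s=0.0005, threshold_db=-20.0):
    """Rough peak frequency (Hz) and duration (s) of one isolated call in a clip."""
    # Peak frequency (FmaxE): the FFT bin with the most energy over the whole clip
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1 / sr)
    peak_freq = float(freqs[np.argmax(spectrum)])

    # Duration: span of short frames whose energy is within threshold_db of the loudest one
    frame = int(sr * frame_s)
    n_frames = len(x) // frame
    frames = x[: n_frames * frame].reshape(n_frames, frame)
    energy = np.sum(frames**2, axis=1)
    energy_db = 10 * np.log10(energy / energy.max() + 1e-12)
    active = np.flatnonzero(energy_db > threshold_db)
    duration = (active[-1] - active[0] + 1) * frame / sr
    return peak_freq, duration

# Hypothetical clip: 20 ms of faint noise, a 50 ms CF call at 80 kHz, 20 ms of noise
sr = 250_000
rng = np.random.default_rng(0)
gap = rng.normal(0, 0.001, int(0.020 * sr))
t = np.arange(int(0.050 * sr)) / sr
call = np.sin(2 * np.pi * 80_000 * t)
clip = np.concatenate([gap, call, rng.normal(0, 0.001, len(gap))])

peak, dur = measure_call(clip, sr)
print(f"peak {peak / 1000:.1f} kHz, duration {dur * 1000:.1f} ms")  # peak 80.0 kHz, duration 50.0 ms
```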


    Signal Processing Approaches

    Frequency Division (FD)

    Real-time, cheap broadband monitoring. A zero-crossing circuit counts the cycles of the incoming signal. For every 10 input cycles, it outputs 1 cycle spanning the same total time, which divides every frequency by 10. An 80kHz call becomes 8kHz, audible in real time.

    Use case: GUI real-time monitoring
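    The divide-by-10 behavior can be imitated in software by counting zero crossings and toggling a square wave. This is a sketch of the idea, not the circuit of any particular detector. The function names are hypothetical. The 1MHz sampling rate is an assumption chosen so the square-wave edges land cleanly.

```python
import numpy as np

def frequency_divide(x, ratio=10):
    """Zero-crossing frequency divider: square wave with one cycle per `ratio` input cycles."""
    negative = np.signbit(x)
    crossings = np.flatnonzero(negative[1:] != negative[:-1]) + 1  # indices where the sign flips
    out = np.empty(len(x))
    level, start = 1.0, 0
    # Each input cycle has two zero crossings, so toggling every `ratio` crossings makes
    # each output half-cycle span ratio/2 input cycles: one output cycle per `ratio` inputs.
    for count, idx in enumerate(crossings, start=1):
        if count % ratio == 0:
            out[start:idx] = level
            level, start = -level, idx
    out[start:] = level
    return out

def dominant_frequency(x, sr):
    """Frequency of the strongest FFT bin."""
    freqs = np.fft.rfftfreq(len(x), d=1 / sr)
    return float(freqs[np.argmax(np.abs(np.fft.rfft(x)))])

sr = 1_000_000                               # assumed high rate for clean square-wave edges
t = np.arange(int(0.01 * sr)) / sr           # 10 ms
call = np.sin(2 * np.pi * 80_000 * t + 0.1)  # phase offset keeps samples off exact zero
divided = frequency_divide(call)
print(dominant_frequency(divided, sr))       # about 8000 Hz
```

    The output keeps the timing of each call but throws away its amplitude envelope. That tradeoff is why FD suits live listening better than measurement.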

    Time Expansion (TE)

    The most faithful reproduction. The detector stores the signal digitally and replays it 10x slower. Every frequency is divided by 10 and every duration is stretched 10x, while all other characteristics of the call are preserved.

    Use case: Post-recording analysis

    Limitation: During playback, detector isn't capturing new sounds.
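    In software, one simple way to time-expand a recording is to keep every sample and label the file with a playback rate 10x lower. Any player then stretches durations and divides frequencies by 10. This sketch uses Python's standard `wave` module and writes 16-bit mono PCM, which is an assumed format. It uses a hypothetical 5ms, 80kHz call recorded at 250kHz.

```python
import io
import wave
import numpy as np

def time_expand_wav(x, sr, factor=10):
    """WAV bytes holding the original samples, but with a header rate of sr / factor."""
    pcm = (np.clip(x, -1.0, 1.0) * 32767).astype("<i2")   # 16-bit little-endian PCM
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sr // factor)   # the only change: the playback clock runs 10x slower
        w.writeframes(pcm.tobytes())
    return buf.getvalue()

sr = 250_000
t = np.arange(int(0.005 * sr)) / sr    # hypothetical 5 ms call
call = 0.8 * np.sin(2 * np.pi * 80_000 * t)
data = time_expand_wav(call, sr)

with wave.open(io.BytesIO(data), "rb") as w:
    rate, n = w.getframerate(), w.getnframes()
    samples = np.frombuffer(w.readframes(n), dtype="<i2").astype(float)

freqs = np.fft.rfftfreq(n, d=1 / rate)
print(rate, n / rate)                                   # 25000 0.05 -> 10x longer
print(freqs[np.argmax(np.abs(np.fft.rfft(samples)))])   # ~8000 Hz -> 10x lower
```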

    My approach: FD for live preview, TE for analysis. Full-rate originals are saved to a folder for later time-expanded analysis, so live monitoring never has to pause for playback.
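    The "save originals while monitoring continues" idea maps onto a producer/consumer pattern. Every incoming chunk goes to the live preview, and a background thread writes the full-rate copy to disk. This is a hypothetical sketch, not the project's actual code. The function names, the `.npy` format, the chunk size, and the fake input chunks are all assumptions.

```python
import queue
import tempfile
import threading
from pathlib import Path
import numpy as np

def save_originals(q, out_dir):
    """Background writer: store each full-rate chunk as .npy until a None sentinel arrives."""
    out_dir.mkdir(parents=True, exist_ok=True)
    while (item := q.get()) is not None:
        index, chunk = item
        np.save(out_dir / f"chunk_{index:06d}.npy", chunk)

def monitor(chunks, out_dir, preview):
    """Feed every chunk to the live preview and queue it for saving without blocking on disk."""
    q = queue.Queue()
    writer = threading.Thread(target=save_originals, args=(q, out_dir))
    writer.start()
    for index, chunk in enumerate(chunks):
        q.put((index, chunk))   # original goes to disk in the background
        preview(chunk)          # e.g. a frequency-division preview for the GUI
    q.put(None)                 # tell the writer to finish
    writer.join()

out_dir = Path(tempfile.mkdtemp()) / "originals"
previewed = []
fake_chunks = (np.zeros(25_000, dtype=np.float32) for _ in range(3))  # stand-in for mic input
monitor(fake_chunks, out_dir, previewed.append)
print(sorted(p.name for p in out_dir.iterdir()))
```

    A queue decouples the two jobs, so a slow SD card write on the Pi delays the disk copy rather than the live preview.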


    What's Next

    I'm training neural networks on spectrogram images for automated species classification. Dataset: my recordings from Vietnam plus public bat call libraries.

    Coming up: FFT implementation details, spectrogram generation, and the neural network architecture.


    Read the rest of the Bat Sonar series: The Bat Sonar Project · Field Recording in Vietnamese Caves · Building a Real-Time Detector