Why Bats? A Scuba Diver's Path to Bioacoustics
Why Bats? A Scuba Diver's Path to Bioacoustics
Near Da Nang, Vietnam — where this project began
Bats and whales have almost nothing in common. One flies, one swims. One weighs grams, the other tons. Yet both have evolved the same extraordinary ability: echolocation — navigating and hunting by sound in complete darkness.
As a technical diver, I've spent time at depths where light doesn't reach. Past 50 meters, it gets dark. Past 300 meters, it's pitch black. Sperm whales dive over a thousand meters hunting giant squid — they need sonar to survive.
So why am I building bat detectors instead of whale call analyzers?
Accessibility
Capturing whale songs requires specialized vessels, hydrophones, and access to deep ocean. Bats? All I need is an ultrasonic microphone and a cave. Caves in Southeast Asia are everywhere — and ultrasonic mics cost a few hundred dollars.
This project started in Da Nang, Vietnam, along the coast of the South China Sea. The species here are different from British or American bats, but the fundamentals of bat communication are universal.
The Goal
I'm building an open-source platform with these features:
Tech stack: Python with Librosa, NumPy, Pandas, PyTorch. Runs on a Raspberry Pi 4 with 4GB RAM.
Sound Signal Primer
Wavelength, Amplitude, Frequency
The wavelength λ is the length of one cycle — the distance between two peaks.
The amplitude (sound pressure) A relates to perceived loudness. The power of a signal:
$P = A^2$
The frequency F is the rate of oscillation per second. Human hearing ranges from 20Hz to 20kHz (most people max out around 16-17kHz).
Sampling Rate and Nyquist
The sampling rate $S_r$ is the number of samples taken per second. Higher $S_r$ = better quality, more information preserved.
The Nyquist theorem states that to accurately sample a signal, $S_r$ must be at least 2x the highest frequency:
$S_r = max(F) \times 2$
$F_n = \frac{S_r}{2}$
In practice, allow more headroom:
$S_r = F_n \times 2.5$
For bat calls at 80kHz:
$80 \times 2.5 = 200kHz$
Bit Depth and File Size
Bit depth Q determines the range of values each sample can take. Common values: 16-bit or 32-bit.
File size calculation:
$size(bytes) = \frac{S_r \times Q \times C \times T}{8}$
Where C is channels and T is duration in seconds.
The Sine Wave
Any real audio is a product of many sine waves combined. The formula:
$A \times sin(2\pi ft + \phi)$
Where A is amplitude, f is frequency, t is time, and φ is phase.
Quantization
Quantization maps analog values to discrete levels. With Q=8 bits, you get $2^8 = 256$ levels to track the wave movement.
Doppler Shift
When source and listener move relative to each other, frequency appears to change:
$\Delta f = \frac{(c + v)}{c} \times f$
Where c is speed of sound (m/s), v is source velocity (m/s), and f is emitted frequency.
Signal Representations
FFT: Time Domain to Frequency Domain
Fast Fourier Transform decomposes a signal into its component frequencies. A waveform on the time domain (x=time, y=amplitude) doesn't tell us what frequencies are present.
FFT transforms this to the frequency domain (x=frequency, y=amplitude), showing how much of each frequency exists in the signal. Critical for bat call analysis.
How Bats Echolocate
From Jon Russ's British Bat Calls:
"Bats produce and project ultrasonic sounds from their mouths or noses, then detect echoes from solid objects. A single call provides a snapshot; a series provides a movie — like a strobe light creating staggered images."
Bats determine size, position, speed, surface texture, and form of objects in 3D space. No single signal is optimal for all purposes, so bats evolved multiple signal types.
Call Types
FM (Frequency Modulated): Broadband signals spanning wide frequency ranges. Example: sweeping from 20kHz to 100kHz. Useful in cluttered environments like forests where precise spatial resolution matters.
CF (Constant Frequency): Narrowband, long-duration calls. Provide long-range detection in open environments.
qCF (Quasi-Constant Frequency): Combines benefits of both. Bats in cluttered environments emphasize FM; those in open spaces emphasize qCF.
Social calls are more complex than echolocation calls — trills and harmonics comparable to bird song. Used for territory defense, attracting mates, distress signals, and mother-infant communication.
Key Parameters
Species Example: Greater Horseshoe Bat
Horseshoe bats use long-duration CF calls with ears tuned precisely to that frequency.
Signal Processing Approaches
Frequency Division (FD)
Real-time, cheap broadband monitoring. Uses zero-crossing circuits. For every 10 input waves, outputs 1 wave of the same total duration. Reduces 80kHz to 8kHz — audible in real-time.
Use case: GUI real-time monitoring
Time Expansion (TE)
Most accurate reproduction. Stores signal digitally, replays at 10x slower speed. Preserves all characteristics but can't capture new sounds during playback.
Use case: Post-recording analysis
Limitation: During playback, detector isn't capturing new sounds.
My approach: FD for live preview, TE for analysis. Store originals in a folder while monitoring continues.
What's Next
I'm training neural networks on spectrogram images for automated species classification. Dataset: my recordings from Vietnam plus public bat call libraries.
Coming up: FFT implementation details, spectrogram generation, and the neural network architecture.
Read the rest of the Bat Sonar series: The Bat Sonar Project · Field Recording in Vietnamese Caves · Building a Real-Time Detector