MICROPHONE

The Opening
Speak. Your vocal cords vibrate at 85-255 Hz, pushing air molecules back and forth by less than 1 micrometer. Those pressure waves hit a membrane in your phone and move it by 0.01 nanometers, about 1/10th the diameter of an atom. Your phone converts that atom-scale movement into an electrical signal, digitizes it at 48,000 samples/second, and sends it across the planet.

Requirements:
├── Detect 20 micropascals (threshold of human hearing)
├── Respond 20 Hz - 20,000 Hz (full audible range)
├── Handle 120 dB dynamic range (1,000,000:1 pressure ratio)
├── Be smaller than a fingernail
└── Do all of this while introducing less noise than the random motion of air molecules

Let's build one.
───
PHASE 1: Hear the Pressure
Clap your hands. You just created a pressure wave: air molecules pushed together, then pulled apart, traveling at 343 m/s. By the time the clap reaches someone 10 meters away, the pressure fluctuation is about 0.02 pascals. The atmosphere pushes on them at 101,325 Pa. You're asking them to detect a change of 1 part in 5 million.

Sound is a pressure wave. At any point in space, the pressure oscillates:

P(t) = P₀ sin(2πft)

Where:
├── P₀ = peak pressure amplitude (pascals)
├── f = frequency (Hz)
└── t = time (seconds)

A 1 kHz tone at conversational volume (60 dB SPL): P₀ = 0.02 Pa. The air pressure swings ±0.02 Pa around atmospheric, completing 1,000 full cycles per second.
The Decibel Scale: Why We Need Logarithms
The human ear detects pressures from 20 μPa (a mosquito 3 meters away) to 20 Pa (the threshold of pain). That's a ratio of 1,000,000:1. Working with linear numbers across six orders of magnitude is impractical, so we compress:

dB SPL = 20 × log₁₀(P / P_ref)

Where P_ref = 20 μPa (threshold of hearing).
Source                         Pressure    dB SPL
──────────────────────────────────────────────────
Threshold of hearing           20 μPa      0 dB
Quiet room                     632 μPa     30 dB
Conversation at 1m             0.02 Pa     60 dB
Busy street                    0.2 Pa      80 dB
Rock concert                   2 Pa        100 dB
Pain threshold                 20 Pa       120 dB
Jet engine at 1m               200 Pa      140 dB
Shockwave (eardrum rupture)    2,000 Pa    160 dB

Every +6 dB roughly doubles the pressure. Every +20 dB multiplies it by 10. A 120 dB range means the loudest signal is 1,000,000× the quietest, and the microphone must handle both without distortion.
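The dB SPL formula is easy to check numerically. The following is a minimal Python sketch of the conversion in both directions; the function names are illustrative, not from any audio library.

```python
import math

P_REF = 20e-6  # reference pressure in pascals (threshold of hearing)


def pressure_to_db_spl(pressure_pa: float) -> float:
    """Convert a sound pressure in pascals to a level in dB SPL."""
    return 20.0 * math.log10(pressure_pa / P_REF)


def db_spl_to_pressure(db_spl: float) -> float:
    """Convert a level in dB SPL back to a pressure in pascals."""
    return P_REF * 10.0 ** (db_spl / 20.0)


if __name__ == "__main__":
    # Reproduce the pressure column of the table from its dB column.
    for level in (0, 30, 60, 80, 100, 120, 140, 160):
        print(f"{level:>3} dB SPL -> {db_spl_to_pressure(level):.6g} Pa")
```

Running it regenerates the pressure column from the dB column, which is a quick way to sanity-check any row you add.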
Air as a Medium: What the Mic Actually Senses
Sound in air has specific physical properties that constrain microphone design:

Acoustic impedance of air:
Z = ρc = 1.225 × 343 = 420 Pa·s/m

Particle velocity at threshold (0 dB SPL):
v = P / Z = 20×10⁻⁶ / 420 = 4.8 × 10⁻⁸ m/s

That's 48 nanometers per second. At 1 kHz, the particle displacement:
x = v / (2πf) = 4.8×10⁻⁸ / 6,283 = 7.6 × 10⁻¹² m

Displacement at hearing threshold: 7.6 picometers. Smaller than the radius of a hydrogen atom (53 pm). The microphone membrane must follow movements smaller than an atom to capture the quietest sounds.
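The chain from pressure to particle displacement can be sketched in a few lines of Python. The constants are the ones quoted above; the function name is an illustrative choice.

```python
import math

RHO_AIR = 1.225  # air density in kg/m^3 (value used in the text)
C_SOUND = 343.0  # speed of sound in m/s


def particle_displacement(pressure_pa: float, freq_hz: float) -> float:
    """Peak particle displacement (m) of a plane wave at the given pressure and frequency."""
    z_air = RHO_AIR * C_SOUND               # acoustic impedance, Pa*s/m
    velocity = pressure_pa / z_air          # particle velocity, m/s
    return velocity / (2.0 * math.pi * freq_hz)


if __name__ == "__main__":
    x = particle_displacement(20e-6, 1000.0)
    print(f"displacement at 0 dB SPL, 1 kHz: {x * 1e12:.1f} pm")
```

Because displacement scales as 1/f, the same pressure at 100 Hz moves the air ten times farther than at 1 kHz.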
DESIGN SPEC UPDATED:
├── Sound: P(t) = P₀sin(2πft), pressure wave in air at 343 m/s
├── Dynamic range: 20 μPa (0 dB) to 20 Pa (120 dB), a 1,000,000:1 ratio
├── dB SPL = 20 log₁₀(P/20μPa)
├── Particle displacement at threshold: 7.6 picometers (sub-atomic)
└── Acoustic impedance of air: 420 Pa·s/m
───
PHASE 2: Move a Membrane
You need a surface that moves when sound hits it. Thinner is better: a thinner membrane has less inertia, so it can follow faster pressure changes. But thinner also means weaker. The membrane must survive years of use, humidity, temperature swings, and the occasional plosive "P" sound that hits it with 10 Pa of pressure.

The diaphragm is a thin membrane stretched across a frame. Sound pressure pushes it in and out. The displacement:

x = P × A / k

Where:
├── P = sound pressure (Pa)
├── A = diaphragm area (m²)
└── k = mechanical stiffness (N/m)

For a 25mm diameter condenser diaphragm (A = 4.91 × 10⁻⁴ m²), stiffness k = 500 N/m, at 1 Pa (94 dB SPL):
x = 1 × 4.91×10⁻⁴ / 500
x = 9.82 × 10⁻⁷ m ≈ 1 micrometer

At threshold of hearing (20 μPa):
x = 20×10⁻⁶ × 4.91×10⁻⁴ / 500
x = 1.96 × 10⁻¹¹ m ≈ 0.02 nanometers

We need to detect 0.02 nm of motion. For reference, a silicon atom is 0.22 nm in diameter.
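The two displacement figures above follow directly from x = PA/k. Here is a small Python sketch that reproduces them; the helper names are illustrative, and the model treats the diaphragm as a simple spring, which is the same simplification the text makes.

```python
import math


def diaphragm_area(diameter_m: float) -> float:
    """Area of a circular diaphragm in square meters."""
    return math.pi * (diameter_m / 2.0) ** 2


def diaphragm_displacement(pressure_pa: float, area_m2: float, stiffness_n_per_m: float) -> float:
    """Static displacement x = P * A / k of a spring-like diaphragm, in meters."""
    return pressure_pa * area_m2 / stiffness_n_per_m


if __name__ == "__main__":
    area = diaphragm_area(0.025)  # 25 mm capsule
    for label, p in (("94 dB SPL", 1.0), ("0 dB SPL", 20e-6)):
        x = diaphragm_displacement(p, area, 500.0)
        print(f"{label}: {x:.3g} m")
```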
Diaphragm Materials
Material          Thickness    Tension    Mass/area      Used In
──────────────────────────────────────────────────────────────────
Gold-sputtered    3-6 μm       Low        0.04 g/cm²     Studio condenser
Mylar (PET)
Nickel foil       2-5 μm       Medium     0.09 g/cm²     Vintage ribbon
Aluminum          5-10 μm      Medium     0.01 g/cm²     Dynamic mic dome
Titanium          10-25 μm     High       0.05 g/cm²     Measurement mic
Polysilicon       0.5-1 μm     High       0.002 g/cm²    MEMS mic

Studio condensers use Mylar film just 3-6 μm thick, thinner than a red blood cell (8 μm). It's sputtered with a few nanometers of gold to make it conductive. This gossamer film must track atomic-scale movements for decades.
Resonance: The Frequency Ceiling
Every diaphragm has a resonant frequency. At resonance, the diaphragm's response peaks violently. Above resonance, it falls off and the mic becomes deaf.

f_res = (1/2π) × √(k / m_eff)

For flat response across 20 Hz - 20 kHz, you need resonance ABOVE the audible range:

f_res > 20,000 Hz (ideally 25-40 kHz)

This forces a tradeoff:
├── Large diaphragm (25mm): more area → more signal → lower noise
│   but more mass → lower resonance → reduced high-frequency response
├── Small diaphragm (12mm): less mass → higher resonance → flatter response
│   but less area → less signal → more noise
└── MEMS (1mm): extremely low mass → resonance at 20-30 kHz
    but tiny area → high noise floor

This is the fundamental microphone engineering tradeoff: sensitivity vs bandwidth.
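To see how tight the mass budget is, the resonance formula can be inverted to give the largest effective mass that still keeps f_res above a target. This is a rough Python sketch of the ideal mass-spring formula only; the function names are illustrative, and real diaphragms also depend on tension and the air trapped behind them.

```python
import math


def resonant_frequency(stiffness_n_per_m: float, effective_mass_kg: float) -> float:
    """f_res = (1 / 2*pi) * sqrt(k / m_eff), in hertz."""
    return math.sqrt(stiffness_n_per_m / effective_mass_kg) / (2.0 * math.pi)


def max_mass_for_resonance(stiffness_n_per_m: float, target_hz: float) -> float:
    """Largest effective mass (kg) that keeps the resonance at or above target_hz."""
    return stiffness_n_per_m / (2.0 * math.pi * target_hz) ** 2


if __name__ == "__main__":
    k = 500.0  # N/m, the stiffness used in the displacement example
    m_max = max_mass_for_resonance(k, 20_000.0)
    print(f"mass budget for 20 kHz resonance: {m_max * 1e9:.1f} micrograms")
```

With k = 500 N/m the budget comes out at roughly 32 micrograms, which shows why designers chase both thinner films and higher stiffness.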
DESIGN SPEC UPDATED:
├── Diaphragm displacement: x = PA/k → ~1 μm at 94 dB, ~0.02 nm at 0 dB
├── Material: gold-sputtered Mylar, 3-6 μm thick
├── Resonant frequency must exceed 20 kHz for flat audible response
├── f_res = (1/2π)√(k/m): lower mass or higher stiffness → higher resonance
└── Fundamental tradeoff: large diaphragm (low noise) vs small diaphragm (flat response)
───
PHASE 3: Make It Electric — Condenser
The diaphragm moves. Now what? You have a piece of metal-coated plastic vibrating by nanometers. You need to turn that movement into a voltage that a wire can carry. The condenser microphone does this with one of the simplest ideas in physics: a capacitor whose plates move.

A parallel-plate capacitor:

C = ε₀ × A / d

Where:
├── ε₀ = 8.854 × 10⁻¹² F/m (permittivity of free space)
├── A = plate area (m²)
└── d = gap between plates (m)

The condenser mic places the conductive diaphragm parallel to a fixed metal backplate. Gap: typically 20-50 μm. When sound moves the diaphragm, d changes, so C changes.

For a 25mm diameter capsule, gap = 25 μm:
C = 8.854×10⁻¹² × 4.91×10⁻⁴ / 25×10⁻⁶
C = 174 pF

Constant Charge: The Conversion Trick
Charge a capacitor to voltage V₀ through a very high resistance (gigaohms). The charge Q = CV stays constant because it can't leak through the resistor fast enough. With Q fixed:

V = Q / C = Q × d / (ε₀A)

When sound moves the diaphragm by Δd:

ΔV = V₀ × Δd / d₀
        Sound pressure
          ↓   ↓   ↓
┌─────────────────────────┐
│ ░░░░ diaphragm ░░░░░░░  │  ← gold-sputtered Mylar (6 μm)
└─────────────────────────┘
     ↕ gap d ≈ 25 μm          d changes with sound
┌─────────────────────────┐
│ ████ backplate █████████│  ← fixed brass, holes for air
└─────────────────────────┘

C = ε₀A/d, biased at V₀ = 48V (phantom power); the output voltage is taken across the capsule.

At 94 dB (1 Pa), diaphragm moves ~1 μm:
ΔV = 48 × (1×10⁻⁶) / (25×10⁻⁶)
ΔV = 48 × 0.04
ΔV = 1.92 V peak (huge signal!)

At 0 dB (20 μPa), diaphragm moves ~0.02 nm:
ΔV = 48 × (0.02×10⁻⁹) / (25×10⁻⁶)
ΔV = 48 × 8×10⁻⁷
ΔV = 38.4 μV peak (50,000× smaller)

The condenser mic is a variable capacitor biased at constant charge. Sound changes the gap, which changes the capacitance, which changes the voltage. Simple physics, extraordinary sensitivity.
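The capsule arithmetic can be reproduced with two one-line formulas. This Python sketch assumes the idealized parallel-plate model and the small-signal relation ΔV = V₀ × Δd / d₀; the function names are illustrative.

```python
EPSILON_0 = 8.854e-12  # permittivity of free space, F/m


def capsule_capacitance(area_m2: float, gap_m: float) -> float:
    """Parallel-plate capacitance C = eps0 * A / d, in farads."""
    return EPSILON_0 * area_m2 / gap_m


def output_voltage_change(bias_v: float, delta_gap_m: float, rest_gap_m: float) -> float:
    """Constant-charge voltage change: dV = V0 * dd / d0, in volts."""
    return bias_v * delta_gap_m / rest_gap_m


if __name__ == "__main__":
    c = capsule_capacitance(4.91e-4, 25e-6)
    print(f"capsule capacitance: {c * 1e12:.0f} pF")
    for label, dd in (("94 dB SPL (1 um)", 1e-6), ("0 dB SPL (0.02 nm)", 0.02e-9)):
        dv = output_voltage_change(48.0, dd, 25e-6)
        print(f"{label}: {dv:.3g} V")
```

The two displacement cases differ by a factor of 50,000, so the output voltages do too.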
The Impedance Converter
The capsule output impedance is enormous: the capacitor plate is essentially an open circuit at low frequencies. You cannot run a cable from a 174 pF capacitor to a mixer 10 meters away. The cable capacitance (~1 nF) would swamp the capsule, destroying the signal.

Solution: a JFET impedance converter, mounted inside the microphone body, millimeters from the capsule. It converts the ultra-high impedance voltage signal (~10 GΩ) to a low impedance output (~200 Ω) that can drive long cables.

This is why condenser mics need power. The JFET and bias circuit require 48V phantom power, sent up the same cable that carries the audio signal down. No phantom power, no condenser mic.
DESIGN SPEC UPDATED:
├── Capacitance: C = ε₀A/d ≈ 174 pF for 25mm capsule
├── Conversion: ΔV = V₀ × Δd/d₀ at constant charge Q
├── Sensitivity at 94 dB: ~1.92 V peak (strong signal)
├── Needs 48V phantom power for bias + impedance converter
└── JFET converts 10 GΩ capsule impedance → 200 Ω output
───
PHASE 4: Make It Electric — Dynamic
What if you don't want to deal with phantom power, bias voltages, and fragile capacitor gaps? What if you need a mic that survives being dropped on a concrete stage, spit on by a vocalist, and stored in a van at 50°C? You need a different conversion principle. One that generates voltage directly from motion.

Faraday's law: move a conductor through a magnetic field, and a voltage appears.

V = B × L × v

Where:
├── B = magnetic flux density (tesla)
├── L = length of conductor in the field (m)
└── v = velocity of the conductor (m/s)

The dynamic microphone attaches a coil of wire to the diaphragm. The coil sits in a permanent magnet's gap. When sound moves the diaphragm, the coil moves through the magnetic field, generating voltage.
The Voice Coil — Numbers
Sound pressure →
┌───────────────────────────┐
│ ═══ diaphragm ═══════════ │  ← aluminum or Mylar dome
│         │                 │
│    ┌────┴────┐            │
│    │  coil   │            │  voice coil: 200 turns
│    │ ║║║║║║║ │            │  of 40 AWG copper wire
│    └────┬────┘            │  wound on a 25mm former
│         │                 │
│   ╔═════╧═════╗           │
│   ║ N magnet S║           │  neodymium: B = 1.2 T
│   ╚═══════════╝           │
└───────────────────────────┘
          │
          ↓
voltage out (no power needed!)

Coil: L = 200 turns × π × 0.025 m = 15.7 m of wire in the field
B = 1.2 T (neodymium magnet)

At 94 dB SPL (1 Pa), diaphragm velocity:
v = P/(Z_mech) ≈ 0.002 m/s

V = B × L × v = 1.2 × 15.7 × 0.002
V = 0.038 V = 38 mV peak

Compare to condenser: 1,920 mV vs 38 mV
Dynamic mic output is ~50× weaker.

The dynamic mic generates voltage directly: no external power, no bias circuit. But it generates 50× less signal than a condenser, so preamps must work harder. The Shure SM58 has survived more abuse than any electronic device in history because there's nothing fragile to break.
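The voice-coil numbers are a direct application of V = BLv. A minimal Python sketch, using the coil geometry and the 0.002 m/s velocity quoted above (the velocity is taken as given, not derived); function names are illustrative.

```python
import math


def coil_wire_length(turns: int, former_diameter_m: float) -> float:
    """Total length of wire in the gap: turns * circumference, in meters."""
    return turns * math.pi * former_diameter_m


def induced_voltage(flux_density_t: float, wire_length_m: float, velocity_m_s: float) -> float:
    """Faraday's law for a conductor moving through a field: V = B * L * v."""
    return flux_density_t * wire_length_m * velocity_m_s


if __name__ == "__main__":
    length = coil_wire_length(200, 0.025)
    v_dynamic = induced_voltage(1.2, length, 0.002)
    v_condenser = 1.92  # condenser output at 94 dB SPL from Phase 3
    print(f"wire in field: {length:.1f} m")
    print(f"dynamic output: {v_dynamic * 1e3:.1f} mV")
    print(f"condenser / dynamic: {v_condenser / v_dynamic:.0f}x")
```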
Condenser vs Dynamic — When Each Wins
Property              Condenser               Dynamic
──────────────────────────────────────────────────────────────
Sensitivity           -30 to -40 dBV          -55 to -60 dBV
Self-noise            5-15 dB-A               18-25 dB-A
Frequency response    20 Hz - 20 kHz ±2dB     80 Hz - 15 kHz ±3dB
Transient response    Excellent               Good
Max SPL               130-140 dB              150+ dB
Durability            Fragile                 Nearly indestructible
Humidity tolerance    Poor                    Excellent
Power required        48V phantom             None
Cost                  $100-$10,000            $50-$500

Studios use condensers for their sensitivity and detail. Live stages use dynamics for their durability and high SPL handling. The choice is always about which set of compromises fits your situation.
The Ribbon Microphone: A Third Way
Instead of a coil attached to a diaphragm, suspend a thin metal ribbon (2 μm aluminum) directly in a magnetic field. The ribbon IS the diaphragm AND the conductor. Pure simplicity.

V = B × L × v (same law, but L = ribbon length, not coil)

Ribbon: 60mm long, 4mm wide, 2 μm thick aluminum
Mass: about 0.0013 grams (60 mm × 4 mm × 2 μm of aluminum at 2.70 g/cm³), lighter than a condenser diaphragm

The ribbon's extraordinary lightness gives it the fastest transient response of any microphone type. But the output is minuscule (~0.5 mV at 94 dB) and the impedance is extremely low (~0.2 Ω), requiring a high-ratio step-up transformer.
DESIGN SPEC UPDATED:
├── Dynamic: V = BLv, no power needed, ~38 mV at 94 dB
├── Condenser: 50× more sensitive but needs phantom power, fragile
├── Ribbon: lightest diaphragm (~1.3 mg for 60 mm × 4 mm × 2 μm aluminum), best transients, lowest output
├── Live stage → dynamic. Studio → condenser. Special character → ribbon.
└── All three convert mechanical motion to voltage using different physics
───
PHASE 5: Shape What You Hear
Point a microphone at a singer. The singer is 30 cm away. The drummer is 2 meters behind. The bass amp is 3 meters to the left. An omnidirectional mic picks up ALL of it: it can't tell front from back. You need a mic that listens in one direction and ignores the rest.

A polar pattern describes a microphone's sensitivity as a function of angle. The three fundamental patterns come from physics, not engineering:

Omnidirectional: responds equally to sound from all directions. One pressure-sensitive diaphragm, sealed on one side. Sound pressure is a scalar, so it has no direction. The mic simply measures "how much pressure is there?" regardless of where it came from.

Figure-8 (Bidirectional): responds to front and back, rejects sides. Diaphragm open on both sides. Sound from the front pushes the diaphragm in. Sound from the back pushes it out (inverted polarity). Sound from the side hits both sides equally, so the net force is zero.

Omni: R(θ) = 1 (constant for all angles)
Figure-8: R(θ) = cos(θ) (max at 0°/180°, zero at 90°/270°)
The Cardioid: Combining Omni + Figure-8
Here's the elegant trick. Add the omni and figure-8 patterns in equal parts:

Cardioid: R(θ) = ½(1 + cos θ)

Pattern     Formula             Shape
──────────────────────────────────────────────────────────────────
OMNI        R = 1               Equal in all directions
FIGURE-8    R = cos(θ)          Front + back, sides: zero
CARDIOID    R = ½(1 + cos θ)    Front: full, sides: half, back: zero

At θ = 0° (front): R = ½(1 + 1) = 1.0 (full sensitivity)
At θ = 90° (side): R = ½(1 + 0) = 0.5 (-6 dB)
At θ = 180° (back): R = ½(1 + (-1)) = 0.0 (null, no pickup)

The cardioid is literally half omni plus half figure-8. It picks up the front, partially picks up the sides, and completely rejects the back. This is why it's the default pattern for live performance: it ignores the monitors behind it.
Extended Patterns: Tuning the Mix
By adjusting the ratio of omni to figure-8:

Pattern          Formula                Front   Side    Back     Rejection
──────────────────────────────────────────────────────────────────────────────────────
Omni             R = 1                  1.0     1.0     1.0      None
Subcardioid      R = 0.7 + 0.3cosθ      1.0     0.7     0.4      -8 dB at 180°
Cardioid         R = ½(1 + cosθ)        1.0     0.5     0.0      -∞ dB at 180°
Supercardioid    R = 0.37 + 0.63cosθ    1.0     0.37    -0.26    -12 dB rear lobe, null at ~126°
Hypercardioid    R = 0.25 + 0.75cosθ    1.0     0.25    -0.5     -6 dB rear lobe, null at ~110°
Figure-8         R = cosθ               1.0     0.0     -1.0     -∞ dB at 90°

Super/hyper cardioid patterns have a small rear lobe (negative = inverted polarity) but tighter side rejection. Shotgun mics use interference tubes to narrow the pattern further, but the physics breaks down below ~1 kHz, where the pattern widens back to a cardioid.
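Every pattern in this family is R(θ) = a + b cos θ with a + b = 1, so the whole table can be generated from two weights. The following Python sketch evaluates the response and finds the null angle; the dictionary and function names are illustrative.

```python
import math

# (omni weight a, figure-8 weight b) for R(theta) = a + b * cos(theta)
PATTERNS = {
    "omni": (1.0, 0.0),
    "subcardioid": (0.7, 0.3),
    "cardioid": (0.5, 0.5),
    "supercardioid": (0.37, 0.63),
    "hypercardioid": (0.25, 0.75),
    "figure-8": (0.0, 1.0),
}


def response(a: float, b: float, theta_deg: float) -> float:
    """Signed sensitivity at angle theta; negative means inverted polarity."""
    return a + b * math.cos(math.radians(theta_deg))


def null_angle_deg(a: float, b: float):
    """Angle in degrees where the response is zero, or None if it never reaches zero."""
    if b == 0.0 or a > b:
        return None
    return math.degrees(math.acos(-a / b))


if __name__ == "__main__":
    for name, (a, b) in PATTERNS.items():
        side, back = response(a, b, 90.0), response(a, b, 180.0)
        print(f"{name:<14} side={side:+.2f} back={back:+.2f} null={null_angle_deg(a, b)}")
```

The null sits wherever cos θ = -a/b, which is why it moves forward from 180° as the figure-8 share grows.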
The Proximity Effect: When Distance Changes the Pattern
Move a cardioid mic close to a sound source (<30 cm). The bass frequencies get louder, dramatically. A voice at 5 cm can have +12 dB at 100 Hz compared to the same voice at 1 meter.

Why? The figure-8 component of the cardioid responds to the pressure gradient (difference in pressure between front and back of diaphragm). Close to the source, the sound field is spherical and pressure drops off as 1/r. The gradient across the small gap (front to back of capsule) becomes much steeper at low frequencies near the source.

Radio DJs use this intentionally: that "deep voice" effect is proximity effect.
DESIGN SPEC UPDATED:
├── Cardioid: R(θ) = ½(1 + cosθ), omni + figure-8 combined
├── Rejects rear sound (θ=180°), 6 dB down at sides (θ=90°)
├── Pattern family: all variations of omni + figure-8 ratio
├── Proximity effect: +12 dB bass boost at close range (pressure gradient effect)
└── Shotgun pattern uses interference tube, narrows above ~1 kHz only
───
PHASE 6: Fight the Noise
In a perfectly silent room (no traffic, no HVAC, no insects), point a microphone at nothing and turn the gain up. You'll hear a hiss. That hiss isn't interference. It isn't bad engineering. It's the thermal vibration of air molecules hitting the diaphragm and the random motion of electrons in the resistors. Physics itself sets a noise floor you cannot design away.

Source 1: Brownian Motion (Acoustic Noise)
Air molecules at room temperature move at ~500 m/s in random directions. Their random impacts on the diaphragm create a fluctuating force. For a small cavity (the air gap behind the diaphragm), this thermal noise has an RMS pressure over the measurement bandwidth of:

P_noise = √(4kTR_a × Δf)

Where:
├── k = Boltzmann's constant (1.38 × 10⁻²³ J/K)
├── T = temperature (293 K)
├── R_a = acoustic resistance of the air gap
└── Δf = measurement bandwidth (20 Hz - 20 kHz)

For a typical condenser capsule: acoustic self-noise ≈ 5-8 dB-A SPL

This means even a PERFECT microphone with zero electronic noise still has a noise floor of ~5 dB-A from air molecules bouncing against the diaphragm.
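Source 2: Johnson Noise (Electronic Noise)
Every resistor generates voltage noise from thermal electron motion:

V_n = √(4kTRΔf)

The bias resistor is large (R = 10 GΩ), but what matters is the equivalent noise at the amplifier input. The JFET itself has an input noise voltage of ~2-5 nV/√Hz. Over the audio band (20 kHz bandwidth):

V_noise = 4 nV/√Hz × √20,000
V_noise = 4 × 141
V_noise = 566 nV ≈ 0.57 μV

Compare to the signal at threshold. The idealized capsule from Phase 3 gives 48 V × 0.02 nm / 25 μm = 38.4 μV at 0 dB SPL, comfortably above this noise. Real capsules are far less sensitive than that idealized model: a large-diaphragm condenser delivers roughly 20 mV at 94 dB SPL, which scales down by 50,000× (94 dB) to about 0.4 μV at 0 dB SPL. That is slightly below the 0.57 μV of electronic noise, so the quietest audible sounds sit under the amplifier's hiss. This is why the "equivalent noise level" (ENL) of even the best microphones is 5-7 dB-A, well above the 0 dB threshold of hearing.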
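Both noise formulas are short enough to evaluate directly. This Python sketch assumes a flat (white) noise density across the band, which is a simplification; real JFETs also show extra low-frequency noise. The 200 Ω example uses the output impedance quoted in Phase 3, and the function names are illustrative.

```python
import math

K_BOLTZMANN = 1.38e-23  # J/K


def johnson_noise_vrms(resistance_ohm: float, temperature_k: float, bandwidth_hz: float) -> float:
    """Thermal noise voltage of a resistor: sqrt(4 k T R df), in volts RMS."""
    return math.sqrt(4.0 * K_BOLTZMANN * temperature_k * resistance_ohm * bandwidth_hz)


def integrated_noise_vrms(density_v_per_rt_hz: float, bandwidth_hz: float) -> float:
    """Total RMS noise of a flat noise density over a bandwidth, in volts."""
    return density_v_per_rt_hz * math.sqrt(bandwidth_hz)


if __name__ == "__main__":
    jfet = integrated_noise_vrms(4e-9, 20_000.0)
    r200 = johnson_noise_vrms(200.0, 293.0, 20_000.0)
    print(f"JFET input noise over 20 kHz: {jfet * 1e6:.2f} uV")
    print(f"200 ohm resistor over 20 kHz: {r200 * 1e6:.2f} uV")
```

Noise grows with the square root of bandwidth, so halving the bandwidth only buys 3 dB.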
Why Large Diaphragms Are Quieter
Diaphragm     Area     Signal at    Self-Noise    SNR
Diameter      (mm²)    94 dB SPL    (dB-A)        at 94 dB
───────────────────────────────────────────────────────────────
MEMS 1mm      0.8      0.3 mV       30-35 dB-A    59 dB
Small 12mm    113      8 mV         14-18 dB-A    76 dB
Large 25mm    491      20 mV        7-12 dB-A     82 dB
Large 34mm    908      35 mV        5-7 dB-A      87 dB

Signal scales with area (A). Electronic noise stays constant.
SNR improves as: ΔSNR = 20 log₁₀(A₂/A₁)
Double the diameter → 4× the area → +12 dB SNR

This is why studio vocal mics have large diaphragms (25-34mm). The larger surface intercepts more sound energy, producing a stronger signal. The electronics noise stays the same. More signal, same noise = better signal-to-noise ratio.
DESIGN SPEC UPDATED:
├── Brownian noise: ~5-8 dB-A, air molecule thermal motion (cannot eliminate)
├── Johnson noise: V_n = √(4kTRΔf), electron thermal noise in resistors
├── Best condenser self-noise: 5-7 dB-A (Neumann U87: 12 dB-A)
├── Large diaphragm: +12 dB SNR per doubling of diameter (area effect)
└── Physics floor: no mic can hear below ~5 dB-A, thermal noise is fundamental
───
PHASE 7: Handle the Loud
A kick drum at 6 inches: 140 dB SPL. That's 200 Pa of peak pressure, 10 million times the threshold of hearing. The diaphragm displacement at 140 dB is 200× the displacement at 94 dB (46 dB higher, and 10^(46/20) ≈ 200). The capsule voltage is 200× larger. At some point, the electronics saturate, the diaphragm hits the backplate, or the output signal clips. When that happens, the clean sine wave gets its peaks chopped off, and you hear distortion.

Maximum SPL: Where Clipping Begins
The maximum SPL is defined as the pressure at which total harmonic distortion (THD) reaches 0.5% (condenser) or 1% (dynamic).

At 130 dB SPL (63 Pa):
Diaphragm displacement: ~60 μm (for a 25mm capsule)
Gap distance: 25 μm
Problem: the diaphragm is trying to move 2.4× the gap distance. It physically hits the backplate. This is mechanical clipping.

At 140 dB SPL (200 Pa):
Capsule output voltage: ~10 V peak
JFET maximum swing: ~12 V (limited by phantom power)
The electronics are almost saturated. This is electronic clipping.
The Pad Switch — Buying Headroom
                                             Max SPL
┌────────────┐
│  CAPSULE   │──→ signal (too much voltage)
└────────────┘      │
                    ├──→ [no pad] ──────────→ 130 dB
                    ├──→ [-10 dB] (÷3.16) ──→ 140 dB
                    └──→ [-20 dB] (÷10) ────→ 150 dB

-10 dB pad: resistor divider before the JFET
→ capsule still clips mechanically at 140+ dB
→ but electronics get 1/3 the voltage → more headroom

-20 dB pad: cuts signal by 10×
→ electronics can handle 150 dB
→ but noise floor rises by 20 dB (same noise, less signal)

The pad doesn't change the capsule. It changes how much of the capsule's output reaches the amplifier.

Pad switches trade noise floor for headroom. A -10 dB pad lets you mic a snare drum at close range. A -20 dB pad lets you put a mic inside a kick drum. But you lose 10-20 dB of signal-to-noise ratio, which is acceptable for loud sources that don't need it.
The Full Dynamic Range Picture
dB SPL
  0 ──── threshold of hearing ────── below mic noise floor
  7 ──── mic self-noise ──────────── quietest signal detectable
  │
  │  ▓▓▓▓▓ USABLE RANGE ▓▓▓▓▓  ~123 dB (excellent mic)
  │
130 ──── max SPL (0.5% THD) ──────── clipping begins
140 ──── with -10 dB pad ─────────── extended range
150 ──── with -20 dB pad ─────────── extreme SPL

Dynamic range = Max SPL - Self-noise

Best condenser: 130 - 7 = 123 dB
Good dynamic: 150 - 20 = 130 dB (lower noise floor matters less)
MEMS phone mic: 120 - 35 = 85 dB

The human ear has about 120 dB of dynamic range. The best studio microphones match or exceed this. MEMS mics in phones give up about 38 dB of dynamic range compared to studio mics (123 dB vs 85 dB), and you can hear the difference in quiet passages.
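The bookkeeping for dynamic range and pads is simple subtraction and addition in dB. The Python sketch below follows the simplified model used here, in which a pad shifts both the clipping point and the effective noise floor by the same amount; the function names are illustrative.

```python
def dynamic_range_db(max_spl_db: float, self_noise_db: float) -> float:
    """Usable range between the noise floor and the clipping point, in dB."""
    return max_spl_db - self_noise_db


def pad_voltage_ratio(pad_db: float) -> float:
    """Voltage division factor of a pad, e.g. 10 dB -> about 3.16."""
    return 10.0 ** (pad_db / 20.0)


def with_pad(max_spl_db: float, self_noise_db: float, pad_db: float):
    """Return (max SPL, equivalent noise floor) after engaging a pad of pad_db."""
    return max_spl_db + pad_db, self_noise_db + pad_db


if __name__ == "__main__":
    for name, max_spl, noise in (("studio condenser", 130, 7),
                                 ("stage dynamic", 150, 20),
                                 ("MEMS phone mic", 120, 35)):
        print(f"{name:<17} {dynamic_range_db(max_spl, noise)} dB")
    print("condenser with -10 dB pad:", with_pad(130, 7, 10))
```

In this model the pad slides the usable window upward without widening it, which is the trade described above.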
DESIGN SPEC UPDATED:
├── Max SPL at 0.5% THD: ~130 dB for condenser, ~150 dB for dynamic
├── Clipping: mechanical (diaphragm hits backplate) or electronic (JFET saturates)
├── Pad switch: -10 dB adds 10 dB headroom, costs 10 dB noise floor
├── Dynamic range: Max SPL - Self-noise → best: ~123 dB (studio), ~85 dB (MEMS)
└── Human ear: ~120 dB dynamic range. Studio mics match this, phone mics don't
───
PHASE 8: Go Digital
Everything so far describes a microphone that's 20-50mm across, weighs 200-500 grams, and costs $100-$3,000. Your phone has a microphone that's 3mm × 3mm × 1mm, weighs 30 milligrams, costs $0.30, and it works well enough for you to be understood across continents. How?

MEMS: Micro-Electro-Mechanical Systems
A MEMS microphone is a condenser mic built on a silicon chip using the same processes that make computer processors.
Side view (not to scale; actual size: 3mm × 3mm × 1mm)

   Sound enters through port hole
             ↓
┌─────────○─────────┐  ← lid (metal or laminate)
│                   │
│  ┌─────────────┐  │  ← ASIC die (amplifier + ADC)
│  └──────┬──────┘  │
│         │ wire bond
│  ┌──────┴──────┐  │
│  │ ░░░ gap ░░░ │  │  ← 2-4 μm air gap
│  │─────────────│  │  ← polysilicon diaphragm (0.5-1 μm)
│  │ ▓▓▓▓▓▓▓▓▓▓▓ │  │  ← perforated backplate (Si)
│  │ ▓ holes ▓▓▓ │  │    (lets air through, keeps plate rigid)
│  └─────────────┘  │
│      MEMS die     │
└───────────────────┘  ← PCB substrate
          ↓
     solder pads

Diaphragm: ~1 mm diameter polysilicon, 0.5-1 μm thick
Gap: 2-4 μm
Capacitance: ~1-3 pF (vs 174 pF for studio mic)
Bias voltage: ~10 V (charge pump on-chip)

The entire condenser microphone (diaphragm, backplate, air gap, bias supply, amplifier, and analog-to-digital converter) fits in a 3mm package. Manufactured in batches of millions on silicon wafers. Cost: $0.30.
The ASIC: Analog to Digital on the Same Chip
Modern MEMS mics include the ADC on-chip. The signal chain:

Diaphragm → capacitance change → charge amplifier → sigma-delta ADC → PDM bitstream

PDM (Pulse Density Modulation): a 1-bit signal at 1-3 MHz. The density of 1s tracks the instantaneous signal level: dense pulses = high pressure, sparse pulses = low pressure. The phone's audio codec filters this to standard 16/24-bit PCM at 48 kHz.

Performance of a typical MEMS mic (e.g., Knowles SPH0645):
├── SNR: 65 dB (vs studio condenser: 87 dB)
├── Self-noise: 29 dB-A (vs studio: 7 dB-A)
├── Max SPL: 120 dB (vs studio: 130 dB)
├── Dynamic range: 91 dB (vs studio: 123 dB)
├── Sensitivity: -26 dBFS at 94 dB SPL
├── Frequency range: 100 Hz - 10 kHz
├── Power: 0.6 mA at 1.8V = 1 mW
└── Size: 3.5 × 2.65 × 0.98 mm
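To make pulse density concrete, here is a toy first-order sigma-delta modulator in Python. It is a textbook sketch, not the architecture of any particular MEMS part: real modulators are higher order, run at 1-3 MHz, and are followed by a proper decimation filter rather than the plain average used here. Inputs are assumed to be normalized to the range -1 to 1.

```python
def sigma_delta_pdm(samples):
    """First-order sigma-delta modulator: values in [-1, 1] -> list of 0/1 bits."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        # Accumulate the difference between the input and the last output level.
        integrator += x - feedback
        bit = 1 if integrator >= 0.0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits


def pulse_density(bits):
    """Fraction of 1s in the bitstream (a crude stand-in for decimation)."""
    return sum(bits) / len(bits)


if __name__ == "__main__":
    for level in (-1.0, -0.5, 0.0, 0.5, 1.0):
        density = pulse_density(sigma_delta_pdm([level] * 1000))
        print(f"input {level:+.1f} -> density of ones {density:.3f}")
```

For a constant input x the density of ones settles near (x + 1) / 2, so averaging the bitstream recovers the signal.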
Why Phones Sound Acceptable Despite Bad Specs
A MEMS mic has 22 dB worse noise floor than a studio condenser. But:
├── Voices are at 60-80 dB SPL, well above the 29 dB noise floor
├── Phone codecs compress audio to 16-64 kbit/s, so codec artifacts mask mic noise
├── Dual/triple MEMS arrays enable noise cancellation (Phase 10)
├── DSP algorithms suppress stationary noise in real-time
└── Narrowband voice (300-3,400 Hz) discards where MEMS performs worst

The MEMS mic doesn't need to be great. It needs to be good enough at 1/10,000th the cost.
DESIGN SPEC UPDATED:
├── MEMS: condenser mic on silicon, 3mm package, $0.30, 1 mW
├── Diaphragm: 1mm polysilicon, 0.5-1 μm thick
├── On-chip ADC: sigma-delta → PDM bitstream at 1-3 MHz
├── SNR: 65 dB (vs studio 87 dB), a 22 dB penalty for 10,000× cost reduction
└── Good enough for voice at 60-80 dB in 3mm package
───
PHASE 9: Cancel the World
You're on a plane. The engines roar at 85 dB. You put on noise-cancelling headphones. The roar drops to a hum. It feels like magic, but it's the most literal application of superposition: play the exact opposite pressure wave, and the two cancel.

Active Noise Cancellation (ANC): The Principle
Sound is a pressure wave: P(t) = P₀ sin(2πft)

If you generate: P_anti(t) = -P₀ sin(2πft) = P₀ sin(2πft + π)

The sum: P(t) + P_anti(t) = 0

Perfect cancellation. In theory. In practice, you need to measure the noise, compute the anti-noise, and play it through a speaker, all before the original sound arrives at your ear.
FEEDFORWARD ANC (external mic)

noise (outside) → mic → DSP → speaker → ear ← noise arrives (inside)

├── mic sits OUTSIDE the ear cup and measures noise BEFORE it arrives
├── that head start gives the DSP time to compute the anti-noise
└── cancellation happens at the ear

FEEDBACK ANC (internal mic)

noise (inside) → ear ← speaker ← DSP ← mic (inside ear cup)

├── mic sits INSIDE the ear cup
├── measures what's LEFT after cancellation
└── corrects errors in real-time

HYBRID ANC (both; modern headphones)
Uses feedforward for initial cancellation + feedback for correction. Best systems achieve 30-40 dB of noise reduction below 1 kHz.

Feedforward is fast but imprecise. Feedback is precise but slow. Hybrid combines both: the external mic predicts the noise, the internal mic measures the error, and the DSP corrects continuously.
Why ANC Works Below 1 kHz But Fails Above
The anti-noise must arrive at your ear at EXACTLY the right time. Any timing error creates a phase mismatch:

Cancellation error = 2 × P₀ × sin(π × f × Δt)

Where Δt is the timing error. For the system's total latency of ~50 μs:

At 100 Hz:
error = 2P₀ × sin(π × 100 × 50×10⁻⁶)
= 2P₀ × sin(0.0157) = 2P₀ × 0.0157
= 3.1% residual → -30 dB cancellation

At 1 kHz:
error = 2P₀ × sin(π × 1000 × 50×10⁻⁶)
= 2P₀ × sin(0.157) = 2P₀ × 0.156
= 31% residual → -10 dB cancellation

At 5 kHz:
error = 2P₀ × sin(π × 5000 × 50×10⁻⁶)
= 2P₀ × sin(0.785) = 2P₀ × 0.707
= 141% residual → +3 dB (louder than no ANC)

The residual reaches 100% where 2 sin(πfΔt) = 1, at f = 1/(6Δt) ≈ 3.3 kHz. Above that, ANC actually makes noise WORSE: the anti-noise adds rather than cancels. This is why ANC headphones rely on passive isolation (ear cup sealing) for high frequencies.
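The timing-error formula can be evaluated across frequency to see where cancellation turns into amplification. This Python sketch applies the expression exactly as written, including the factor of 2, and assumes a pure tone with a fixed timing error; the function names are illustrative.

```python
import math


def residual_ratio(freq_hz: float, timing_error_s: float) -> float:
    """Residual amplitude relative to the original noise: |2 sin(pi f dt)|."""
    return abs(2.0 * math.sin(math.pi * freq_hz * timing_error_s))


def residual_db(freq_hz: float, timing_error_s: float) -> float:
    """Residual level in dB; negative means reduction, positive means added noise."""
    return 20.0 * math.log10(residual_ratio(freq_hz, timing_error_s))


def break_even_frequency(timing_error_s: float) -> float:
    """Lowest frequency at which the residual equals the original: f = 1 / (6 dt)."""
    return 1.0 / (6.0 * timing_error_s)


if __name__ == "__main__":
    dt = 50e-6  # timing error used in the text
    for f in (100.0, 1000.0, 5000.0):
        print(f"{f:>6.0f} Hz: {residual_db(f, dt):+.1f} dB")
    print(f"break-even: {break_even_frequency(dt):.0f} Hz")
```

Halving the timing error doubles the break-even frequency, which is why ANC designs fight for every microsecond of latency.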
The DSP Requirements
The ANC filter must update in real-time, adapting to changing noise:
├── Sample rate: 48-96 kHz
├── Filter length: 128-512 taps (FIR adaptive filter)
├── Algorithm: LMS (Least Mean Squares) or FxLMS
├── Latency budget: <100 μs total (mic → DSP → speaker)
├── Update rate: every sample (48,000 adjustments per second)
└── Power: 10-50 mW for the DSP alone

The filter coefficients adapt using gradient descent, the same math as neural network training, but running 48,000 times per second on a chip the size of a grain of rice.
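The per-sample weight update is the heart of LMS. The Python sketch below is plain LMS, not FxLMS: it learns a short FIR model of a made-up acoustic path from noise-free data and ignores the speaker-to-ear path that a real headphone must also model. The path taps, step size, and function names are all hypothetical choices for illustration.

```python
import random


def fir_filter(inputs, taps):
    """Apply an FIR filter with the given taps to a sequence of samples."""
    history = [0.0] * len(taps)  # most recent sample first
    out = []
    for x in inputs:
        history = [x] + history[:-1]
        out.append(sum(t * h for t, h in zip(taps, history)))
    return out


def lms_identify(inputs, desired, num_taps, step_size):
    """Adapt FIR weights with LMS so the filter output tracks `desired`."""
    weights = [0.0] * num_taps
    history = [0.0] * num_taps
    errors = []
    for x, d in zip(inputs, desired):
        history = [x] + history[:-1]
        y = sum(w * h for w, h in zip(weights, history))
        e = d - y  # what the error microphone would hear
        # Gradient-descent step on the squared error, once per sample.
        weights = [w + step_size * e * h for w, h in zip(weights, history)]
        errors.append(e)
    return weights, errors


rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(3000)]
true_path = [0.6, -0.3, 0.1, 0.05]  # hypothetical outside-to-ear path
at_ear = fir_filter(noise, true_path)
weights, errors = lms_identify(noise, at_ear, num_taps=4, step_size=0.05)

if __name__ == "__main__":
    print("learned taps:", [round(w, 4) for w in weights])
    print("final |error|:", abs(errors[-1]))
```

Once the weights match the path, playing the negated filter output would cancel the noise at the ear, which is the job the DSP repeats for every sample.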
DESIGN SPEC UPDATED:
├── ANC: play anti-phase sound to cancel noise (superposition principle)
├── Feedforward (external mic) + feedback (internal mic) = hybrid ANC
├── Effective below ~1 kHz (30-40 dB reduction), fails above 5 kHz
├── Timing error: cancellation ∝ sin(πfΔt), so higher frequency = worse
└── DSP: adaptive FIR filter, LMS algorithm, <100 μs latency
───
PHASE 10: Array Processing
───