This is part of a series of articles on the general subject of audio signal processing from air to information.
Previous installments include:
In this installment, we investigate the trade-off between a high sample rate at low bit depth and a lower sample rate at higher bit depth. The raw PDM data is at a very high sample rate, but also at a very low bit depth. One bit per sample at several megasamples per second is typical, while useful digital audio should be processed and stored with more like 12 to 24 bits per sample at around 48 kilosamples per second.
We’ll occasionally refer to a “typical” MEMS microphone. In this case “typical” really means the particular microphone that we found to be broadly available in small quantities, the Knowles Digital SiSonic™ Microphone, part number SPM0437HD4H.
Start with the bits again
The previous article described the overall data flow from microphone to PCM audio data, and the subsequent implementation stunt showed that the arithmetic produces qualitatively valid answers, in that quiet rooms give small numbers and loud noises give large numbers. However, neither attempted to justify the claim that there would be enough valid data to do anything useful with the results.
The closest we came was to mention the rule of thumb that halving the data rate could be traded for an additional bit of precision. While that rule of thumb is true, it is also more of a guideline than a rule, and it might have been better to qualify it as “at least one additional bit” rather than implying that it was exact.
CIC In Depth
The CIC filter structure was described by Hogenauer in 1981.
Where a single integrator-comb pair makes up exactly a boxcar filter, the CIC gets additional sharpness by cascading several copies of that boxcar at the highest sample rate. Algebra then allows the order to be rearranged so that all the integrator blocks run before all the comb blocks, with the sample rate reduction moved into the spot between the integrators and the combs, which saves memory in the comb filter implementation. This shuffling of the arithmetic doesn’t change the essential properties of the filter, but it does require that the designer implement each stage of the arithmetic with sufficient bit depth, and with arithmetic that wraps on overflow.
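The rearrangement described above can be checked numerically. The following is a minimal sketch, assuming M = 1, zero initial state, and Python's unbounded integers (so overflow does not come into play yet). The function names are ours, chosen for illustration only.

```python
def boxcar_cascade_decimate(x, R, N):
    """Reference form: N cascaded length-R boxcar sums at the input rate,
    then keep every R-th sample."""
    y = list(x)
    for _ in range(N):
        # Running sum of the most recent R samples (zeros before the start).
        y = [sum(y[max(0, i - R + 1): i + 1]) for i in range(len(y))]
    return y[R - 1::R]


def cic_decimate(x, R, N):
    """Rearranged form: N integrators, decimate by R, then N combs (M = 1)."""
    y = list(x)
    for _ in range(N):              # integrators run at the input rate
        acc, out = 0, []
        for v in y:
            acc += v
            out.append(acc)
        y = out
    y = y[R - 1::R]                 # rate reduction between the two halves
    for _ in range(N):              # combs run at the output rate
        prev, out = 0, []
        for v in y:
            out.append(v - prev)    # each comb needs only one stored sample
            prev = v
        y = out
    return y


bits = [(i * 7 % 5) % 2 for i in range(64)]   # arbitrary test pattern of 0/1
print(boxcar_cascade_decimate(bits, 4, 3) == cic_decimate(bits, 4, 3))
```

Both forms should produce the same output samples. The difference is that the comb in the second form stores one value per stage rather than R values, which is where the memory saving comes from.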
CIC Bit Depth Requirement
For a decimation filter, if too few bits are preserved in the integrator accumulators, then more than one wrap can occur between steps of the comb filter, which is guaranteed to introduce errors. But it is possible to prove that with enough bits in each integrator’s accumulator, and using arithmetic that operates as expected across the boundary from most positive to most negative values (2’s complement arithmetic has the required properties as long as the implementation does not do either saturation clipping or detect arithmetic overflow), the comb filters will have bounded and glitch-free output.
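A small experiment illustrates the wrapping claim. This is a sketch, not a proof: it models a fixed-width register by masking every result to `width` bits, which behaves like 2’s complement addition and subtraction with no saturation. The function name and the choice to treat the 1-bit input as 0/1 are our assumptions.

```python
def cic_decimate_wrapped(bits, R, N, width):
    """N-stage CIC decimator (M = 1) with every register limited to `width`
    bits. Overflow wraps silently, as it would in plain integer hardware."""
    mask = (1 << width) - 1
    integ = [0] * N                 # integrator accumulators
    comb = [0] * N                  # one delayed sample per comb stage
    out = []
    for n, b in enumerate(bits):
        v = b
        for i in range(N):
            integ[i] = (integ[i] + v) & mask    # wraps on overflow
            v = integ[i]
        if n % R == R - 1:                      # keep every R-th sample
            for i in range(N):
                v, comb[i] = (v - comb[i]) & mask, v
            out.append(v)
    return out


ones = [1] * (64 * 6)               # full-scale input, worst case for range
wide = cic_decimate_wrapped(ones, 64, 3, 64)    # effectively unbounded here
print(cic_decimate_wrapped(ones, 64, 3, 19) == wide)   # enough bits
print(cic_decimate_wrapped(ones, 64, 3, 16) == wide)   # too few bits
```

With 19 bits the integrators overflow many times during the run, yet the comb outputs agree with the wide reference. With 16 bits the outputs no longer agree, because the true result does not fit in the register.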
Hogenauer’s 1981 paper computed the required arithmetic precision as a function of the filter’s design parameters. It also went further and computed the minimum precision required for each accumulator in the whole cascade. This latter result would be valuable for a hardware implementation, but does not necessarily save any resources for a pure software implementation.
Given the input bit depth $B_{in}$, the downsampling ratio $R$, the number of stages cascaded $N$, and the additional comb delay factor $M$, the output bit depth $B_{out}$ is given by:
$$B_{out} = \lceil N \log_2(R M) + B_{in} \rceil$$
$B_{out}$ tells you how many significant bits are needed for the calculation. Most implementations will carry that bit depth through the whole filter and preserve it at the end. That bit depth may (usually will) imply more precision than is really present, however.
As a check, a typical use case has PDM audio at 2822400 Hz and wants to decimate by 64 to get 16-bit PCM samples at 44100 Hz (one channel of CD audio). The PDM stream is 1 bit per sample. Substituting in the equation above, we find that a three stage filter with M set to 1 requires 19 bits.
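The same check is easy to script. This is a direct transcription of the formula above; the function and parameter names are ours.

```python
import math


def cic_output_bits(b_in, R, N, M=1):
    """Register width needed by an N-stage CIC decimator with rate change R
    and comb delay M, for an input that is b_in bits wide."""
    return math.ceil(N * math.log2(R * M) + b_in)


# The use case from the text: 1-bit PDM, decimate by 64, three stages, M = 1.
print(cic_output_bits(1, 64, 3))    # 19

# Each additional stage costs another log2(R*M) bits.
for stages in range(1, 6):
    print(stages, cic_output_bits(1, 64, stages))
```

On a CPU with 32-bit registers, this suggests that up to five stages at R = 64 fit in a single word (31 bits), which may matter more in software than the per-stage minimums do.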
But Are the Bits Real?
The arithmetic is stable and sensible as long as the required range is provided in the accumulators, the memories in the combs, and the adders that implement everything. That doesn’t mean that the 19 bits recovered by a CIC (e.g. with R=64, M=1, N=3) from a white noise process sampled at 2.8 MHz are in any sense real. However, when the PDM bit stream is sampled from a real signal with audio bandwidth using the right noise shaping, the PDM encoding process creates a bit stream that has the real low bandwidth signal blended with encoding noise, with the signal and noise bands well separated.
Qualitatively speaking, that separation allows the PDM noise to act effectively like a “dither”, and does trade the high temporal resolution for a lower sample rate with additional value resolution.
More rigorously, the PDM sampling algorithm measures the quantization error present in each output sample, and uses that error to adjust the subsequent output sample. In effect, this is a high-pass filter applied to the quantization error present in the signal, and is the direct cause of the noise shaping effect.
The simplest arithmetic model uses the error for the single most recent sample. Better results are possible if the PDM process uses a more complex filter applied to the errors associated with several input samples.
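The simplest model can be sketched in a few lines. This is an illustrative first-order error-feedback modulator, not the circuit inside any particular microphone. We assume the input is already sampled at the PDM rate and scaled to the range -1 to +1.

```python
def pdm_first_order(samples):
    """Encode samples in [-1, 1] as a list of 0/1 bits, carrying the
    quantization error of each output into the next decision."""
    err = 0.0
    bits = []
    for x in samples:
        v = x + err                 # input plus the error left over so far
        bit = 1 if v >= 0 else 0
        q = 1.0 if bit else -1.0    # value the output bit actually represents
        err = v - q                 # what this output got wrong
        bits.append(bit)
    return bits


# A constant input of +0.5 should give a stream that is mostly ones.
stream = pdm_first_order([0.5] * 1000)
print(sum(stream) / len(stream))
```

Writing `e` for the quantization error, each output works out to `x[n] + e[n] - e[n-1]`. The error appears only as a first difference, which is a high-pass filter, so the noise is pushed toward high frequencies while the signal passes unchanged.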
Precision vs. Quality
Recovering a 19-bit deep stream does not necessarily imply that all 19 bits are significant. If nothing else, the system as a whole likely has a noise floor, and without extraordinary care in the hardware design and implementation the roughly 116 dB signal to noise ratio (SNR) implied by 19 bit precision is only barely practical. In fact, a typical PDM microphone has a documented SNR of 57.5 dB, or about 10 bits of precision, measured with a 94 dB SPL 1kHz tone as stimulus.
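For converting between bits and decibels, a common rule of thumb is that an ideal quantizer driven by a full-scale sine wave has an SNR of about 6.02 dB per bit plus 1.76 dB. The sketch below applies that rule in both directions. It is an idealization, and a real system's noise floor will usually be set by something other than quantization.

```python
def ideal_snr_db(bits):
    """SNR of an ideal quantizer with a full-scale sine input (rule of thumb)."""
    return 6.02 * bits + 1.76


def effective_bits(snr_db):
    """Inverse of the rule of thumb: how many ideal bits a given SNR is worth."""
    return (snr_db - 1.76) / 6.02


print(round(ideal_snr_db(19), 1))      # SNR implied by 19 ideal bits
print(round(effective_bits(57.5), 1))  # bits implied by the documented 57.5 dB
```

By this measure the microphone's documented SNR is worth a little over 9 bits, in line with the "about 10 bits" estimate.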
Another systematic source of noise is aliasing. The high frequency noise is reduced in level by the low pass filter and then aliased into the pass band by the decimation step. The filter design needs to reduce the high frequency noise to an acceptable level. The calculations are tricky, but the Hogenauer 1981 paper provided tables for various values of N, M, and R that provide sufficient rejection of the high frequency sampling noise from the audio-rate passband signal.
The typical MEMS microphone has a nearly flat free field sensitivity. That is, the sensitivity is within ±0.5 dB from 100 Hz to 5 kHz, and rises to +5 dB at 10 kHz.
To implement an ideally flat response of the whole system, it may be necessary to probe an actual microphone mounted in your finished product housing in an anechoic chamber with pure tones and test signals.
Even without the acoustic testing, the math tells us that the CIC filter rolls off at higher frequencies. Exactly how much depends on the specific CIC filter implemented.
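The rolloff can be estimated from the standard CIC magnitude response, which is a ratio of sines raised to the power N. The sketch below normalizes the gain to 0 dB at DC and evaluates it for the R=64, N=3, M=1 example at a 2822400 Hz input rate. The function name is ours.

```python
import math


def cic_gain_db(f_hz, fs_in_hz, R, N, M=1):
    """Gain of an N-stage CIC decimator at frequency f_hz, in dB relative
    to its DC gain. fs_in_hz is the sample rate at the filter input."""
    if f_hz == 0:
        return 0.0
    x = math.pi * f_hz / fs_in_hz
    h = math.sin(R * M * x) / (R * M * math.sin(x))
    if h == 0.0:
        return float("-inf")        # exactly on one of the filter's nulls
    return 20.0 * N * math.log10(abs(h))


for f in (100, 1000, 5000, 10000, 20000):
    print(f, round(cic_gain_db(f, 2822400, 64, 3), 2))
```

For this configuration the droop is negligible at 1 kHz but grows to several dB near the top of the audio band, which is the motivation for a compensating filter when a flat response matters.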
The typical microphone response can be scaled by the CIC filter response, then an ideal compensation filter can be estimated that would make the finished samples have a flat response over the frequency range of interest. This might be required in a high quality recording application, for instance.
A typical compensating filter is best implemented as an FIR filter with a moderately sized kernel. To reduce the computational cost of this FIR filter, it can be combined with a rate reduction by a factor of perhaps 4 or 8, allowing it to be computed using fewer total arithmetic operations. Naturally, the CIC filter stages would be configured to do rate reduction by 8 or 16 as required, rather than the complete rate reduction by 64.
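The saving from combining the FIR filter with rate reduction comes from never computing the outputs that would be thrown away. Here is a minimal sketch of that idea. The taps in the usage example are a placeholder moving average, not a designed compensation filter, and the function name is ours.

```python
def fir_decimate(x, taps, D):
    """Apply an FIR filter with coefficients `taps` to x, but compute only
    every D-th output sample. Samples before the start are taken as zero."""
    out = []
    for n in range(D - 1, len(x), D):       # visit only the kept outputs
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * x[n - k]
        out.append(acc)
    return out


# Placeholder kernel: a 4-tap average, decimating by 4.
print(fir_decimate(list(range(8)), [0.25] * 4, 4))   # [1.5, 5.5]
```

With the CIC stage decimating by 16 and this stage by 4, the overall reduction is still 64, and the FIR arithmetic runs once per output sample instead of once per input sample.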
Another potential issue that may or may not be important is a systematic DC offset. Many MEMS microphones are implemented with a small bias in their reported sample levels. Our typical microphone is documented to be typically biased by 6% of full scale. If that bias is not removed, then a calculation such as an RMS level will have it built in as a floor, which will limit the measurement to only signals significantly louder than this level. If simply recording samples for playback, it may make sense to completely ignore this bias. Otherwise, a suitable high pass filter must be included in the signal processing flow somewhere to drop the DC bias from the finished samples.
In the analog audio domain, the DC bias would typically be removed with a simple DC blocking capacitor, which is in effect a high-pass filter which blocks signals below 1/(2πRC) Hz, and which can be implemented by very practical components. In the digital domain, there are lots of choices of filter implementation to get a similar result. One easy answer is a direct discrete implementation of the single pole analog filter. Other answers are possible.
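The "easy answer" can be sketched as follows. This is one common discretization of the single-pole RC high-pass, where the RC product is derived from the desired cutoff frequency. The 20 Hz cutoff and the 6% bias in the usage example are assumptions chosen for illustration.

```python
import math


def dc_block(x, fs_hz, cutoff_hz):
    """Single-pole high-pass filter, a discrete version of a series
    capacitor followed by a resistor to ground."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)   # from cutoff = 1/(2*pi*RC)
    a = rc / (rc + 1.0 / fs_hz)
    y, prev_x, prev_y = [], 0.0, 0.0
    for v in x:
        prev_y = a * (prev_y + v - prev_x)
        prev_x = v
        y.append(prev_y)
    return y


# A constant 6% of full scale bias decays toward zero at the output.
biased = [0.06] * 44100
print(abs(dc_block(biased, 44100.0, 20.0)[-1]) < 1e-6)
```

The filter costs one multiply and two additions per sample and needs only two stored values. It does take a noticeable fraction of a second to settle after startup, which may matter when measuring short recordings.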
- Wikipedia Cascaded Integrator-Comb Filter
- Embedded.com Understanding CIC filters article.
- CIC Filter Introduction
- Demystifying Hogenauer filters
- Berkeley EECS 247 Lecture 19, 2002 B. Boser
- Altera Application Note 455: Understanding CIC
- Knowles Digital SiSonic™ Microphone Data Sheet
And what comes next?
We’ve previously demonstrated an implementation of PCM audio recovery and SPL calculation running in an 8-bit ATtiny85 CPU.
We will demonstrate PCM audio recovery in a small ARM Cortex-M0 processor.
In the Cortex-M0, which has more headroom for additional calculations, we will play with other configurations of the PCM reconstruction filter, and find other useful properties of the audio stream to compute.
I’m not able to understand this bit “dB signal to noise ratio (SNR) implied by 19 bit precision is only barely practical. In fact, a typical PDM microphone has a documented SNR of 57.5 dB, or about 10 bits of precision”
If the PDM Mic can only generate 10 bits of precision, then why bother getting a 19 bit result at the converter?
Thanks for writing this series!
In the described case, you need 19 bits in the accumulator for the (all integer) arithmetic to be stable, which guarantees that the output of the decimation and comb filter is glitch-free. That inherently produces a 19-bit result in the final accumulator, but without understanding all the noise sources in your system you don’t really know how many bits of the result are “useful”.
Converting that to 16-bit PCM, you would immediately discard 3 bits without qualms because it would have to be a rather exotic analog design to have 19 bits of SNR. Even if six of the remaining 16 bits are noisy, it is likely still better to preserve them than to quantize the output to 10 bits in a 16-bit word.
For this mic, it is probably best to transform to µ-law or A-law and store 8 bits, which is what its typical user (a cell phone or other telephony application) is likely going to do.