audio, MEMS, microphones, PDM
This is part of a series of articles on the general subject of audio signal processing from air to information. Previous installments include:
In this installment, we explore the PDM signal coding and the signal processing required to transform it from its representation on the wire into useful audio samples.
Start with the bits
A class of interesting MEMS microphones (described by an earlier article) uses a micromachined diaphragm to convert pressure waves (sound) into a series of single-bit samples at a high sample rate.
Finding the signal
The PDM signal contains two components: the audio signal and high-frequency noise. Extracting the audio requires only a suitable low-pass filter to reject the noise.
For computational efficiency, the filtering is sensibly combined with a decimation stage to reduce the sample rate to one suited to the finished audio.
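To make the idea concrete, here is a minimal sketch in Python. It assumes the bit stream comes from a first-order delta-sigma modulator, which is a common way to produce PDM but is an assumption here and not something a particular microphone is known to use. The names `pdm_encode` and `average_and_decimate` are invented for this illustration, and the block average stands in for a proper low-pass filter.

```python
import math

def pdm_encode(samples):
    """First-order delta-sigma modulator (assumed model of the microphone).

    Takes floats in [-1, 1] and returns a list of 0/1 bits.
    """
    bits = []
    integrator = 0.0
    feedback = 0.0
    for x in samples:
        # Accumulate the error between the input and the last bit sent.
        integrator += x - feedback
        bit = 1 if integrator >= 0.0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits

def average_and_decimate(bits, m):
    """Crude low-pass plus decimation: one output per block of m bits.

    The density of ones in each block is mapped back to the range [-1, 1].
    """
    out = []
    for start in range(0, len(bits) - m + 1, m):
        ones = sum(bits[start:start + m])
        out.append(2.0 * ones / m - 1.0)
    return out

# A slow sine wave, heavily oversampled, survives the round trip
# as a coarse approximation of itself.
sine = [0.5 * math.sin(2 * math.pi * n / 4096) for n in range(8192)]
recovered = average_and_decimate(pdm_encode(sine), 64)
```

The block average is about the weakest low-pass filter that will work, so the recovered samples are noisy. Sharper filters are the subject of the rest of this article.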
Touching back to the practical
Practical implementations of PDM microphones use a sample rate clock generated by an external source, and generate their samples on either the rising or falling edge of that clock. This permits stereo signals to be carried on a single data wire with one channel driving on rising edges and the other on falling edges.
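If the data line is sampled on every clock edge, separating the two channels is just a matter of taking alternate samples. The sketch below assumes the capture began on a rising edge. The function name `split_stereo` is hypothetical, and which microphone ends up being "left" or "right" depends on how each part's select pin is wired.

```python
def split_stereo(edge_samples):
    """Split a stream captured on every clock edge into two channels.

    Assumes edge_samples[0] was captured on a rising edge, so even
    indexes are rising-edge data and odd indexes are falling-edge data.
    """
    rising = edge_samples[0::2]
    falling = edge_samples[1::2]
    return rising, falling

rising_bits, falling_bits = split_stereo([1, 0, 1, 1, 0, 0])
```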
A typical example microphone is the Knowles SPM0437HD4H, which is a low power top port silicon digital microphone. Of its six pins, three are power (two grounds and one Vdd), one selects the data phase, one is the clock input, and the last is the data output.
The clock is driven from the system. In addition to setting the sample rate, it also signals a low-power sleep mode. If the supplied clock falls below 1 kHz, the microphone sleeps. It wakes when the clock exceeds 1 MHz. Actual operation is possible with the clock between 1 MHz and 3.25 MHz.
PDM Signal processing overview
- The PDM signal is one bit per sample at 1 MHz to 3.25 MHz.
- On the wire, there may be a second independent audio stream with correlated samples. This is usually used for a stereo pair.
- The useful audio is mixed with high-frequency noise.
- A low-pass filter is sufficient to separate the audio from the noise.
- Useful high quality PCM audio could be 16 bits per sample at 48 kHz. CD quality is stereo 16 bit PCM at 44.1 kHz. Some professional mixing equipment works at even higher bit depth and sample rate for greater headroom in processing.
- Useful telephony quality PCM audio could be 12 bits per sample at 8 kHz. In this case, the audio is usually further transformed to an 8 bit non-linear amplitude coding, either μ-law or A-law depending on the system requirements.
- Decimation factors are usually powers of two, and 64 is typical. Decimating PDM at 2.8224 MHz by 64 gives 44.1 kHz for CD audio. Decimating PDM at 3.072 MHz by 64 gives 48 kHz. Decimating PDM at 1.024 MHz by 128 gives 8 kHz for classic telephony.
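The arithmetic behind those clock rates is easy to check. The sketch below multiplies the desired PCM rate by the decimation factor and tests the result against the 1 MHz to 3.25 MHz operating range quoted above for the example microphone. The constant and function names are invented for this illustration.

```python
# Operating clock range quoted above for the example microphone.
MIC_CLOCK_MIN_HZ = 1_000_000
MIC_CLOCK_MAX_HZ = 3_250_000

def pdm_clock_hz(pcm_rate_hz, decimation):
    """PDM clock needed to reach a PCM rate at a given decimation factor."""
    return pcm_rate_hz * decimation

def clock_in_range(clock_hz):
    """True if the microphone can operate at this clock rate."""
    return MIC_CLOCK_MIN_HZ <= clock_hz <= MIC_CLOCK_MAX_HZ

for pcm_rate_hz, decimation in [(44_100, 64), (48_000, 64), (8_000, 64), (8_000, 128)]:
    clock = pdm_clock_hz(pcm_rate_hz, decimation)
    print(pcm_rate_hz, decimation, clock, clock_in_range(clock))
```

Running it shows why the telephony case uses 128 and not 64: a factor of 64 would need a 512 kHz clock, which is below the range where this microphone operates.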
Filters and sample rate conversion
The raw PDM signal has traded bit depth for sample rate. That trade makes for very efficient delivery of stereo audio in a digital form. But the high sample rates imply a lot of computation is required per second of finished audio.
The signal is also at an extremely low bit depth: effectively carrying only the sign bit of the signal in each sample. Useful signal processing must be done with more bits of precision.
Luckily, one natural consequence of low pass filtering is to increase the bit depth. As a simple rule of thumb, each halving of the sample rate can produce an additional bit of precision. So the solution to the one problem is also the solution to the other. A low pass filter can be used to increase the precision of the samples, and at the same time the same low pass filter makes it safe to decimate the sample rate. Even better, by careful selection of the filter architecture, it can be possible to reduce the total amount of computation required by not performing any computations that would be ignored by the decimation.
In principle, any stable low pass filter with a sharp enough roll off may be used for down conversion. The selection of the specific filter architecture and ratio of input to output sample rates is then a game of engineering trade-offs. Fortunately, this is a well-trod path, and many of the choices can be made by following examples.
A Finite Impulse Response filter is a class of digital filter where each output value is a computation based only on a finite number of past input values.
The impulse response of a filter is one way to characterize the filter’s performance. An impulse is an isolated narrow pulse. The impulse response of an FIR is finite in the sense that the filter’s output responds to the impulse for a finite length of time.
FIR filters are well suited to digital implementations, as the calculation required amounts to a series of multiplications and additions. They are also relatively easy to design with, as their coefficients have a simple relationship to the desired frequency response, allowing the one to be computed from the other.
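As an illustration, a direct-form FIR filter fits in a few lines of Python. This is a sketch for clarity and not an optimized implementation; `fir_filter` is a name chosen here, and samples before the start of the input are assumed to be zero.

```python
def fir_filter(coeffs, samples):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * x[n - k].

    Samples before the start of the input are treated as zero.
    """
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out

# Feeding in a single impulse reads the coefficients back out,
# after which the output returns to zero: the response is finite.
impulse_response = fir_filter([1, 2, 3], [1, 0, 0, 0, 0])
```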
Decimation for sample rate conversion
Decimation is simply the act of selecting 1 of each M samples. In its simplest form, this reduces the sample rate by a factor of M.
As long as the signal has no frequencies above half the output sample rate, decimation is a lossless operation. If there are frequencies above half the output sample rate present, however, they will be aliased into the output, causing error and distortion.
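Aliasing is easy to demonstrate numerically. In the sketch below, the rates are arbitrary numbers picked for illustration: the input runs at 64 Hz and is decimated by 8, so the output rate is 8 Hz and anything above 4 Hz will alias. A 7 Hz tone comes out of the decimator looking exactly like a 1 Hz tone.

```python
import math

def decimate(samples, m):
    """Keep one of every m samples, with no filtering."""
    return samples[::m]

def tone(freq_hz, rate_hz, count):
    """A unit-amplitude cosine sampled at rate_hz."""
    return [math.cos(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]

aliased = decimate(tone(7, 64, 128), 8)   # above half the output rate
genuine = decimate(tone(1, 64, 128), 8)   # safely below it
worst_difference = max(abs(a - g) for a, g in zip(aliased, genuine))
```

After decimation the two sequences agree to within floating point rounding, so nothing downstream can tell them apart.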
The solution is then to combine decimation with a low-pass filter, where the low-pass filter has a sufficiently low response in the stop-band so as to produce an acceptable level of distortion from aliasing.
An FIR (finite impulse response) filter is a natural choice for decimation. The key advantage of an FIR filter is that each output depends only on input samples, never on earlier outputs. (If there were a feedback path, then it would not have a finite impulse response.) This means that when decimation is part of the system design, the filter need only be evaluated for the outputs that will actually be kept.
Concretely, if the decimation factor is M, then M-1 of every M filter results would simply be discarded. Skipping them saves M-1 evaluations of the complete FIR filter for each output sample that is kept, which is a fraction (M-1)/M of the work.
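The saving can be sketched directly. In the Python below, `fir_filter` computes every output and `fir_decimate` computes only the ones that survive decimation; both names are chosen for this illustration. The two approaches give identical results, but the second does roughly 1/M of the arithmetic.

```python
def fir_filter(coeffs, samples):
    """Full-rate FIR filter: one output per input sample."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out

def fir_decimate(coeffs, samples, m):
    """FIR filter evaluated only at every m-th output position."""
    out = []
    for n in range(0, len(samples), m):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out

coeffs = [1, 3, 3, 1]
samples = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
wasteful = fir_filter(coeffs, samples)[::4]   # compute all, discard most
frugal = fir_decimate(coeffs, samples, 4)     # compute only what is kept
```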
To make the math even easier, it is possible to design an FIR filter where the coefficients are all either 1 or 0 exactly. Obviously then no actual multiplications are required, only sums of those inputs corresponding to a 1. This optimization leads directly to an interesting architecture called the CIC (cascaded-integrator-comb) filter.
The Cascaded-Integrator-Comb filter structure is a form of FIR filter built from elementary computational blocks that can be implemented with only adders and delays. In particular, there are no multiplications involved, unless a small FIR filter kernel is used at the end of the process to re-shape the frequency response.
Anything built from only adders and delays is easy to implement in hardware, and should translate neatly to any processor with integer arithmetic. It is also fairly straightforward to understand.
Moving Average, cascaded
In its simplest form, a CIC is just a boxcar or moving average filter. An integrator keeps a running sum, adding in each new sample. A comb filter subtracts off the oldest sample. The final component is the decimator that discards M-1 out of each M output samples.
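The equivalence between a boxcar sum and an integrator followed by a comb can be checked with a short sketch. The function names are invented for this illustration, and both versions return the running sum over the window without dividing by its length, so the gain at DC equals the window length.

```python
def boxcar_direct(samples, length):
    """Sum of the most recent `length` samples, computed the obvious way."""
    return [sum(samples[max(0, n - length + 1):n + 1])
            for n in range(len(samples))]

def boxcar_integrator_comb(samples, length):
    """The same sum, built from an integrator followed by a comb."""
    out = []
    running = 0
    history = [0] * length   # integrator outputs from the last `length` steps
    for n, x in enumerate(samples):
        running += x                      # integrator: running sum
        delayed = history[n % length]     # integrator output `length` steps ago
        history[n % length] = running
        out.append(running - delayed)     # comb: subtract the delayed value
    return out

bits = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1]
direct = boxcar_direct(bits, 4)
recursive = boxcar_integrator_comb(bits, 4)
```

The recursive form needs one addition and one subtraction per sample no matter how long the window is.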
The simple moving average is a relatively soft low pass filter. But it can be improved by passing its output through a second filter, or even a third or more. This cascade of integrator and comb stages is what gives it its name.
The computational strength comes from combining it with a rate change, and through sharpening by cascading multiple copies of the filter. After a little algebra, you end up with an efficient implementation involving a cascade of integrator elements working at the high sample rate, decimation to the lower rate, and a cascade of comb elements working at the low sample rate.
Doing part of the work at the low rate reduces the total amount of computation involved. And putting the combs on the low rate side allows them to work with less total storage (and the overheads of managing it in software) as well.
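A minimal sketch of that structure in Python follows. It assumes a comb delay of one output sample, which corresponds to a boxcar length equal to the decimation factor. The function and variable names are chosen for illustration, and Python's unbounded integers sidestep the question of register width that a real implementation has to answer.

```python
def cic_decimate(samples, rate, stages):
    """CIC decimator: integrators at the input rate, combs at the output rate."""
    integrators = [0] * stages
    comb_delays = [0] * stages
    out = []
    for n, x in enumerate(samples):
        # Integrator cascade runs on every input sample.
        value = x
        for i in range(stages):
            integrators[i] += value
            value = integrators[i]
        # Decimate: only every `rate`-th integrator output moves on.
        if n % rate == rate - 1:
            # Comb cascade runs only at the low rate.
            for i in range(stages):
                previous = comb_delays[i]
                comb_delays[i] = value
                value -= previous
            out.append(value)
    return out

def boxcar(samples, length):
    """Reference boxcar sum used to check the CIC result."""
    return [sum(samples[max(0, n - length + 1):n + 1])
            for n in range(len(samples))]

bits = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1]
cic_out = cic_decimate(bits, 4, 2)
# Reference: two boxcars of length 4 at the full rate, then keep every 4th.
reference = boxcar(boxcar(bits, 4), 4)[3::4]
```

Here `cic_out` and `reference` hold the same values, but the CIC form never computes the results that decimation would throw away.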
Back to our microphone
We have one bit samples at a high data rate. We want 16 bit (or more) samples at a low data rate. So we need to low pass filter and decimate, and we want to do as little arithmetic as possible at the high data rate.
The CIC filter is then a natural fit. It decimates well by powers of two. It supports growth of bit depth (additional precision) at each stage. Decimating by the length of the boxcar effectively means that the only working storage is the various accumulators. No delay lines are required, which makes it even easier to implement in software.
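The growth in bit depth can be estimated ahead of time. For a CIC with a comb delay of one output sample, the usual worst-case figure is the input width plus the number of stages times the base-2 logarithm of the decimation factor. The sketch below applies that formula. The function name is invented here, and the result is the register width needed to avoid overflow, not a claim about how many of those bits are free of noise.

```python
import math

def cic_register_bits(input_bits, rate, stages):
    """Worst-case register width for a CIC decimator with unit comb delay."""
    return input_bits + math.ceil(stages * math.log2(rate))

single_stage = cic_register_bits(1, 64, 1)   # one-bit PDM, decimate by 64
three_stage = cic_register_bits(1, 64, 3)
```

With one-bit input and a decimation factor of 64, one stage needs 7 bits and three stages need 19, so a three-stage filter fits comfortably in 32-bit integers.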
To realize this in hardware then, we need some means of generating the audio sampling clock, sampling the data bits, and feeding them into the CIC process.
The input stage can be implemented with an event counter or a shift register, clocked at the master audio clock rate. Either way, this leverages commonly available peripheral modules for the first integrator stage, and synchronous interrupt handlers can implement the decimation by feeding the integration results on to the comb filters.
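In software terms, a shift register delivers the bit stream a byte at a time, and counting the ones in a fixed number of bytes is the same as integrating over that many bits and then decimating. The sketch below is a hypothetical model of that first stage. The `data` argument stands in for bytes read from a peripheral, and no particular chip's hardware is implied.

```python
def count_ones(byte):
    """Number of set bits in one byte from the shift register."""
    return bin(byte & 0xFF).count("1")

def first_stage_from_bytes(data, bytes_per_output):
    """Integrate and decimate: one count per block of 8 * bytes_per_output bits."""
    out = []
    for start in range(0, len(data) - bytes_per_output + 1, bytes_per_output):
        block = data[start:start + bytes_per_output]
        out.append(sum(count_ones(b) for b in block))
    return out

# Two blocks of 16 bits each, both half full of ones.
counts = first_stage_from_bytes(bytes([0xFF, 0x00, 0x0F, 0xF0]), 2)
```

An event counter gated by the data line would produce the same counts with no CPU involvement between outputs.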
Clock generation can be implemented in most embedded CPU chips by dividing from the CPU core clock.
The result is a series of PCM audio samples available for further processing.
And what comes next?
Using this implementation outline, we will demonstrate that a single PDM microphone input can be read and processed even in as small a CPU as the Atmel ATTiny85.
Using only the microphone and the ATTiny85 (and a few passive components) we will implement a sound level meter.
There’s also more to say about the math behind and implementation of the sample rate change filter that reconstructs the PCM audio.
A fair number of words have been added to this thread of articles. You can find a handy collection of all of them at the PDM Microphone category page. We’ve demonstrated the signal processing chain in breadboards based on the ATTiny85, LPC810, and LPC812, and in our upcoming LPC812-based product with a microphone, which we call the SPLear™.
If you have a project involving embedded systems, micro-controllers, electronics design, audio, video, or more we can help. Check out our main site and call or email us with your needs. No project is too small!
+1 626 303-1602
Cheshire Engineering Corp.
120 W Olive Ave
Monrovia, CA 91016