
This post describes improvements in the dynamic range of PCM samples recovered from the PDM microphone in the SPLear™ sound level measurement module.

This is part of our series of articles on the general subject of audio signal processing from air to information.

To date, the conversion of the 1-bit per sample PDM audio bitstream from the microphone on the SPLear has used the filter structure designed for the stunt of capturing audio in an 8-bit AVR ATTiny85 CPU. The SPLear module has an NXP LPC812 instead, a much more powerful 32-bit ARM Cortex-M0 that can also run at a higher CPU clock rate.

It is time we revisit the conversion to see if we can squeeze out more information from the audio stream.

Also, we have been clocking the microphone at about 1 MHz. But this particular microphone has its performance documented in three bands near 750 kHz, 1.75 MHz, and 3.75 MHz. The low clock rate provides for lower power operation at a small cost to audio quality. The high clock rates provide for full bandwidth audio capture.

But first, we should verify the CPU clock configuration to make sure we are clocking the CPU in the best possible way. The LPC812 eval board where this code was first run has an external 12 MHz crystal, and a clock setup that simply multiplied that by two to get a 24 MHz core clock. Since the CPU is capable of running as fast as 30 MHz, we should start by checking the startup code to turn the CPU clock all the way up.

Much of the firmware and the code running in the RPi has been discussed and published in past posts on this blog, and the source code is available from a public repository. The firmware and Lua program used here are checked in as [8a4059fa2f].

CPU Clocking

The full speed CPU clock can be derived from an internal RC oscillator, an external crystal, or an external system clock, and optionally use a PLL to change the clock rate by somewhat arbitrary ratios. (There is also a choice of low speed clocks used for the watchdog reset timer that can also be selected for lower power consumption, but it isn’t useful to us right now.)

Since we don’t want to spare any pins for external clock components, we want to use the internal oscillator. This does come at a slight cost in the precision of the clock frequency, but that is an acceptable cost for this project.

I wrote “slight cost” above. The internal oscillator is trimmed at the factory to better than 1.5% (or 15000 ppm) accuracy. That is more than accurate enough for async serial baud rates, and within the tolerance for I2C. An ordinary external crystal or external crystal oscillator module would typically be stable and accurate to 50 ppm or better, and a temperature controlled module can be much better than that.

(As a reference point, the most accurate oscillator modules currently in stock at Digikey are the Abracon AOCJY6 Series. The most accurate available today is 0.1 ppb, currently selling for \$1798 quantity 1. If half the accuracy will do, the 0.2 ppb module is a bargain at \$1222. For contrast, a typical 20 ppm crystal can be had for under \$0.50.)

In our firmware, the clock was already configured to use the PLL to scale the 12 MHz internal oscillator to 30 MHz, by the line in sysinit.c in the function PDMSPL_SetupIrcClocking() that reads

    /* Configure the PLL M and P dividers */
    Chip_Clock_SetupSystemPLL(4, 1);

This sets the MSEL and PSEL bits in the SYSPLLCTRL register so that the PLL multiplier M is 5 and the post divider P is 2. The PLL output frequency is Fout = Fin * M, which must stay under 100 MHz, and the high frequency FCCO clock is Fin * M * 2 * P, which must stay in the legal operating range of 156 MHz to 320 MHz. As configured, with Fin = 12 MHz, Fout = 60 MHz and FCCO = 240 MHz. The separate system clock divider is then used to make the core clock half of Fout, or 30 MHz:

    /* Set system clock divider to 2 */
    Chip_Clock_SetSysClockDiv(2);

The bottom line is that this review shows that the SPLear is already configured to run the CPU core at 30 MHz.
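
The PLL arithmetic above can be checked with a few lines of C. This is only a sketch, not code from the firmware: the `pll_result` type and `pll_compute()` function are hypothetical names, and it assumes the register encoding described above, where M = MSEL + 1 and P = 2^PSEL.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper, not part of LPCOpen: derived PLL frequencies in Hz. */
typedef struct {
    uint32_t fout;   /* PLL output, Fin * M */
    uint32_t fcco;   /* oscillator inside the PLL, Fout * 2 * P */
    uint32_t core;   /* core clock after the system clock divider */
    bool     legal;  /* true when FCCO and Fout meet the limits quoted above */
} pll_result;

/* sysdiv must be nonzero. */
static pll_result pll_compute(uint32_t fin, uint32_t msel, uint32_t psel,
                              uint32_t sysdiv)
{
    pll_result r;
    uint32_t m = msel + 1u;    /* assumed encoding: M = MSEL + 1 */
    uint32_t p = 1u << psel;   /* assumed encoding: P = 2^PSEL */

    r.fout  = fin * m;
    r.fcco  = r.fout * 2u * p;
    r.core  = r.fout / sysdiv;
    r.legal = (r.fcco >= 156000000u) && (r.fcco <= 320000000u) &&
              (r.fout < 100000000u);
    return r;
}
```

Calling `pll_compute(12000000u, 4u, 1u, 2u)` mirrors the settings discussed here: a 12 MHz input, MSEL = 4, PSEL = 1, and a system clock divider of 2.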

PDM Clock Rate

The PDM microphone clock is provided by the SPI peripheral’s master clock output, and generated by using the SPI in master mode to transmit 16-bit all zero words as fast as the SPI device can send them at a configured clock rate. The available clock rates are integer divisions of the CPU core clock, so from 30 MHz our choices are limited to:

divisor   clock       Mic Working Range
   6      5.000 MHz   too high
   7      4.286 MHz   standard: 3.072 to 4.8 MHz
   8      3.750 MHz   standard: 3.072 to 4.8 MHz
   9      3.333 MHz   standard: 3.072 to 4.8 MHz
  10      3.000 MHz   undocumented
  11      2.727 MHz   undocumented
  12      2.500 MHz   undocumented
  13      2.308 MHz   standard: 1.024 to 2.475 MHz
  14      2.143 MHz   standard: 1.024 to 2.475 MHz
  29      1.034 MHz   standard: 1.024 to 2.475 MHz
  30      1.000 MHz   undocumented
  37      811 kHz     undocumented
  38      789 kHz     low power: 351 to 800 kHz
  58      517 kHz     low power: 351 to 800 kHz
  59      508 kHz     low power: 351 to 800 kHz
  85      353 kHz     low power: 351 to 800 kHz
  86      349 kHz     undocumented
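
The rows of this table can be regenerated for any divisor. The sketch below is illustrative only: the `mic_band` enum and both function names are hypothetical, and the band limits are simply the ones listed in the table, not values taken from the microphone's data sheet.

```c
#include <stdint.h>

/* Bands as labeled in the table above. */
typedef enum {
    MIC_UNDOCUMENTED,
    MIC_LOW_POWER,
    MIC_STANDARD,
    MIC_TOO_HIGH
} mic_band;

/* PDM clock in Hz produced by an integer divisor of the core clock. */
static double pdm_clock_hz(uint32_t core_hz, uint32_t divisor)
{
    return (double)core_hz / (double)divisor;
}

/* Classify a PDM clock rate against the ranges quoted in the table. */
static mic_band mic_classify(double hz)
{
    if (hz > 4800000.0)                     return MIC_TOO_HIGH;
    if (hz >= 3072000.0)                    return MIC_STANDARD;
    if (hz >= 1024000.0 && hz <= 2475000.0) return MIC_STANDARD;
    if (hz >= 351000.0 && hz <= 800000.0)   return MIC_LOW_POWER;
    return MIC_UNDOCUMENTED;
}
```

Looping the divisor from 6 to 86 and printing `pdm_clock_hz()` alongside `mic_classify()` fills in the rows that the table skips.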

The ATTiny85 (and our first port to the LPC810) firmware had been generating PCM samples at 7812.5 Hz by decimating 1 MHz PDM by 128. Since 1 MHz is outside of the documented range of the mic, we wanted to bump that to 1.024 MHz, which decimated by 128 would produce the classic telephony sample rate of 8 kHz. From the table above, 1.024 MHz cannot be made, but 1.034 MHz can, and that yields a PCM sample rate of about 8.08 kHz.

These trade-offs result from the hardware restriction that division of a clock by anything other than an integer is much more difficult, and that the CIC filter requires that the decimation factor be an exact power of two. Here are all the common audio sample rates, showing the required PDM clock rate for each of three decimation factors. All but three of these combinations fall at a PDM clock rate with documented performance for the PDM microphone in the SPLear. (This table likely explains the choice of documented clock rates, but doesn’t explain why there is a gap between 2.475 MHz and 3.072 MHz in the spec sheet.)

PCM rate    x32         x64         x128
----------  ----------  ----------  ----------
8 kHz       256 kHz*    512 kHz     1.024 MHz
11.025 kHz  352.8 kHz   705.6 kHz   1.4112 MHz
16 kHz      512 kHz     1.024 MHz   2.048 MHz
22.05 kHz   705.6 kHz   1.4112 MHz  2.8224 MHz
32 kHz      1.024 MHz   2.048 MHz   4.096 MHz
44.1 kHz    1.4112 MHz  2.8224 MHz  5.6448 MHz*
48.0 kHz    1.536 MHz   3.072 MHz   6.144 MHz*
(*) PDM clock rate not supported by the microphone.

Finding clock divisors from the system clock to best achieve each of those PDM clock rates is left as an exercise, but it might be interesting to know which common audio sample rates are within 1.5% of an available divisor.
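
As a starting point for that exercise, here is a minimal sketch that picks the integer divisor closest to a target PDM rate and reports the relative error. The `divisor_fit` type and the function names are hypothetical, and the 1.5% window is just the tolerance mentioned above.

```c
#include <stdint.h>

/* Result of searching for the integer divisor closest to a target rate. */
typedef struct {
    uint32_t divisor;    /* chosen integer divisor of the core clock */
    double   actual_hz;  /* PDM clock that divisor produces */
    double   error;      /* (actual - target) / target, signed */
} divisor_fit;

static double rel_error(uint32_t core_hz, uint32_t div, double target_hz)
{
    return ((double)core_hz / (double)div - target_hz) / target_hz;
}

static divisor_fit best_divisor(uint32_t core_hz, double target_hz)
{
    /* The two candidates bracket the ideal (non-integer) divisor. */
    uint32_t lo = (uint32_t)((double)core_hz / target_hz);
    if (lo < 1u) lo = 1u;
    uint32_t hi = lo + 1u;

    double elo = rel_error(core_hz, lo, target_hz);
    double ehi = rel_error(core_hz, hi, target_hz);
    double alo = elo < 0 ? -elo : elo;
    double ahi = ehi < 0 ? -ehi : ehi;

    divisor_fit f;
    f.divisor   = (alo <= ahi) ? lo : hi;
    f.error     = (alo <= ahi) ? elo : ehi;
    f.actual_hz = (double)core_hz / (double)f.divisor;
    return f;
}

/* Nonzero when the fit is within +/- tol (for example 0.015 for 1.5%). */
static int within_tolerance(divisor_fit f, double tol)
{
    double a = f.error < 0 ? -f.error : f.error;
    return a <= tol;
}
```

With a 30 MHz core clock, a 1.024 MHz target lands on divisor 29 and runs about 1.0% fast, while a 3.072 MHz target lands on divisor 10 and runs about 2.3% slow, which misses the 1.5% window.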

A further complication is the computation required by the CIC filter itself. With a CPU core clock of 30 MHz and a bit rate of 1 MHz, there is an average of only 30 CPU clocks per bit available to do the filter math. Moving to a higher PDM clock rate will make this a tight bottleneck.

So that is an incentive to drop the PDM sample rate. Dropping to 500 kHz is convenient, as it is an exact divisor of 30 MHz. If the CIC filter is changed from decimating by 128 to decimating by 64, then the resulting PCM sample rate will be the familiar 7812.5 Hz.
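
The budget arithmetic is simple enough to write down. In this sketch the function names are hypothetical, and the cycle count is an average over the whole stream that ignores interrupt and loop overhead.

```c
#include <stdint.h>

/* Average CPU clocks available per PDM bit. */
static double clocks_per_bit(uint32_t core_hz, uint32_t pdm_divisor)
{
    /* core_hz / (core_hz / divisor) reduces to the divisor itself. */
    (void)core_hz;
    return (double)pdm_divisor;
}

/* PCM sample rate in Hz after decimating the PDM stream. */
static double pcm_rate_hz(uint32_t core_hz, uint32_t pdm_divisor,
                          uint32_t decimation)
{
    return (double)core_hz / (double)pdm_divisor / (double)decimation;
}
```

For a 30 MHz core, a divisor of 60 gives a 500 kHz PDM clock with 60 clocks per bit, and `pcm_rate_hz(30000000u, 60u, 64u)` evaluates to 7812.5.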

Changing the PDM clock rate is a simple edit to one line of pdmspl.c in the function pdmspi_spiconfig():

    /* 0.5 MHz SPI clock for microphone */
    Chip_SPIM_SetClockRate(pSPI, 500000);

It had read 1024000, and now reads 500000.

The firmware will run with this change even without changing the CIC filters, and you can probe the mic’s clock wire to confirm the clock rate achieved. Of course, if you don’t tweak the filters, the sample rate will be 3906.25 Hz and the audio bandwidth similarly narrowed. The SPL computation will work, but it is now measuring lower frequencies than before.

Internally, the function Chip_SPIM_SetClockRate() computes the divisor by integer division of the core clock by the target rate. The result truncates, and is clipped at the limits of the SPI clock divisor hardware. Due to the truncation, the result will be a clock frequency that is greater than or equal to the rate requested.
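
The effect of that truncation can be modeled as follows. This is a hypothetical model of the behavior just described, not the source of Chip_SPIM_SetClockRate(); the function names and the divisor limits of 1 and 65536 are assumptions made for illustration.

```c
#include <stdint.h>

/* Assumed limits of the SPI clock divisor hardware. */
#define SPI_DIV_MIN 1u
#define SPI_DIV_MAX 65536u

/* Divisor chosen for a requested rate; target_hz must be nonzero. */
static uint32_t spi_divisor(uint32_t core_hz, uint32_t target_hz)
{
    uint32_t div = core_hz / target_hz;   /* integer division truncates */
    if (div < SPI_DIV_MIN) div = SPI_DIV_MIN;
    if (div > SPI_DIV_MAX) div = SPI_DIV_MAX;
    return div;
}

/* Clock rate actually produced, rounded down to whole Hz. */
static uint32_t spi_actual_hz(uint32_t core_hz, uint32_t target_hz)
{
    return core_hz / spi_divisor(core_hz, target_hz);
}
```

Under this model a request for 1024000 Hz from a 30 MHz core selects divisor 29, which is the 1.034 MHz entry in the divisor table, while a request for 500000 Hz selects divisor 60 and is met exactly.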

CIC Structure and Bit Depth

The existing low pass filter used to convert from PDM to PCM has two parts.

The first part is an 8 point boxcar average, paired with a decimation by 8. The effect is that each 8 bits of input are simply averaged. The SPI peripheral implements most of the work by collecting bits in a register 16 at a time, which can be used to index a table with 256 entries twice, once from each half of the received 16 bits. This is exactly a CIC filter with R=8, M=1, and N=1 in the common nomenclature, and extends the sample depth from 1 bit to 4 bits.

The second part does a further decimation by 16 in a more elaborate CIC filter, with R=16, M=2, and N=2. That extends the sample depth to a total of 14 bits.
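
The 256-entry table used by the first part can be generated rather than typed in. The sketch below assumes each entry is simply the count of one bits in the index, giving values 0 to 8; the firmware's actual pdmsum8 table may use a different offset or scaling, and the names here are hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-in for the firmware's pdmsum8 table. */
static int8_t pdmsum8_sketch[256];

/* Fill the table with the number of one bits in each 8-bit index,
 * which is the 8 point boxcar sum of those 8 PDM bits. */
static void pdmsum8_sketch_init(void)
{
    for (int i = 0; i < 256; i++) {
        int ones = 0;
        for (int b = 0; b < 8; b++)
            ones += (i >> b) & 1;
        pdmsum8_sketch[i] = (int8_t)ones;
    }
}
```

Nine distinct values (0 through 8) need 4 bits to represent, which matches the 4-bit depth quoted for the first stage.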

After some experimentation with computation performance of other architectures, we determined that even a 30 MHz CPU clock is not quite enough to safely handle the decimation in a single CIC filter that operates on each PDM bit. The consequence of running out of computation time is missed interrupts, and a non-responsive device.

But it is easy to add stages to the second CIC filter and get additional bit depth. The simple change to use N=3 (together with R=8) makes the total bit depth 16, for instance, at the cost of just one more add into one more accumulator at the higher sample rate. There is, of course, a matching cost in the comb section at the low sample rate, but that additional work has almost no impact on the total available run time.
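
The bit depths quoted here follow from the usual CIC bit growth rule: a filter with parameters N, R, and M has a DC gain of (R*M)^N, so it adds N * log2(R*M) bits. A small sketch of that rule, with a hypothetical function name:

```c
#include <stdint.h>

/* Bits added by a CIC filter with n stages, decimation r, and
 * differential delay m: the ceiling of log2((r*m)^n).
 * Assumes the gain (r*m)^n fits in 63 bits. */
static unsigned cic_bit_growth(unsigned n, unsigned r, unsigned m)
{
    uint64_t gain = 1;
    for (unsigned i = 0; i < n; i++)
        gain *= (uint64_t)r * m;

    unsigned bits = 0;
    while (((uint64_t)1 << bits) < gain)
        bits++;
    return bits;
}
```

Starting from the 1-bit PDM input, the first filter (N=1, R=8, M=1) adds 3 bits. The original second filter (N=2, R=16, M=2) adds 10 more for a total of 14, and the revised one (N=3, R=8, M=2) adds 12 for a total of 16.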

The bit depth of 16 does mean that we need to make the accumulators larger than 16 bits. And there is incentive to do so: the 32-bit ARM core is better at handling 32-bit arithmetic, and the actual performance of this code improved by about 10% by changing from 16-bit to 32-bit accumulators.

First, we change R to 8 from 16 and widen CICREG:

    #define CIC2_R 8
    typedef int32_t CICREG;

Then we add the new accumulator, and its matching comb filter slots:

    CICREG s2_sum3 = 0;
    CICREG s2_comb3_1 = 0;
    CICREG s2_comb3_2 = 0;

And the integrator needs to get its new stage by adding the two lines referencing s2_sum3:

            // Now feed the 4 bit result to a second CIC with N=3, R=8, M=2
            // which has bit growth of 12, for a total of 16 significant bits
            // out. The counter scount is used to implement the decimation.
            s2_sum1 += pdmsum8[pdm&0xff] ;
            s2_sum2 += s2_sum1;
            s2_sum3 += s2_sum2;
            s2_sum1 += pdmsum8[pdm>>8] ;
            s2_sum2 += s2_sum1;
            s2_sum3 += s2_sum2;

Finally, the comb section needs its matching third stage, which runs after the decimation:

                stage3 = stage2 - s2_comb3_2;
                s2_comb3_2 = s2_comb3_1;
                s2_comb3_1 = stage2;

                // queue the finished PCM sample
                RingBuffer_Insert(&pcmring, &stage3);
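
For experimenting off the target, the same N=3, R=8, M=2 structure can be written as a small stand-alone model. This is a sketch rather than the firmware code: the `cic3_sketch` type and `cic3_push()` function are hypothetical names, and the accumulators are unsigned so that the wraparound the CIC relies on is well defined in C.

```c
#include <stdint.h>
#include <stdbool.h>

#define SK_R 8   /* decimation ratio of the second CIC */

typedef struct {
    uint32_t sum[3];       /* integrators, run at the input rate */
    uint32_t comb[3][2];   /* comb delay lines, M = 2 slots each */
    unsigned count;        /* decimation counter */
} cic3_sketch;

/* Push one input sample. Returns true and writes *out on every
 * SK_R-th call, when a decimated output sample is ready. */
static bool cic3_push(cic3_sketch *c, int32_t in, int32_t *out)
{
    /* Three cascaded integrators. */
    c->sum[0] += (uint32_t)in;
    c->sum[1] += c->sum[0];
    c->sum[2] += c->sum[1];

    if (++c->count < SK_R)
        return false;
    c->count = 0;

    /* Three cascaded combs, each subtracting the value from
     * two output samples ago. */
    uint32_t v = c->sum[2];
    for (int s = 0; s < 3; s++) {
        uint32_t y = v - c->comb[s][1];
        c->comb[s][1] = c->comb[s][0];
        c->comb[s][0] = v;
        v = y;
    }
    *out = (int32_t)v;
    return true;
}
```

Starting from a zeroed struct and feeding a constant input of 8, the output settles at 8 * 16^3 = 32768 once the comb delay lines have filled, which is a quick check that the DC gain is (R*M)^N = 4096.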

There are other related minor changes. Notably, the internal scaled SPL figure has a new peak value, since it is implicitly tied to the maximum absolute sample value, which has changed from 8192 to 65536.

Another related change is that the uLaw recorder mode needs to scale the PCM samples for uLaw conversion, since uLaw is defined to effectively pass only a 14-bit dynamic range.

What Now?

The next step is to continue to find interesting things to do with the measured SPL level, which will certainly be the subject of future posts. Watch this space!

Please let us know what you think about the SPLear!

If you have a project involving embedded systems, micro-controllers, electronics design, audio, video, or more we can help. Check out our main site and call or email us with your needs. No project is too small!

+1 626 303-1602
Cheshire Engineering Corp.
120 W Olive Ave
Monrovia, CA 91016

(Written with StackEdit.)