This post continues experimenting with recording audio converted from PDM to PCM in an LPC812 CPU. The last installment passed 8-bit PCM samples over the UART to a PC for recording. Losing 6 bits of precision was forced by the limited serial bandwidth available (the PC UART is limited to 115200 baud), but it cost 36 dB of dynamic range in the recorded audio. This post recovers the lost dynamic range by switching from 8-bit PCM to a non-linear coding widely used for telephony applications and directly supported by audio editing tools.

This is part of a series of articles on the general subject of audio signal processing from air to information. See the section at the end for the full series.

The problem with bits

The microphone output is 1 bit per sample using Pulse Density Modulation (PDM) at a rather high sample rate. We are operating this microphone at 1 MHz, but PDM is often sampled as high as 3.072 MHz for high quality applications.

We use a series of signal processing filters as described in earlier posts to trade the high sample rate for higher resolution samples at the much lower sample rate of 7812.5 Hz. These are 14-bit Pulse Code Modulation (PCM) samples.

As noted in the previous post, we can’t send all 14 bits per sample up the UART. So we compromised by limiting ourselves to 8-bit unsigned PCM.
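The bandwidth arithmetic behind that compromise can be sketched as follows. This is only an illustration: it assumes 8N1 framing (10 bit times per byte on the wire) and that a 14-bit sample would be sent as two whole bytes, and the function name is made up for this example.

```c
/* Bit rate needed on the wire for a given sample stream.
 * Assumes 8N1 framing: 1 start + 8 data + 1 stop = 10 bit times per byte. */
double uart_bits_per_second(double sample_rate_hz, int bytes_per_sample)
{
    return sample_rate_hz * bytes_per_sample * 10.0;
}
```

Under those assumptions, one byte per sample at 7812.5 Hz needs 78125 bit/s, which fits within 115200 baud, while two bytes per sample would need 156250 bit/s, which does not.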

But an 8-bit sample has at most 48 dB of dynamic range, and even as a demonstration of recording it would be nice to preserve the original 84 dB dynamic range of the 14-bit PCM samples.
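Those figures come from the usual rule of thumb that each bit of linear PCM is worth 20*log10(2), or about 6.02 dB. A minimal sketch of that arithmetic, with a helper name invented for this example:

```c
/* 20*log10(2): each extra bit of linear PCM adds about 6.02 dB. */
#define DB_PER_BIT 6.0206

/* Approximate dynamic range of linear PCM with the given word size. */
double pcm_dynamic_range_db(int bits)
{
    return bits * DB_PER_BIT;
}
```

For 8 bits this gives roughly 48 dB and for 14 bits roughly 84 dB, so the 6 bits dropped in the previous post account for the 36 dB that was lost.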

At first glance, this seems intractable. Fourteen bits just don’t fit into an opening eight bits wide. But there is a way out.

Companding to the rescue

Analog systems designers have long used a technique called companding to pass signals that need a wide dynamic range through a channel that only supports a smaller range.

The trick is to compress the signal for transmission, and expand it on reception. If these were simple linear scale operations then information would be lost in quiet passages and there would be no advantages at all. However, if a logarithm is used for compression, then the trick bears fruit.

As typically implemented in analog systems, it also has the nice benefit that even if the expanding step is left out, the compressed dynamic range audio still sounds acceptable. In some applications (notably voice on a telephone wire) compression without much (or any) expansion is actually preferred.

Naturally, as telephone systems became digital, the companding was preserved as a feature, although there are two competing standards in common use. A-Law is used in many countries, and µ-Law in the US. Of the two, µ-Law preserves the larger dynamic range. Both convert a signed PCM sample into an 8-bit code byte for transmission on the wire. Both are widely understood by audio processing tools, such as Audacity.

Coincidentally, the dynamic range of µ-Law is identical to that of 14-bit PCM. So it makes sense to implement µ-Law for better quality recordings.
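For reference, the continuous curve that µ-Law approximates is usually written as below, with x the input normalized to the range -1 to 1 and µ = 255 for the 8-bit telephony code. This is the textbook form, not something taken from mulaw.c, and the digital code only follows it piece-wise.

```latex
F(x) = \operatorname{sgn}(x)\,\frac{\ln(1 + \mu |x|)}{\ln(1 + \mu)},
\qquad -1 \le x \le 1,\quad \mu = 255

F^{-1}(y) = \operatorname{sgn}(y)\,\frac{(1 + \mu)^{|y|} - 1}{\mu},
\qquad -1 \le y \le 1
```

The first expression is the compression applied before transmission, and the second is the expansion that undoes it.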

Implementation

The code implementing recording in µ-Law was checked in as [1d38441bbc]. See the file mulaw.c for the gory details.

Since the encoding is piece-wise linear, with each linear segment falling near log_2(P) boundaries, it is practical to encode it from a signed 14-bit linear value by counting leading zero bits in |P|+33. The bias of 33 shifts the encoding range from (0 – 8158) to (33 – 8191). The result can be seen in the following encoding table:

   Biased Linear Input Code        Compressed Code
   ------------------------        ---------------
     00 0000 001w xyza               000wxyz
     00 0000 01wx yzab               001wxyz
     00 0000 1wxy zabc               010wxyz
     00 0001 wxyz abcd               011wxyz
     00 001w xyza bcde               100wxyz
     00 01wx yzab cdef               101wxyz
     00 1wxy zabc defg               110wxyz
     01 wxyz abcd efgh               111wxyz

Each biased linear code has a leading 1 which identifies the segment number. Counting within the 14-bit word shown in the table, the segment number is equal to 8 minus the number of leading 0s. The quantization interval is directly available as the four bits wxyz. The trailing bits (a through h) are ignored.
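The table can be read off mechanically. The sketch below is not the code from mulaw.c; it is a deliberately naive illustration (with a made-up function name) that finds the row by shifting until the leading 1 is located, then picks out wxyz. It handles only the magnitude, leaving out the sign bit and the final bit inversion.

```c
/* Map a magnitude in 0..8158 to the 7-bit compressed code from the table.
 * Illustration only: sign handling and bit inversion are omitted. */
unsigned int ulaw_magnitude_code(unsigned int magnitude)
{
    unsigned int biased;
    unsigned int seg = 0;

    if (magnitude > 8158u)      /* clip to the encodable range */
        magnitude = 8158u;
    biased = magnitude + 33u;   /* now in 33..8191 */

    /* Segment 0 has its leading 1 at bit 5; each later row is one bit higher. */
    while ((biased >> (seg + 6)) != 0)
        seg++;

    /* wxyz are the four bits immediately below the leading 1. */
    return (seg << 4) | ((biased >> (seg + 1)) & 0xFu);
}
```

For example, a magnitude of 1000 becomes 1033 after biasing, which is 00 0100 0000 1001 in 14 bits. That matches the row for segment 5 with wxyz = 0000, giving the compressed code 1010000.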

This formulation is very close to $\lfloor 16 \log_2(|P|+33) \rfloor$, which, if you’ve been following along, is almost identical to the logarithmic scale we use for our computed SPL values.

Most of the implementations of µ-Law found floating around use a table-driven approach to finding the segment number. But that can be simplified by thinking of it in terms of the count leading zeroes operator. Many ARM cores have a CLZ instruction that implements that count. The Cortex-M0+ core in the LPC812 implements ARMv6-M, which does not include CLZ, so here the count has to be done in software.

A reasonable compromise is to use a binary search to count the zero bits:

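/* Count the leading zero bits of a 16-bit value by binary search. */
static unsigned int clz16(unsigned short x)
{
    unsigned int n = 0;

    if (x == 0)     /* no bit set: all 16 bits are leading zeros */
        return 16;
    if (!(x & 0xFF00)) { n += 8; x <<= 8; }
    if (!(x & 0xF000)) { n += 4; x <<= 4; }
    if (!(x & 0xC000)) { n += 2; x <<= 2; }
    if (!(x & 0x8000)) { n += 1; /*x <<= 1;*/ }
    return n;
}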

This works by treating a zero input as special, then checking for blocks of 8, 4, 2, and finally 1 zero while shifting off any blocks seen. The result is portable and about as fast as practical with portable code.
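For comparison, GCC and Clang expose the same operation as the __builtin_clz() builtin, which lets the compiler choose a hardware instruction where one exists and a library routine where it does not. The wrapper below is a sketch with a made-up name, not code from the repository, and it is only portable across those compilers.

```c
#include <limits.h>

/* Count leading zeros of a 16-bit value using the GCC/Clang builtin.
 * __builtin_clz() is undefined for 0, so zero is handled separately. */
static unsigned int clz16_builtin(unsigned short x)
{
    if (x == 0)
        return 16;
    /* The builtin counts within an unsigned int; discard the extra high bits. */
    return (unsigned int)__builtin_clz((unsigned int)x)
           - (unsigned int)(sizeof(unsigned int) * CHAR_BIT - 16);
}
```

It is meant to return the same values as clz16() for every 16-bit input, so either could sit underneath the segment calculation.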

The segment number can be written as:

static unsigned int segment14(unsigned short x)
{
    int n = clz16(x);
    return n >= 10 ? 0 : 10-n;
}

Finally, the 8-bit code is produced from a 14-bit signed PCM sample:

unsigned char linear14_ulaw(int16_t pcm_val)
{
    int16_t                mask;
    int16_t                seg;
    unsigned char        uval;

    /* u-law inverts all bits */
    /* Get the sign and the magnitude of the value. */
    if (pcm_val < 0) {
        pcm_val = -pcm_val;
        mask = 0x7F;
    } else {
        mask = 0xFF;
    }
    /* Clip the magnitude, then add the bias (BIAS >> 2 is the 33 described above). */
    if (pcm_val > CLIP)
        pcm_val = CLIP;
    pcm_val += (BIAS >> 2);

    /* Convert the scaled magnitude to segment number. */
    seg = segment14(pcm_val);

    /*
     * Combine the sign, segment, quantization bits;
     * and complement the code word.
     */
    if (seg >= 8) /* out of range, return maximum value. */
        return (unsigned char) (0x7F ^ mask);
    uval = (unsigned char) (seg << 4) | ((pcm_val >> (seg + 1)) & 0xF);
    return (uval ^ mask);
}
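The firmware only needs the encoder, since Audacity does the expansion on the PC. For checking the encoder on a desktop, though, the inverse is handy. The following is a sketch of the usual µ-Law expansion scaled to 14 bits; the name ulaw_linear14 is hypothetical and this function is not taken from mulaw.c.

```c
#include <stdint.h>

/* Hypothetical inverse of linear14_ulaw(): expand an 8-bit u-law code
 * back to a signed 14-bit linear sample. */
int16_t ulaw_linear14(unsigned char code)
{
    unsigned int u = (unsigned int)(~code) & 0xFFu; /* undo the bit inversion */
    unsigned int seg = (u >> 4) & 0x7u;             /* segment number, 0..7 */
    unsigned int q = u & 0xFu;                      /* quantization bits wxyz */

    /* Rebuild the biased magnitude at the middle of the quantization step. */
    int t = (int)(((q << 1) + 33u) << seg);

    t -= 33;                                        /* remove the bias */
    return (int16_t)((u & 0x80u) ? -t : t);
}
```

Because the trailing bits were discarded during encoding, the result is the midpoint of the quantization interval rather than the original sample. A code of 0xAF, for instance, expands to 1023.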

With that in place, the code in main() that previously scaled and clipped the PCM sample to 8 bits is replaced by a simple call to linear14_ulaw(), and the rest of the transfer of the audio samples is as before.

Demonstrations

Here is a short recording of the bike horn a couple of feet from the mic. This was made with the previous version that emitted 8-bit unsigned PCM samples on the UART.

Here is another short recording of the bike horn at a couple of feet from the mic, then close, then far. This was made with the firmware described in this post that emits µ-Law encoded samples on the UART.

Tools

RealTerm

RealTerm is a curious little tool that I discovered recently. Like TeraTerm (which I strongly recommend for general use as a terminal emulator) it allows interaction with devices over a serial port, and logging that interaction.

The key trick it brings to the table is that it is intended for use with devices that are not normally used interactively. So it has a number of features to allow you to send fragments of arbitrary binary control protocols, and to display the received bytes even if they are full of interesting control character sequences without becoming confused.

I used version 2.0.0.70 for this task. A new version 3 is currently under development.

Audacity

I used Audacity to interpret, analyze, and output the audio files in formats usable by other tools. It is a hugely powerful and extensible audio editor, but I only needed the basic features.

It easily supports importing tracks from the files written by capturing the serial data with either TeraTerm or RealTerm. Its raw data importer has filters for both 8-bit unsigned PCM and µ-Law among many other formats. Importantly, it allows the sample rate to be specified as 7812.5 Hz.

When outputting MP3, the export filter notices the unusual sample rate and requests that the audio be resampled to a rate supported by the MPEG standard. I chose to upsample to 24 kHz.

ffmpeg

YouTube cannot handle a pure audio file, and I haven’t found any better place to drop simple sound bites like these examples. So rather than fuss with evaluating a new file sharing service, hosting the MP3 files myself, or upgrading my WordPress hosting to get MP3 media rights, I simply drew a title card to use as the “video” to go with the audio recordings.

The immensely powerful open source video tool ffmpeg was able to take the MP3 audio file and mix it with a video sequence created from the single PNG format title card, to produce the MP4 video files that I uploaded to YouTube.

My command line was something like the following:

C:...>ffmpeg -t 8 -i 8-bit-6.mp3 -s 640x480 -r 24 -loop 1 -i honk-pcm8-title.png 8-bit-6.mp4

This produces at most 8 seconds of SD-resolution video by reading 8-bit-6.mp3 and mixing it with a single title card read from honk-pcm8-title.png. The -r 24 -loop 1 causes it to repeat the same image at 24 frames per second. The result was a video that YouTube liked.

Context

Previous installments include:

All of the code supporting this demonstration is in a public fossil repository. This post documents work that was checked in as [1d38441bbc].


(Written with StackEdit.)