MLow: Meta’s low bitrate audio codec

  • At Meta, we support real-time communication (RTC) for billions of people through our apps, including WhatsApp, Instagram, and Messenger.
  • We’re working to make RTC accessible by providing a high-quality experience for everyone, even those who might not have the fastest connections or the latest phones.
  • As more and more people have relied on our products to make calls over the years, we’ve been working on new ways to ensure all calls have solid audio quality.
  • We’ve built the Meta Low Bitrate (MLow) codec: a new tool that improves audio quality, especially for those on slow connections.
Figure 1: Increasing complexity or bitrate usually improves quality, but good codecs achieve higher quality while balancing the other two.

RTC products use many building blocks to deliver the full experience, and one of the essential components is audio/video codecs. These codecs compress the captured audio/video data so it can be sent across the internet efficiently to the recipient, keeping the experience real time. For example, the size of raw audio captured for a typical call is 768 kbps (mono, sampling at 48 kHz, bit depth 16), which modern codecs are able to compress down to 25-30 kbps. Often this compression comes at the cost of some quality (loss of information), but good codecs can strike a balance among the trio of quality, bitrate, and complexity by exploiting deep knowledge about the nature of the audio signal as well as by using psychoacoustics.
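The 768 kbps figure follows directly from the capture parameters quoted above. As a minimal sketch of that arithmetic (the function name `raw_pcm_kbps` is ours, chosen for illustration):

```python
def raw_pcm_kbps(sample_rate_hz, bit_depth, channels=1):
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


# Mono, 48 kHz sampling, 16 bits per sample, as in the example above.
raw = raw_pcm_kbps(48_000, 16, 1)
print(f"raw PCM: {raw} kbps")

# Compression ratio at the codec bitrates mentioned in the text.
for codec_kbps in (30, 25):
    print(f"{codec_kbps} kbps is {raw / codec_kbps:.1f}x smaller than raw")
```

At 25-30 kbps the codec is sending roughly one bit for every 25 to 30 bits captured, which is why some loss of information is usually unavoidable.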

Building a good codec is quite challenging, which is why we don’t see new codecs emerging very often. The last widely known, good open-source codec was Opus, released in 2012, which has become the codec of choice for a wide variety of applications on the internet. Meta has used Opus for all its RTC needs, and so far it has served us well, helping to deliver quality calls to billions of users across the globe.

Our motivation for building a new codec

Given the large scale of RTC usage in Meta products, we get to see how a codec performs in a range of network scenarios and how it affects the end user’s experience. In particular, we’ve observed that a significant chunk of calls have poor network connections throughout or for part of a call. Usually a bandwidth estimation (BWE) module detects the quality of the network, and as the network quality degrades, we need to lower the codec operating bitrate to avoid congesting the network and keep the audio flowing, which affects the trio balance referenced above. Complicating matters, conducting a video call despite poor network quality leaves little room for audio and pushes the audio bitrate further down. The lowest operating point for Opus is 6 kbps, at which it runs in NarrowBand mode (0-4 kHz) and doesn’t adequately capture all the sound frequencies produced by human voices, so it doesn’t sound as clear or natural. Here is an example of how Opus sounds at 6 kbps and the corresponding reference file for comparison.

Raw reference signal:

Opus @ 6 kbps NarrowBand (NB): 

Over the last two years, we have seen the development of several new machine learning (ML)-based audio codecs that provide good quality audio at very low bitrates. In October of 2022, Meta released Encodec, which achieves amazingly crisp audio quality at very low bitrates. While these AI/ML-based codecs are able to achieve great quality at low bitrates, it often comes at the expense of heavy computational cost. Consequently, only the very high-end (expensive) mobile handsets are able to run these codecs reliably, while users on lower-end devices continue to experience audio quality issues in low-bitrate conditions. So the net impact of these newer, computationally expensive codecs is actually limited to a small portion of users.

A significant number of our users still use low-end devices. For example, more than 20 percent of our calls are made on ARMv7 devices, and tens of millions of daily calls on WhatsApp are made on devices that are 10 or more years old. Given the readily available codec choices and our commitment to ensuring that all users, regardless of what device they’re on, have a high-quality calling experience, we clearly need a codec with very low compute requirements that still delivers high-quality audio at these lowest bitrates.

The MLow codec

We broke ground on the development of a new codec in late 2021. After nearly two years of active development and testing, we’re proud to announce the Meta Low Bitrate audio codec, aka MLow, which achieves two-times-better quality than Opus (POLQA MOS of 3.9 for MLow vs. 1.89 for Opus at 6 kbps WB). Even more importantly, we’re able to achieve this great quality while keeping MLow’s computational complexity 10 percent lower than that of Opus.

Figure 2 below shows a MOS (Mean Opinion Score) plot on a 1-5 scale and compares the POLQA scores of Opus and MLow at various bitrates. As the chart makes evident, MLow has a huge advantage over Opus at the lowest bitrates, where it saturates quality faster than Opus.

Figure 2: POLQA score comparing Opus (WB) versus MLow at various bitrates across a large dataset of files.

We have already fully launched MLow to all Instagram and Messenger calls and are actively rolling it out on WhatsApp, and we’ve already seen incredible improvement in user engagement driven by better audio quality.

Here are some audio samples for you to listen to. We suggest that you use your favorite pair of headphones to appreciate the striking audio-quality differences.

Opus 6 kbps NB | MLow 6 kbps WB | Reference

Being able to encode high-quality audio at lower bitrates also unlocks more effective Forward Error Correction (FEC) strategies. Compared with Opus, with MLow we can afford to pack FEC at much lower bitrates, which significantly helps to improve the audio quality in packet loss scenarios.

Here are two audio samples at 14 kbps with heavy 30 percent receiver-side packet loss.

Opus:

Note that at these bitrates, Opus is not able to encode any inband FEC. It needs a minimum of 19 kbps to encode any inband FEC at 10 percent packet loss, which hurts the audio recovery.
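To see why being able to afford FEC matters so much at 30 percent loss, here is a rough back-of-the-envelope sketch. It assumes packet losses are independent and that a frame is recoverable if any packet carrying a copy of it arrives; real networks lose packets in bursts and redundant copies are typically coded at lower quality, so treat the numbers as an optimistic illustration rather than a model of MLow or Opus. The function name `residual_frame_loss` is ours.

```python
def residual_frame_loss(packet_loss, redundant_copies):
    """Probability that a frame cannot be recovered at the receiver.

    Assumption: each frame travels in 1 + redundant_copies different
    packets, and every packet is lost independently with the same
    probability. The frame is lost only if all of those packets are lost.
    """
    if not 0.0 <= packet_loss <= 1.0:
        raise ValueError("packet_loss must be between 0 and 1")
    if redundant_copies < 0:
        raise ValueError("redundant_copies must be non-negative")
    return packet_loss ** (1 + redundant_copies)


# 30 percent packet loss, as in the samples above.
for copies in (0, 1, 2):
    lost = residual_frame_loss(0.30, copies)
    print(f"{copies} redundant copies: {lost:.1%} of frames unrecoverable")
```

Under these assumptions, a single redundant copy already cuts the share of unrecoverable frames from 30 percent to about 9 percent, which is only possible if the codec can fit that copy inside the available bitrate.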

MLow internals

MLow builds on the concepts of a classic CELP (Code Excited Linear Prediction) codec with advancements around excitation generation, parameter quantization, and coding schemes. Figure 3 is a high-level visual of how the codec works internally. On the left we have an input signal (raw PCM audio) feeding into the encoder, which then splits the signal into two bands, one low-frequency and one high-frequency. Each band is then encoded separately while making use of shared information to achieve better compression. All the output is passed through a range encoder to further compress it and generate an encoded payload. Given the payload, the decoder does the exact reverse to generate the output audio signal.

Figure 3: High-level MLow encoder and decoder architecture.

With these split-band optimizations, we’re able to encode the high band using very few bits, which lets MLow deliver SuperWideBand (32 kHz sampling) using a much lower bitrate.
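The band-splitting step can be illustrated with the simplest possible two-band filterbank. The sketch below uses a Haar-style split (scaled sums and differences of adjacent samples), which is our own stand-in: the filters MLow actually uses are not described here. It shows two properties any such split relies on: the decoder can merge the bands back into the original signal, and for low-pitched, speech-like content most of the energy lands in the low band.

```python
import math


def split_bands(pcm):
    """Split an even-length list of samples into half-rate low and high bands."""
    if len(pcm) % 2:
        raise ValueError("expected an even number of samples")
    s = math.sqrt(2.0)
    low = [(pcm[i] + pcm[i + 1]) / s for i in range(0, len(pcm), 2)]
    high = [(pcm[i] - pcm[i + 1]) / s for i in range(0, len(pcm), 2)]
    return low, high


def merge_bands(low, high):
    """Inverse of split_bands: interleave the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for l, h in zip(low, high):
        out.append((l + h) / s)
        out.append((l - h) / s)
    return out


def energy(samples):
    """Sum of squared sample values."""
    return sum(v * v for v in samples)


# Illustrative input: a 200 Hz tone sampled at 16 kHz (values chosen by us).
tone = [math.sin(2 * math.pi * 200 * n / 16_000) for n in range(320)]
low, high = split_bands(tone)
print("high/low energy ratio:", energy(high) / energy(low))
print("max reconstruction error:",
      max(abs(a - b) for a, b in zip(tone, merge_bands(low, high))))
```

Because the high band carries so little energy for this kind of input, a codec can plausibly spend far fewer bits on it than on the low band, which is the intuition behind encoding the high band cheaply.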

What’s next?

MLow has greatly enhanced audio quality on low-end devices while still ensuring calls are end-to-end encrypted. We’re really excited about what we have achieved in just the last two years, from developing a new codec to successfully shipping it to billions of users around the globe. We’re continuing to work on improving audio recovery in heavy packet loss networks by sending more redundant audio, which MLow allows us to do efficiently. We’re excited to share more as we continue working to make it easier for all our users to make quality audio calls.