G.711

G.711 is an ITU-T standard for audio companding, used primarily in telephony. It was released in 1972 under the formal name Pulse code modulation (PCM) of voice frequencies. It is a required standard in many technologies, for example in the H.320 and H.323 specifications, and it can also be used for fax communication over IP networks (as defined in the T.38 specification).

G.711, often referred to simply as Pulse Code Modulation (PCM), is a very commonly used narrowband waveform codec that provides toll-quality audio at 64 kbit/s. It passes audio signals in the range of 300–3400 Hz and samples them at a rate of 8,000 samples per second, with a tolerance on that rate of 50 parts per million (ppm). Each sample is represented with 8 bits using non-uniform (logarithmic) quantization, resulting in a 64 kbit/s bit rate. There are two slightly different versions: μ-law, which is used primarily in North America, and A-law, which is in use in most other countries.

Two enhancements to G.711 have been published: G.711.0 utilizes lossless data compression to reduce the bandwidth usage and G.711.1 increases audio quality by increasing bandwidth.

Features

Sampling frequency 8 kHz
64 kbit/s bitrate (8 kHz sampling frequency × 8 bits per sample)
Typical algorithmic delay is 0.125 ms, with no look-ahead delay
G.711 is a waveform speech coder
G.711 Appendix I defines a Packet Loss Concealment (PLC) algorithm to help hide transmission losses in a packetized network
G.711 Appendix II defines a Discontinuous Transmission (DTX) algorithm which uses Voice Activity Detection (VAD) and Comfort Noise Generation (CNG) to reduce bandwidth usage during silence periods
PSQM testing under ideal conditions yields Mean Opinion Scores of 4.45 for G.711 μ-law, 4.45 for G.711 A-law
PSQM testing under network stress yields Mean Opinion Scores of 4.13 for G.711 μ-law, 4.11 for G.711 A-law
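
The rate and delay figures in the list above follow directly from the sampling parameters. The following is a minimal sketch of that arithmetic in C. The function names are illustrative only and are not taken from the standard.

```c
#include <stdint.h>

/* Sampling parameters stated in the feature list. */
#define G711_SAMPLE_RATE_HZ  8000u
#define G711_BITS_PER_SAMPLE 8u

/* Bit rate in bit/s: 8000 samples/s x 8 bits per sample = 64000. */
uint32_t g711_bitrate_bps(void)
{
    return G711_SAMPLE_RATE_HZ * G711_BITS_PER_SAMPLE;
}

/* Duration of one sample in microseconds: 1/8000 s = 125 us (0.125 ms). */
uint32_t g711_sample_period_us(void)
{
    return 1000000u / G711_SAMPLE_RATE_HZ;
}

/* Octets needed to carry `ms` milliseconds of audio (one octet per sample). */
uint32_t g711_payload_octets(uint32_t ms)
{
    return G711_SAMPLE_RATE_HZ / 1000u * ms;
}
```

For example, a packet interval of 20 ms (an assumed value, chosen here only because it is common in VoIP deployments) would need g711_payload_octets(20) = 160 octets of G.711 payload.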

Types

G.711 defines two main companding algorithms, the μ-law algorithm and the A-law algorithm. Both are logarithmic, but A-law was specifically designed to be simpler for a computer to process. The standard also defines a sequence of repeating code values which defines the power level of 0 dB.

The μ-law and A-law algorithms encode 14-bit and 13-bit signed linear PCM samples (respectively) to logarithmic 8-bit samples. Thus, the G.711 encoder will create a 64 kbit/s bitstream for a signal sampled at 8 kHz.^[1]

G.711 μ-law tends to give more resolution to higher range signals while G.711 A-law provides more quantization levels at lower signal levels.

A-Law

Main article: A-law algorithm

A-law encoding takes a 13-bit signed linear audio sample as input and converts it to an 8-bit value as follows:

Linear input code ^{[note 1]}	Compressed code XOR 01010101	Linear output code ^{[note 2]}
`s0000000abcdx`	`s̄000abcd`	`s0000000abcd1`
`s0000001abcdx`	`s̄001abcd`	`s0000001abcd1`
`s000001abcdxx`	`s̄010abcd`	`s000001abcd10`
`s00001abcdxxx`	`s̄011abcd`	`s00001abcd100`
`s0001abcdxxxx`	`s̄100abcd`	`s0001abcd1000`
`s001abcdxxxxx`	`s̄101abcd`	`s001abcd10000`
`s01abcdxxxxxx`	`s̄110abcd`	`s01abcd100000`
`s1abcdxxxxxxx`	`s̄111abcd`	`s1abcd1000000`

↑ This value is produced by taking the two's complement representation of the input value, and inverting all bits after the sign bit if the value is negative.
↑ Signed magnitude representation

Here s is the sign bit, s̄ is its inverse (i.e. positive values are encoded with MSB = s̄ = 1), and bits marked x are discarded. Note that the first column of the table uses a different representation of negative values than the third column. For example, the input decimal value −21 is represented in binary after bit inversion as 1000000010100, which maps to 00001010 (according to the first row of the table). When decoding, this maps back to 1000000010101, which is interpreted as the output value −21 in decimal. The input value +52 (0000000110100 in binary) maps to 10011010 (according to the second row), which maps back to 0000000110101 (+53 in decimal).
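
The table lookup above can be sketched in C as follows. This is an illustrative reading of the table, not the ITU-T reference code: the function names are hypothetical, the input is assumed to be a 13-bit two's complement value held in an int, and clamping out-of-range input is an added assumption.

```c
#include <stdint.h>

/* Returns the middle column of the table (the wire octet XOR 0x55)
 * for a 13-bit signed linear sample in the range -4096..4095. */
uint8_t alaw_table_code(int sample)
{
    int negative, mag, e, m;

    /* Assumption: clamp anything outside the 13-bit input range. */
    if (sample > 4095)  sample = 4095;
    if (sample < -4096) sample = -4096;

    negative = sample < 0;
    /* Inverting all bits after the sign bit of a negative two's
     * complement value yields -sample - 1. */
    mag = negative ? -sample - 1 : sample;

    if (mag < 32) {
        /* First row: seven leading zeros, drop one low bit. */
        e = 0;
        m = mag >> 1;
    } else {
        /* Remaining rows: the leading 1 sits at bit e + 4. */
        e = 1;
        while ((mag >> (e + 5)) != 0)
            e++;
        m = (mag >> e) & 0x0F;  /* keep abcd, drop the leading 1 */
    }

    /* MSB is the inverted sign: 1 for positive input. */
    return (uint8_t)((negative ? 0x00 : 0x80) | (e << 4) | m);
}

/* Octet as transmitted: the table code XOR 01010101. */
uint8_t alaw_encode(int sample)
{
    return (uint8_t)(alaw_table_code(sample) ^ 0x55);
}
```

With the two worked values from the text, alaw_table_code(-21) gives 0x0A (00001010) and alaw_table_code(52) gives 0x9A (10011010).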

This can be seen as a floating-point number with 4 bits of mantissa m, 3 bits of exponent e and 1 sign bit s, formatted as seeemmmm, with the decoded linear value y given by the formula

y=(-1)^{s}\cdot (16\cdot \min\{e,1\}+m+0.5)\cdot 2^{\max\{e,1\}},

which is a 13-bit signed integer in the range ±1 to ±(2¹² − 2⁶). Note that no compressed code decodes to zero due to the addition of 0.5 (half of a quantization step).

In addition, the standard specifies that all even bits of the result (the LSB counts as even) are inverted before the octet is transmitted. This provides plenty of 0/1 transitions to facilitate clock recovery in PCM receivers. Thus, a silent A-law encoded PCM channel carries 8-bit samples coded as 0xD5 instead of 0x80.
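
As a worked illustration of the decoding formula and the even-bit inversion, the sketch below expands one transmitted A-law octet to a 13-bit signed value. It is a minimal example with a hypothetical function name. It uses integer arithmetic, so the 0.5 term is folded in by doubling, and it follows the convention of the table that MSB = 1 marks a positive value.

```c
#include <stdint.h>

/* Expands one transmitted A-law octet to a signed value in the
 * range -4032..4032 (never 0). */
int alaw_decode(uint8_t octet)
{
    int code = octet ^ 0x55;        /* undo the even-bit inversion */
    int positive = (code & 0x80) != 0;
    int e = (code >> 4) & 0x07;     /* 3-bit exponent */
    int m = code & 0x0F;            /* 4-bit mantissa */
    int lead = (e > 0) ? 16 : 0;    /* 16 * min(e, 1) */
    int shift = (e > 1) ? e - 1 : 0;
    int mag;

    /* (lead + m + 0.5) * 2^max(e,1) == (2 * (lead + m) + 1) << shift */
    mag = (2 * (lead + m) + 1) << shift;

    return positive ? mag : -mag;
}
```

Under this reading, the silence octet 0xD5 decodes to +1, the smallest positive output, and 0xAA decodes to 4032, which is 2¹² − 2⁶.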

When data is sent over E0 (G.703), MSB (sign) is sent first and LSB is sent last.

ITU-T STL^[2] defines the algorithm for decoding as follows (it puts the decoded values in the 13 most significant bits of the 16-bit output data type).

void            alaw_expand(lseg, logbuf, linbuf)
  long            lseg;
  short          *linbuf;
  short          *logbuf;
{
  short           ix, mant, iexp;
  long            n;

  for (n = 0; n < lseg; n++)
  {
    ix = logbuf[n] ^ (0x0055);	/* re-toggle toggled bits */

    ix &= (0x007F);		/* remove sign bit */
    iexp = ix >> 4;		/* extract exponent */
    mant = ix & (0x000F);	/* now get mantissa */
    if (iexp > 0)
      mant = mant + 16;		/* add leading '1', if exponent > 0 */

    mant = (mant << 4) + (0x0008);	/* now mantissa left justified and */
    /* 1/2 quantization step added */
    if (iexp > 1)		/* now left shift according exponent */
      mant = mant << (iexp - 1);

    linbuf[n] = logbuf[n] > 127	/* invert, if negative sample */
      ? mant
      : -mant;
  }
}

See also the "ITU-T Software Tool Library 2009 User's manual".^[3]

μ-Law

μ-law (sometimes referred to as ulaw, G.711Mu, or G.711μ) encoding takes a 14-bit signed linear audio sample in two's complement representation as input, inverts all bits after the sign bit if the value is negative, adds 33 (binary 100001) and converts it to an 8-bit value as follows:

Linear input value ^{[note 1]}	Compressed code XOR 11111111	Linear output value ^{[note 2]}
`s00000001abcdx`	`s000abcd`	`s00000001abcd1`
`s0000001abcdxx`	`s001abcd`	`s0000001abcd10`
`s000001abcdxxx`	`s010abcd`	`s000001abcd100`
`s00001abcdxxxx`	`s011abcd`	`s00001abcd1000`
`s0001abcdxxxxx`	`s100abcd`	`s0001abcd10000`
`s001abcdxxxxxx`	`s101abcd`	`s001abcd100000`
`s01abcdxxxxxxx`	`s110abcd`	`s01abcd1000000`
`s1abcdxxxxxxxx`	`s111abcd`	`s1abcd10000000`

↑ This value is produced by taking the two's complement representation of the input value, inverting all bits after the sign bit if the value is negative, and adding 33.
↑ Signed magnitude representation. Final result is produced by decreasing the magnitude of this value by 33.

Where s is the sign bit, and bits marked x are discarded.

In addition, the standard specifies that all result bits are inverted before the octet is transmitted. Thus, a silent μ-law encoded PCM channel carries 8-bit samples coded as 0xFF instead of 0x00.

Adding 33 is necessary so that all values fall into a compression group and it is subtracted back when decoding. This addition means that an overflow would occur for values outside the ±8159 range, so such values are clipped during encoding to avoid it.
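
The encoding steps described above (bit inversion for negative input, adding 33, clipping, segment lookup and final inversion) can be sketched in C as follows. This is an illustrative reading of the table rather than the reference implementation. The function name is hypothetical, and clipping the biased magnitude at 8191 (the largest 13-bit value) is an assumption about how the overflow is avoided.

```c
#include <stdint.h>

/* Encodes a 14-bit signed linear sample (-8192..8191) to the
 * transmitted mu-law octet. */
uint8_t ulaw_encode(int sample)
{
    int negative, biased, e, m, code;

    /* Assumption: clamp anything outside the 14-bit input range. */
    if (sample > 8191)  sample = 8191;
    if (sample < -8192) sample = -8192;

    negative = sample < 0;
    /* Invert the bits after the sign for negative input, then add 33. */
    biased = (negative ? -sample - 1 : sample) + 33;

    /* Assumption: clip so the biased magnitude still fits in 13 bits. */
    if (biased > 8191)
        biased = 8191;

    /* The bias guarantees a leading 1 at bit e + 5. */
    e = 0;
    while ((biased >> (e + 6)) != 0)
        e++;
    m = (biased >> (e + 1)) & 0x0F;  /* keep abcd, drop the leading 1 */

    /* Table code: sign, exponent, mantissa. */
    code = (negative ? 0x80 : 0x00) | (e << 4) | m;

    /* All bits are inverted before transmission. */
    return (uint8_t)(~code & 0xFF);
}
```

With this sketch, an input of 0 produces 0xFF and an input of −1 produces 0x7F.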

Breaking the encoded value, formatted as seeemmmm, into 4 bits of mantissa m, 3 bits of exponent e and 1 sign bit s, the decoded linear value y is given by the formula

y=(-1)^{s}\cdot [(16.5+m)\cdot 2^{e+1}-33],

which is a 14-bit signed integer in the range ±0 to ±8031.

Note that 0 is encoded as 0xFF, and −1 is encoded as 0x7F, but when decoded back the result is 0 in both cases.
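
The decoding formula can be checked with a short C sketch. The function name is hypothetical. The code uses the integer identity (16.5 + m)·2^(e+1) = (2m + 33)·2^e to avoid fractions, and it takes the transmitted octet as input, so it first undoes the bit inversion.

```c
#include <stdint.h>

/* Expands one transmitted mu-law octet to a signed value in the
 * range -8031..8031. */
int ulaw_decode(uint8_t octet)
{
    int code = ~octet & 0xFF;       /* undo the inversion of all bits */
    int negative = (code & 0x80) != 0;
    int e = (code >> 4) & 0x07;     /* 3-bit exponent */
    int m = code & 0x0F;            /* 4-bit mantissa */

    /* (16.5 + m) * 2^(e+1) - 33, written with integers only. */
    int mag = ((2 * m + 33) << e) - 33;

    return negative ? -mag : mag;
}
```

Both 0xFF and 0x7F give a magnitude of 0 here, which matches the observation that 0 and −1 decode to the same value.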

G.711.0

G.711.0, also known as G.711 LLC, utilizes lossless data compression to reduce the bandwidth usage by as much as 50 percent.^[4] The Lossless compression of G.711 pulse code modulation standard was approved by ITU-T in September 2009.^[5]^[6]

G.711.1

G.711.1 is an extension to G.711, published as ITU-T Recommendation G.711.1 in March 2008. Its formal name is Wideband embedded extension for G.711 pulse code modulation.^[6]^[7]^[8]

G.711.1 allows the addition of narrowband and/or wideband (16000 samples/s) enhancements, each at 25% of the bitrate of the (included) base G.711 bitstream, leading to data rates of 64, 80 or 96 kbit/s.

G.711.1 is compatible with G.711 at 64 kbit/s,^[9] hence an efficient deployment in existing G.711-based voice over IP (VoIP) infrastructures is foreseen. The G.711.1 coder can encode signals at 16 kHz with a bandwidth of 50–7000 Hz at 80 and 96 kbit/s, and for 8-kHz sampling the output may produce signals with a bandwidth ranging from 50 up to 4000 Hz, operating at 64 and 80 kbit/s.^[7]

The G.711.1 encoder creates an embedded bitstream structured in three layers corresponding to three available bit rates: 64, 80 and 96 kbit/s. The bitstream does not contain any information on which layers it contains, so an implementation requires out-of-band signalling to indicate which layers are available. The three G.711.1 layers are: log companded pulse code modulation (PCM) of the lower band including noise feedback, embedded PCM extension with adaptive bit allocation for enhancing the quality of the base layer in the lower band, and weighted vector quantization coding of the higher band based on modified discrete cosine transformation (MDCT).^[7]
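
The relation between the layers and the three bit rates can be illustrated with a small C sketch. The function and parameter names are hypothetical. Because the bitstream itself does not say which layers are present, the two flags stand in for information that would arrive through out-of-band signalling.

```c
/* Bit rate in kbit/s of a G.711.1 stream, given which enhancement
 * layers accompany the 64 kbit/s base layer. Each enhancement layer
 * adds 25% of the base rate, i.e. 16 kbit/s. */
unsigned g7111_bitrate_kbps(int has_lower_band_enh, int has_higher_band_enh)
{
    const unsigned base_kbps = 64u;
    const unsigned enh_kbps = base_kbps / 4u;  /* 25% of the base rate */
    unsigned rate = base_kbps;

    if (has_lower_band_enh)
        rate += enh_kbps;
    if (has_higher_band_enh)
        rate += enh_kbps;

    return rate;
}
```

The base layer alone gives 64 kbit/s, either single enhancement gives 80 kbit/s, and both together give 96 kbit/s.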

Two extensions for G.711.1 were planned for 2010: a superwideband extension (bandwidth up to 14000 Hz) and lossless bitstream compression.^[10]

Licensing

G.711 was released in 1972 and its patents have long since expired, so it is freely available.^[11]

References

↑ G.711 : Pulse code modulation (PCM) of voice frequencies; ITU-T Recommendation (11/1988), Retrieved on 2009-07-08
↑ G.191 : Software tools for speech and audio coding standardization. Function alaw_expand in file Software/stl2009/g711/g711.c. Itu.int. Retrieved on 2013-09-18.
↑ G.191 : ITU-T Software Tool Library 2009 User's manual. Itu.int (2010-07-23). Retrieved on 2013-09-18.
↑ ITU-T (2009-07-17). "ITU-T Newslog - Voice codec gets new lossless compression". Retrieved 2010-02-28.
↑ ITU-T. "G.711.0 : Lossless compression of G.711 pulse code modulation". Retrieved 2010-02-28.
↑ Recent Audio/Speech Coding Developments in ITU-T and future trends (PDF), August 2008, retrieved 2010-02-28
↑ ITU-T (2008) G.711.1 : Wideband embedded extension for G.711 pulse code modulation Retrieved on 2009-06-19
↑ Hiwasaki; et al. (2008-08-25), G.711.1: a wideband extension to ITU-T G.711 (PDF), retrieved 2015-06-13
↑ Lapierre; et al. (2008-08-25), Noise shaping in an ITU-T G.711-Interoperable embedded codec (PDF), retrieved 2015-06-13
↑ Nokia Research Center (2009-04-06), Coding standards (PDF), retrieved 2010-03-01
↑ "G711 Spec". Retrieved 2011-07-05.

External links

ITU-T Recommendation G.711 - (STD.ITU-T RECMN G.711-ENGL 1989)
ITU-T G.711 page
ITU-T G.191 software tools for speech and audio coding, including G.711 C code
Code Project C# implementation of G.711 with source code
RFC 3551 - RTP Profile for Audio and Video Conferences with Minimal Control - G.711 - PCMA and PCMU definition.
RFC 4856 - Registration of Media Type audio/PCMA and audio/PCMU
RFC 5391 - RTP Payload Format for ITU-T Recommendation G.711.1 (PCMA-WB and PCMU-WB)


This article is issued from Wikipedia - version of the 5/30/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.