H.323 does not require terminals to have video capabilities. If a terminal does support video, it must support H.261 in QCIF mode. Support for other video compression schemes and resolutions is optional.
H.261 is one of the most widely used video compression standards internationally. It defines the algorithms for encoding and decoding video in both CIF and QCIF formats. CIF (Common Intermediate Format) has a luminance signal of 352 pixels per line and 288 lines (352 x 288 samples per frame) and two chrominance signals of 176 x 144 samples each per frame. The nominal frame rate is approximately 30 fps (30000/1001, about 29.97 fps). QCIF (Quarter CIF) has a luminance resolution of 176 x 144 pixels, with 88 x 72 samples for each chrominance signal. Both formats use 4:2:0 YCbCr sampling: each chrominance component (Cb and Cr) is subsampled by a factor of two horizontally and vertically relative to luminance.
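The plane sizes and the uncompressed data rate implied by these formats can be computed directly, which shows why compression is needed at all. The following is a minimal Python sketch, assuming 8-bit samples and the nominal 30000/1001 frame rate; the function names are illustrative only.

```python
# Illustrative sketch: plane sizes and raw bit rate for CIF and QCIF.
# Assumptions: 8-bit samples, nominal frame rate of 30000/1001 fps.

FORMATS = {"CIF": (352, 288), "QCIF": (176, 144)}  # luminance width x height
FRAME_RATE = 30000 / 1001  # about 29.97 frames per second


def plane_sizes(width, height):
    """Return (Y, Cb, Cr) plane dimensions for 4:2:0 sampling."""
    chroma = (width // 2, height // 2)  # half resolution in both directions
    return (width, height), chroma, chroma


def samples_per_frame(width, height):
    """Total luminance plus chrominance samples in one frame."""
    return sum(w * h for w, h in plane_sizes(width, height))


def raw_bits_per_second(width, height, fps=FRAME_RATE, bits_per_sample=8):
    """Uncompressed bit rate, assuming bits_per_sample bits for every sample."""
    return samples_per_frame(width, height) * bits_per_sample * fps


if __name__ == "__main__":
    for name, (w, h) in FORMATS.items():
        y, cb, cr = plane_sizes(w, h)
        mbps = raw_bits_per_second(w, h) / 1e6
        print(f"{name}: Y {y}, Cb {cb}, Cr {cr}, about {mbps:.1f} Mbit/s raw")
```

Under these assumptions a CIF frame holds 152,064 samples, which works out to roughly 36.5 Mbit/s uncompressed (about 9.1 Mbit/s for QCIF).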

Figure 1. CIF Picture

Figure 2. QCIF Picture
Frame Organization
In the H.261 standard, video frames are processed by first dividing the image into smaller sections, or “blocks”. A block is an area of 8 x 8 pixels, so the luminance component of a CIF picture measures 44 x 36 blocks. Blocks are grouped into macroblocks. A macroblock consists of six blocks: four blocks of Y (luminance) information, one block of Cb chrominance information, and one block of Cr chrominance information. The four Y blocks together cover a 16 x 16 pixel area; because chrominance is subsampled by two in each direction, each 8 x 8 chrominance block covers that same 16 x 16 area at half resolution. Thirty-three macroblocks, arranged 11 wide by 3 high, form a larger unit known as a GOB (Group of Blocks). A CIF picture therefore measures 2 x 6 GOBs (12 GOBs in total), and a QCIF picture 1 x 3 GOBs.
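The block, macroblock and GOB hierarchy can be checked with a few lines of arithmetic. This Python sketch simply divides the picture dimensions by the unit sizes given above; `layout` and `blocks_per_picture` are hypothetical helper names, not part of any codec API.

```python
# Illustrative sketch of the H.261 picture hierarchy (block -> macroblock -> GOB).

BLOCK = 8                          # a block is 8 x 8 pixels
MACROBLOCK = 16                    # a macroblock covers 16 x 16 luminance pixels
BLOCKS_PER_MB = 6                  # 4 Y + 1 Cb + 1 Cr
GOB_MB_COLS, GOB_MB_ROWS = 11, 3   # a GOB is 11 x 3 = 33 macroblocks


def layout(width, height):
    """Return (columns, rows) of luminance blocks, macroblocks and GOBs."""
    y_blocks = (width // BLOCK, height // BLOCK)
    mbs = (width // MACROBLOCK, height // MACROBLOCK)
    gobs = (mbs[0] // GOB_MB_COLS, mbs[1] // GOB_MB_ROWS)
    return {"y_blocks": y_blocks, "macroblocks": mbs, "gobs": gobs}


def blocks_per_picture(width, height):
    """All 8 x 8 blocks (luminance and chrominance) in one picture."""
    cols, rows = layout(width, height)["macroblocks"]
    return cols * rows * BLOCKS_PER_MB


if __name__ == "__main__":
    for name, size in {"CIF": (352, 288), "QCIF": (176, 144)}.items():
        print(name, layout(*size), blocks_per_picture(*size), "blocks")
```

For CIF this gives 22 x 18 macroblocks (396 in total) arranged as 2 x 6 GOBs, or 2,376 blocks of 8 x 8 samples per picture.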
H.261 compression and encoding
The H.261 compression algorithm is DCT-based and resembles MPEG to some degree. DCT stands for Discrete Cosine Transform. It is popular because it concentrates most of a block's energy into a few low-frequency coefficients; after quantization, most of the remaining coefficients become zero, which run-length encoding (RLE) can represent very compactly. The DCT transforms an 8 x 8 block of pixel intensities into an 8 x 8 block of frequency coefficients, and the transform is applied block by block until the entire image has been transformed. The coefficients are then quantized and encoded with a combination of run-length and Huffman (variable-length) coding.
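To illustrate the energy compaction described above, here is a direct (deliberately slow) orthonormal 2-D DCT-II in Python. It is a textbook sketch for clarity; real encoders use fast factored or integer approximations, and the names `dct_2d` and `_c` are illustrative.

```python
import math

N = 8  # H.261 transforms 8 x 8 blocks


def _c(k):
    """Orthonormal scaling factor for frequency index k."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct_2d(block):
    """Direct (O(N^4)) orthonormal 2-D DCT-II of an N x N block."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = _c(u) * _c(v) * s
    return out


if __name__ == "__main__":
    # A smooth horizontal ramp: intensity increases from left to right.
    ramp = [[100 + 4 * y for y in range(N)] for _ in range(N)]
    for row in dct_2d(ramp):
        print(" ".join(f"{c:7.1f}" for c in row))
```

For the ramp, every row except the first comes out (numerically) zero, and within the first row the energy sits in the DC term and a few low horizontal frequencies. A flat block collapses to a single DC coefficient.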
A major difference between H.261 and MPEG is that in H.261 the quantization value is variable and is determined by the amount of data reduction required to fit the available video bandwidth. This rate control lets H.261 cope with a variable video bandwidth. The cost is that H.261 trades picture quality against motion: a scene with a lot of motion produces more data, so the encoder quantizes more coarsely, and moving pictures end up with poorer image quality than still pictures.
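H.261 does not mandate a particular rate-control algorithm, so the following Python sketch is purely hypothetical: it maps transmit-buffer fullness linearly onto the quantizer range 1 to 31 and models coded frame size as inversely proportional to the quantizer. Both the mapping and the size model are simplifying assumptions, chosen only to show how motion drives the quantizer up.

```python
# Hypothetical rate-control sketch: the quantizer follows transmit-buffer fullness.
# The linear mapping and the 1/QUANT size model are simplifying assumptions.

QUANT_MIN, QUANT_MAX = 1, 31  # range of the H.261 quantizer parameter


def next_quant(buffer_bits, buffer_size):
    """Map buffer fullness (0..1) linearly onto QUANT_MIN..QUANT_MAX."""
    fullness = min(max(buffer_bits / buffer_size, 0.0), 1.0)
    return QUANT_MIN + round(fullness * (QUANT_MAX - QUANT_MIN))


def simulate(frame_bits_at_q1, channel_bits_per_frame, buffer_size):
    """Return the QUANT chosen for each frame of a toy sequence.

    frame_bits_at_q1: bits each frame would need at QUANT = 1 (a stand-in for
    scene complexity and motion). Coded size is modelled as bits / QUANT.
    """
    buffer_bits, quants = 0.0, []
    for raw in frame_bits_at_q1:
        q = next_quant(buffer_bits, buffer_size)
        coded = raw / q
        # The channel drains a fixed number of bits per frame interval.
        buffer_bits = max(0.0, buffer_bits + coded - channel_bits_per_frame)
        quants.append(q)
    return quants


if __name__ == "__main__":
    still = [800] * 5       # little change: cheap frames
    motion = [40_000] * 5   # lots of motion: expensive frames
    print(simulate(still + motion, channel_bits_per_frame=1_000, buffer_size=20_000))
```

In this toy run the still frames keep the finest quantizer, while the burst of motion fills the buffer and pushes the quantizer to its coarsest value, which is the quality-for-motion trade described above.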
Like MPEG, the H.261 encoding process uses past frames to encode differences. Unlike MPEG, H.261 has only two types of frames: intra-coded frames and inter-coded frames. MPEG uses three frame types: the I (intra) frame, the P (predicted) frame, and the B (bidirectional) frame. H.261 uses only past frames as references for its motion estimation, whereas an MPEG B-frame uses both the nearest preceding I or P frame and the next I or P frame as references. H.261 intra-coded frames are encoded entirely on their own, with no reference frame; inter-coded frames are encoded relative to the previous frame. The H.261 standard requires each macroblock to be intra-coded at least once every 132 times it is transmitted, to prevent errors from accumulating. Terminals may also request a complete picture update, in which case an intra-coded frame is sent.
Figure 3. H.261 encoder
If a frame is being encoded as an intra frame, its macroblocks go through the DCT, then quantization with a step size determined by the rate-control feedback, and finally a zigzag scan followed by run-length and Huffman (variable-length) encoding to produce the output bitstream. The quantized coefficients are also passed to a decoder built into the encoder, which reconstructs the frame so that the inter-frame encoder can use it as a reference.
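The zigzag scan orders the quantized coefficients from low to high frequency so that the zeros cluster at the end, and run-length coding then turns the scan into (run, level) pairs. The Python sketch below shows both steps; for simplicity it runs every coefficient, including DC, through run-length coding, and it stops short of mapping the pairs to the variable-length code table defined in the standard. Function names are illustrative.

```python
# Sketch: zigzag scan of a quantized 8 x 8 block and (run, level) pairing.
# Mapping pairs to the standard's VLC table is not shown.

def zigzag_order(n=8):
    """(row, col) positions in zigzag order, starting at the DC coefficient."""
    order = []
    for s in range(2 * n - 1):               # s = row + col (one anti-diagonal)
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diagonal.reverse()               # even diagonals run bottom-left to top-right
        order.extend(diagonal)
    return order


def run_level_pairs(block):
    """Return (run, level) pairs: run = zeros skipped before each nonzero level."""
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        level = block[r][c]
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs  # trailing zeros are implied by an end-of-block (EOB) code


if __name__ == "__main__":
    q = [[0] * 8 for _ in range(8)]
    q[0][0], q[0][1], q[2][0] = 12, -3, 1
    print(run_level_pairs(q))
```

Here a block with only three nonzero coefficients reduces to three pairs, (0, 12), (0, -3) and (1, 1), followed by an end-of-block code; the 61 trailing zeros cost nothing extra.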
Figure 4. H.261 decoder
The H.261 decoder is much simpler than the encoder. This is because the encoder is not merely an encoder: it also contains a built-in decoder, which decodes the output bitstream to recreate the reference frames needed for motion compensation, as well as motion estimation, which the decoder does not need. When a frame reaches the decoder, it takes one of two paths depending on its type, since intra-coded and inter-coded macroblocks are decoded differently. An intra-coded macroblock is decoded by simply reversing the encoding steps: variable-length decoding, inverse quantization, and the inverse DCT. The decoded data is then used to build up the frame, which can then be displayed. A copy of this frame is also stored for use as a reference when decoding inter-coded macroblocks and frames.
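The intra path can be sketched as inverse quantization followed by an inverse DCT and clipping to the 8-bit pixel range. The Python sketch below follows the reconstruction rule as commonly described for H.261 (|REC| = QUANT x (2|LEVEL| + 1), one less when QUANT is even, clipped to -2048..2047, with a fixed step of 8 for the intra DC coefficient); it should be checked against the Recommendation before being relied on, and it ignores the special cases in the standard's DC codeword mapping. The helper names are illustrative.

```python
import math

N = 8


def dequantize(level, quant):
    """Reconstruct a non-DC coefficient from its LEVEL and the QUANT parameter."""
    if level == 0:
        return 0
    mag = quant * (2 * abs(level) + 1)
    if quant % 2 == 0:
        mag -= 1                       # even QUANT: reconstruction is one lower
    rec = mag if level > 0 else -mag
    return max(-2048, min(2047, rec))  # clip to the coefficient range


def _c(k):
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def idct_2d(coef):
    """Direct orthonormal 2-D inverse DCT (DCT-III) of an N x N block."""
    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            s = 0.0
            for u in range(N):
                for v in range(N):
                    s += (_c(u) * _c(v) * coef[u][v]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[x][y] = s
    return out


def decode_intra_block(dc_level, ac_levels, quant):
    """Inverse-quantize, inverse-transform and clip one intra-coded 8 x 8 block.

    ac_levels is an 8 x 8 grid of LEVEL values already placed back in
    row/column order (after undoing the zigzag scan); entry [0][0] is ignored.
    """
    coef = [[dequantize(ac_levels[u][v], quant) for v in range(N)] for u in range(N)]
    coef[0][0] = 8 * dc_level          # intra DC: fixed step size of 8
    pixels = idct_2d(coef)
    return [[max(0, min(255, round(p))) for p in row] for row in pixels]
```

A block with DC level 100 and no AC energy, for example, decodes to a flat block of value 100.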
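Inter-coded macroblocks are decoded using the motion vector and the stored reference frame: the vector selects a prediction from the reference frame, an optional loop filter smooths that prediction, and the decoded difference (residual) is added before the macroblock is incorporated into the new frame.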
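The inter path for a single 8 x 8 block can be sketched as follows. The loop filter shown is a separable 1/4, 1/2, 1/4 filter whose taps become 0, 1, 0 where they would fall outside the block, rounded once at the end; this is how the H.261 loop filter is usually described, but the exact details should be confirmed against the Recommendation. The sketch assumes integer motion vectors that keep the block inside the reference frame and a residual that has already been decoded; all names are illustrative.

```python
# Sketch of H.261 inter-coded block reconstruction (integer motion vectors).
# The residual is assumed to be already decoded (dequantized + inverse DCT).

N = 8


def fetch_prediction(ref, top, left, mv):
    """Copy the N x N block at (top, left) displaced by mv = (dy, dx).

    Assumes the displaced block lies entirely inside the reference frame.
    """
    dy, dx = mv
    return [[ref[top + dy + r][left + dx + c] for c in range(N)] for r in range(N)]


def _taps(i, n):
    """1-D weights scaled by 4: (1, 2, 1), or (0, 4, 0) at the block edges."""
    if i == 0 or i == n - 1:
        return {i: 4}
    return {i - 1: 1, i: 2, i + 1: 1}


def loop_filter(block):
    """Separable 1/4, 1/2, 1/4 filter; rounds once, halves rounded up."""
    n = len(block)
    out = [[0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            s = sum(wr * wc * block[rr][cc]
                    for rr, wr in _taps(r, n).items()
                    for cc, wc in _taps(c, n).items())
            out[r][c] = (s + 8) // 16  # total weight is 4 * 4 = 16
    return out


def decode_inter_block(ref, top, left, mv, residual, use_filter):
    """Prediction (optionally loop-filtered) plus residual, clipped to 0..255."""
    pred = fetch_prediction(ref, top, left, mv)
    if use_filter:
        pred = loop_filter(pred)
    return [[max(0, min(255, pred[r][c] + residual[r][c])) for c in range(N)]
            for r in range(N)]
```

Keeping full precision through both filter passes and rounding once avoids accumulating two rounding errors, which matters because the filtered prediction feeds back into later reference frames.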
H.263
The H.263 video standard is based on H.261 and is designed to compress moving pictures at lower bit rates.
The main elements in the basic H.263 compression algorithm are:
1. inter-picture prediction/motion compensation
2. block transformation
3. quantization
4. variable length coding (VLC)
A decoder would then have the following components:
1. variable length decoding
2. inverse quantization
3. inverse block transformation
4. motion compensation
The H.263 coding algorithm includes several improvements and changes over H.261, and it can often achieve the same video quality as H.261 with less than half the number of bits in the coded stream. Providing almost the same quality at half the bandwidth is why it is preferred over H.261.
The main factors that contributed to H.263’s significant improvement over H.261 are:
1. half-pixel motion vector prediction – half-pixel precision is used for motion compensation, as opposed to the full-pixel precision and loop filter used in H.261.
2. negotiable options – Four negotiable coding options are included in H.263. These are:
a. Unrestricted Motion Vector mode - In this mode, motion vectors are allowed to point outside the picture. "Non-existing" pixels outside the picture are reconstructed from the edge pixels. This mode is especially useful for movement along the edges of a picture (including camera movement).
b. Advanced Prediction mode - Instead of a single 16x16 motion vector, some macroblocks use four 8x8 vectors. The encoder decides which type of vector to use. Four vectors take more bits but give better prediction.
c. Syntax-based Arithmetic Coding mode - This mode uses arithmetic coding instead of VLC coding. It improves the compression ratio by 3-4% while keeping the SNR at the same level.
d. PB-frames mode - In this mode, two consecutive pictures are coded as one unit, similar to MPEG compression. One picture (P) is predicted from the last decoded frame, and the other (B) is predicted bidirectionally from the last decoded frame and the P-frame currently being decoded. For simple video sequences, this can double the frame rate without increasing the bandwidth.
3. support for new picture resolutions – H.263 defines new frame formats such as Sub-QCIF (128 x 96), 4CIF (704 x 576, i.e. 4 x CIF), and 16CIF (1408 x 1152, i.e. 16 x CIF).
4. optional data stream structure – some parts of the data stream structure are now optional, so the codec can be configured for a lower bit rate or better error recovery.
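Half-pixel motion compensation and the edge handling of Unrestricted Motion Vector mode can be sketched together. The Python below uses the bilinear averages usually given for H.263 ((A + B + 1) / 2 between two pixels, (A + B + C + D + 2) / 4 at a centre position, with integer division) and clamps coordinates to the picture edge so that out-of-picture pixels take the nearest edge value. Positions are expressed in half-pixel units; the function names are illustrative, and in baseline H.263 (without Unrestricted Motion Vector mode) vectors would not point outside the picture in the first place.

```python
# Sketch of H.263-style half-pixel prediction with edge clamping.
# Positions (y2, x2) are in half-pixel units; pixel values are non-negative ints.

def sample(ref, y, x):
    """Read a pixel, clamping coordinates to the picture edge.

    Clamping mimics Unrestricted Motion Vector mode, where pixels outside the
    picture take the value of the nearest edge pixel.
    """
    h, w = len(ref), len(ref[0])
    return ref[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def half_pel(ref, y2, x2):
    """Predicted value at full-pixel position (y2 / 2, x2 / 2)."""
    y, x = y2 // 2, x2 // 2        # floor division also handles negative positions
    fy, fx = y2 % 2, x2 % 2        # 1 means "halfway to the next pixel"
    a = sample(ref, y, x)
    if not fy and not fx:
        return a
    if not fy:
        return (a + sample(ref, y, x + 1) + 1) // 2
    if not fx:
        return (a + sample(ref, y + 1, x) + 1) // 2
    return (a + sample(ref, y, x + 1) + sample(ref, y + 1, x)
            + sample(ref, y + 1, x + 1) + 2) // 4
```

With a 2 x 2 reference of 10, 20 / 30, 40, the position halfway between all four pixels predicts 25, and a vector pointing well above and to the left of the picture simply repeats the corner value 10.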
It is expected that H.263 will replace H.261 in many applications.