view doc/bpg_spec.txt @ 0:772086c29cc7

Initial import.
author Matti Hamalainen <ccr@tnsp.org>
date Wed, 16 Nov 2016 11:16:33 +0200

BPG Specification

version 0.9.5

Copyright (c) 2014-2015 Fabrice Bellard

1) Introduction
---------------

BPG is a lossy and lossless picture compression format based on HEVC
[1]. It supports grayscale, YCbCr, RGB, YCgCo color spaces with an
optional alpha channel. CMYK is supported by reusing the alpha channel
to encode an additional white component. The bit depth of each
component is from 8 to 14 bits. The color values are stored either in
full range (JPEG case) or limited range (video case). The YCbCr color
space is either BT 601 (JPEG case), BT 709 or BT 2020.

The chroma can be subsampled by a factor of two in the horizontal
direction or in both the horizontal and vertical directions (4:4:4,
4:2:2 or 4:2:0 chroma formats are supported). In order to be able to
transcode JPEG images or video frames without modification to the
chroma, both JPEG and MPEG2 chroma sample positions are supported.

Progressive decoding and display is supported by interleaving the
alpha and color data.

Arbitrary metadata (such as EXIF, ICC profile, XMP) are supported.

Animations are supported as an optional feature. Decoders not
supporting animation display the first frame of the animation.

2) Bitstream conventions
------------------------

The bit stream is byte aligned and bit fields are read from most
significant to least significant bit in each byte.

- u(n) is an unsigned integer stored on n bits.

- ue7(n) is an unsigned integer of at most n bits stored on a variable
  number of bytes. All the bytes except the last one have a '1' as
  their first bit. The unsigned integer is represented as the
  concatenation of the remaining 7 bit codewords. Only the shortest
  encoding for a given unsigned integer shall be accepted by the
  decoder (i.e. the first byte is never 0x80). Example:

  Encoded bytes       Unsigned integer value
  0x08                8
  0x84 0x1e           542
  0xac 0xbe 0x17      728855

- ue(v) : unsigned integer 0-th order Exp-Golomb-coded (see HEVC
  specification).

- b(8) is an arbitrary byte.
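
The ue7(n) rules above can be sketched as follows. This is a minimal
illustration, not a reference implementation: the function names
decode_ue7 and encode_ue7 and the choice to raise ValueError are
assumptions made for the example.

```python
def decode_ue7(data, pos=0, max_bits=32):
    """Decode one ue7(n) value from `data` starting at `pos`.

    Returns (value, next_pos). Raises ValueError on a non-shortest
    encoding, a truncated buffer or a value wider than max_bits.
    """
    # The shortest encoding never starts with an empty 7 bit group.
    if pos < len(data) and data[pos] == 0x80:
        raise ValueError("non-shortest ue7 encoding")
    value = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated ue7 value")
        byte = data[pos]
        pos += 1
        # Concatenate the low 7 bits of each byte.
        value = (value << 7) | (byte & 0x7F)
        if value >> max_bits:
            raise ValueError("ue7 value exceeds %d bits" % max_bits)
        # A clear first bit marks the last byte.
        if not (byte & 0x80):
            return value, pos


def encode_ue7(value):
    """Encode a non-negative integer using the shortest ue7 form."""
    if value < 0:
        raise ValueError("ue7 cannot encode negative values")
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append(value & 0x7F)
        value >>= 7
    groups.reverse()
    # Every byte except the last one has '1' as its first bit.
    return bytes([g | 0x80 for g in groups[:-1]] + [groups[-1]])
```

With the example table above, decode_ue7(bytes([0x84, 0x1e])) yields
the value 542 and consumes two bytes.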

3) File format
--------------

3.1) Syntax
-----------

heic_file() {

     file_magic                                                  u(32)

     pixel_format                                                u(3)
     alpha1_flag                                                 u(1)
     bit_depth_minus_8                                           u(4)

     color_space                                                 u(4)
     extension_present_flag                                      u(1)
     alpha2_flag                                                 u(1)
     limited_range_flag                                          u(1)
     animation_flag                                              u(1)
     
     picture_width                                               ue7(32)
     picture_height                                              ue7(32)
     
     picture_data_length                                         ue7(32)
     if (extension_present_flag) {
         extension_data_length                                   ue7(32)
         extension_data()
     }

     hevc_header_and_data()
}
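
As an illustration of the layout above, the fixed part of the file
header could be read as sketched below. The helper names read_ue7 and
parse_bpg_header, the dictionary layout and the 'header_end' key are
assumptions made for the example; the magic value is the one given in
the semantics section. The sketch does not validate reserved values.

```python
def read_ue7(data, pos):
    """Minimal ue7 reader (no shortest-form check).

    Returns (value, next_pos).
    """
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not (byte & 0x80):
            return value, pos


def parse_bpg_header(data):
    """Parse the fields of heic_file() that precede extension_data()."""
    if int.from_bytes(data[0:4], "big") != 0x425047FB:
        raise ValueError("bad file_magic")
    b4, b5 = data[4], data[5]
    hdr = {
        # Bit fields are read from the most significant bit first.
        "pixel_format": b4 >> 5,                  # u(3)
        "alpha1_flag": (b4 >> 4) & 1,             # u(1)
        "bit_depth": (b4 & 0x0F) + 8,             # bit_depth_minus_8 + 8
        "color_space": b5 >> 4,                   # u(4)
        "extension_present_flag": (b5 >> 3) & 1,  # u(1)
        "alpha2_flag": (b5 >> 2) & 1,             # u(1)
        "limited_range_flag": (b5 >> 1) & 1,      # u(1)
        "animation_flag": b5 & 1,                 # u(1)
    }
    pos = 6
    for name in ("picture_width", "picture_height", "picture_data_length"):
        hdr[name], pos = read_ue7(data, pos)
    hdr["extension_data_length"] = 0
    if hdr["extension_present_flag"]:
        hdr["extension_data_length"], pos = read_ue7(data, pos)
    # Offset of extension_data() if present, else hevc_header_and_data().
    hdr["header_end"] = pos
    return hdr
```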

extension_data() 
{
     while (more_bytes()) {
         extension_tag                                           ue7(32)
         extension_tag_length                                    ue7(32)
         if (extension_tag == 5) {
             animation_control_extension(extension_tag_length)
         } else {
             for(j = 0; j < extension_tag_length; j++) {
                 extension_tag_data_byte                         b(8)
             }
         }
     }
}
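
The (tag, length) loop above lets a reader skip tags it does not know.
A possible sketch, assuming the extension_data() bytes have already
been sliced out of the file using extension_data_length; the names
read_ue7 and iter_extensions are hypothetical:

```python
def read_ue7(data, pos):
    """Minimal ue7 reader; returns (value, next_pos)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not (byte & 0x80):
            return value, pos


def iter_extensions(ext):
    """Yield (extension_tag, payload_bytes) for each tag in `ext`."""
    pos = 0
    while pos < len(ext):  # more_bytes()
        tag, pos = read_ue7(ext, pos)
        length, pos = read_ue7(ext, pos)
        if pos + length > len(ext):
            raise ValueError("extension payload overruns extension_data()")
        yield tag, ext[pos:pos + length]
        pos += length
```

A caller that only wants one tag can filter the generator and ignore
everything else, for example
[p for t, p in iter_extensions(ext) if t == 5].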

animation_control_extension(payload_length)
{
    loop_count                                                   ue7(16)
    frame_period_num                                             ue7(16)
    frame_period_den                                             ue7(16)
    while (more_bytes()) {
        dummy_byte                                               b(8)
    }
}

hevc_header_and_data()
{
     if (alpha1_flag || alpha2_flag) {
         hevc_header()
     }
     hevc_header()
     hevc_data()
}

hevc_header()
{
     hevc_header_length                                          ue7(32)
     log2_min_luma_coding_block_size_minus3                      ue(v)
     log2_diff_max_min_luma_coding_block_size                    ue(v)
     log2_min_transform_block_size_minus2                        ue(v)
     log2_diff_max_min_transform_block_size                      ue(v)
     max_transform_hierarchy_depth_intra                         ue(v)
     sample_adaptive_offset_enabled_flag                         u(1)
     pcm_enabled_flag                                            u(1)
     if (pcm_enabled_flag) {
         pcm_sample_bit_depth_luma_minus1                        u(4)
         pcm_sample_bit_depth_chroma_minus1                      u(4)
         log2_min_pcm_luma_coding_block_size_minus3              ue(v)
         log2_diff_max_min_pcm_luma_coding_block_size            ue(v)
         pcm_loop_filter_disabled_flag                           u(1)
     }
     strong_intra_smoothing_enabled_flag                         u(1)
     sps_extension_present_flag                                  u(1)
     if (sps_extension_present_flag) {
         sps_range_extension_flag                                u(1)
         sps_extension_7bits                                     u(7)     
     }
     if (sps_range_extension_flag) {
         transform_skip_rotation_enabled_flag                    u(1)
         transform_skip_context_enabled_flag                     u(1)
         implicit_rdpcm_enabled_flag                             u(1)
         explicit_rdpcm_enabled_flag                             u(1)
         extended_precision_processing_flag                      u(1)
         intra_smoothing_disabled_flag                           u(1)
         high_precision_offsets_enabled_flag                     u(1)
         persistent_rice_adaptation_enabled_flag                 u(1)
         cabac_bypass_alignment_enabled_flag                     u(1)
     }
     trailing_bits                                               u(v)
}
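
The following sketch shows one way to read the hevc_header() fields
that follow 'hevc_header_length'. The BitReader class and the
parse_hevc_header_body function are hypothetical names; the sketch
assumes sps_range_extension_flag is 0 when it is not coded, and it
does not check 'trailing_bits'.

```python
class BitReader:
    """Reads bits most significant bit first from a bytes object."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position in bits

    def u(self, n):
        """u(n): unsigned integer stored on n bits."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value

    def ue(self):
        """ue(v): 0-th order Exp-Golomb code."""
        zeros = 0
        while self.u(1) == 0:
            zeros += 1
        return (1 << zeros) - 1 + self.u(zeros)


RANGE_EXTENSION_FLAGS = (
    "transform_skip_rotation_enabled_flag",
    "transform_skip_context_enabled_flag",
    "implicit_rdpcm_enabled_flag",
    "explicit_rdpcm_enabled_flag",
    "extended_precision_processing_flag",
    "intra_smoothing_disabled_flag",
    "high_precision_offsets_enabled_flag",
    "persistent_rice_adaptation_enabled_flag",
    "cabac_bypass_alignment_enabled_flag",
)


def parse_hevc_header_body(body):
    """Parse the bytes that follow hevc_header_length into a dict."""
    r = BitReader(body)
    h = {}
    for name in ("log2_min_luma_coding_block_size_minus3",
                 "log2_diff_max_min_luma_coding_block_size",
                 "log2_min_transform_block_size_minus2",
                 "log2_diff_max_min_transform_block_size",
                 "max_transform_hierarchy_depth_intra"):
        h[name] = r.ue()
    h["sample_adaptive_offset_enabled_flag"] = r.u(1)
    h["pcm_enabled_flag"] = r.u(1)
    if h["pcm_enabled_flag"]:
        h["pcm_sample_bit_depth_luma_minus1"] = r.u(4)
        h["pcm_sample_bit_depth_chroma_minus1"] = r.u(4)
        h["log2_min_pcm_luma_coding_block_size_minus3"] = r.ue()
        h["log2_diff_max_min_pcm_luma_coding_block_size"] = r.ue()
        h["pcm_loop_filter_disabled_flag"] = r.u(1)
    h["strong_intra_smoothing_enabled_flag"] = r.u(1)
    h["sps_extension_present_flag"] = r.u(1)
    h["sps_range_extension_flag"] = 0  # assumed 0 when not coded
    if h["sps_extension_present_flag"]:
        h["sps_range_extension_flag"] = r.u(1)
        h["sps_extension_7bits"] = r.u(7)
    if h["sps_range_extension_flag"]:
        for name in RANGE_EXTENSION_FLAGS:
            h[name] = r.u(1)
    return h
```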

hevc_data() 
{
     for(i = 0; i < v; i++) {
         hevc_data_byte                                          b(8)
     }
}

frame_duration_sei(payloadSize)
{
     frame_duration                                              u(16)
}

3.2) Semantics
--------------

     'file_magic' is defined as 0x425047fb.

     'pixel_format' indicates the chroma subsampling:

       0 : Grayscale
       1 : 4:2:0. Chroma at position (0.5, 0.5) (JPEG chroma position)
       2 : 4:2:2. Chroma at position (0.5, 0) (JPEG chroma position)
       3 : 4:4:4
       4 : 4:2:0. Chroma at position (0, 0.5) (MPEG2 chroma position)
       5 : 4:2:2. Chroma at position (0, 0) (MPEG2 chroma position)

       The other values are reserved.
       
     'alpha1_flag' and 'alpha2_flag' give information about the alpha plane:

       alpha1_flag=0 alpha2_flag=0: no alpha plane.

       alpha1_flag=1 alpha2_flag=0: alpha present. The color is not
       premultiplied.
        
       alpha1_flag=1 alpha2_flag=1: alpha present. The color is
       premultiplied. The resulting non-premultiplied R', G', B' shall
       be recovered as:
          
         if A != 0 
           R' = min(R / A, 1), G' = min(G / A, 1), B' = min(B / A, 1)
         else
           R' = G' = B' = 1 .
         
       alpha1_flag=0 alpha2_flag=1: the alpha plane is present and
       contains the W color component (CMYK color). The resulting CMYK
       data can be recovered as follows:

         C = (1 - R), M = (1 - G), Y = (1 - B), K = (1 - W) .
     
       In case no color profile is specified, the sRGB color R'G'B'
       shall be computed as: 

         R' = R * W, G' = G * W, B' = B * W .
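
       The three recovery rules above translate directly into code. In
       this sketch all components are assumed to be floating point
       values already normalized to the range 0 to 1, and the function
       names are hypothetical:

```python
def unpremultiply(r, g, b, a):
    """alpha1_flag=1, alpha2_flag=1: recover non-premultiplied R', G', B'."""
    if a != 0:
        return min(r / a, 1.0), min(g / a, 1.0), min(b / a, 1.0)
    return 1.0, 1.0, 1.0


def cmyk_from_rgbw(r, g, b, w):
    """alpha1_flag=0, alpha2_flag=1: recover C, M, Y, K."""
    return 1 - r, 1 - g, 1 - b, 1 - w


def srgb_from_rgbw(r, g, b, w):
    """CMYK case without a color profile: compute sRGB R', G', B'."""
    return r * w, g * w, b * w
```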

     'bit_depth_minus_8' is the number of bits used for each component
     minus 8. In this version of the specification, bit_depth_minus_8
     <= 6.

     'extension_present_flag' indicates that extension data are
     present.

     'color_space' specifies how to convert the color planes to
     RGB. It must be 0 when pixel_format = 0 (grayscale):

       0 : YCbCr (BT 601, same as JPEG and HEVC matrix_coeffs = 5)
       1 : RGB (component order: G B R)
       2 : YCgCo (same as HEVC matrix_coeffs = 8)
       3 : YCbCr (BT 709, same as HEVC matrix_coeffs = 1)
       4 : YCbCr (BT 2020 non constant luminance system, same as HEVC
       matrix_coeffs = 9)
       5 : reserved for BT 2020 constant luminance system, not
       supported in this version of the specification.

       The other values are reserved.

       YCbCr is defined using the BT 601, BT 709 or BT 2020 conversion
       matrices.

       For RGB, G is stored in the Y plane, B in the Cb plane and R in
       the Cr plane.

       YCgCo is defined as HEVC matrix_coeffs = 8. Y is stored in the
       Y plane, Cg in the Cb plane and Co in the Cr plane.
       
       If no color profile is present, the RGB output data are assumed
       to be in the sRGB color space [6].
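
       The plane mapping for the RGB and YCgCo cases can be sketched
       as below. This is an illustration under stated assumptions:
       the samples are floating point values, Cg and Co are centered
       on zero (range -0.5 to 0.5), and the formula used is the usual
       inverse of the YCgCo transform
       Y = G/2 + (R+B)/4, Cg = G/2 - (R+B)/4, Co = (R-B)/2.
       The function names are hypothetical.

```python
def rgb_from_gbr_planes(y_plane, cb_plane, cr_plane):
    """color_space = 1: G is in the Y plane, B in Cb and R in Cr."""
    return cr_plane, y_plane, cb_plane  # (R, G, B)


def rgb_from_ycgco(y, cg, co):
    """color_space = 2: Y in the Y plane, Cg in Cb and Co in Cr."""
    t = y - cg            # equals (R + B) / 2
    return t + co, y + cg, t - co  # (R, G, B)
```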

     'limited_range_flag': opposite of the HEVC video_full_range_flag.
     The value zero indicates that the full range of each color
     component is used. The value one indicates that a limited range
     is used:

          - (16 << (bit_depth - 8)) to (235 << (bit_depth - 8)) for Y
            and G, B, R,
          - (16 << (bit_depth - 8)) to (240 << (bit_depth - 8)) for Cb and Cr.

     For the YCgCo color space, the range limitation shall be done on
     the RGB data.

     The alpha (or W) plane always uses the full range.
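
     As a worked example of the ranges above, the following sketch
     computes the limits for a given bit depth. The function names and
     the dictionary keys are assumptions made for the example.

```python
def limited_range(bit_depth):
    """Return the (min, max) sample values used when limited_range_flag = 1."""
    shift = bit_depth - 8
    return {
        "luma_or_rgb": (16 << shift, 235 << shift),  # Y and G, B, R
        "chroma": (16 << shift, 240 << shift),       # Cb and Cr
    }


def full_range(bit_depth):
    """Return the (min, max) sample values used when limited_range_flag = 0."""
    return (0, (1 << bit_depth) - 1)
```

     For a bit depth of 10, limited_range(10) gives 64 to 940 for Y
     and G, B, R, and 64 to 960 for Cb and Cr.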

     'animation_flag'. The value '1' indicates that more than one
     frame is encoded in the hevc data. The animation control
     extension must be present. If the decoder does not support
     animations, it shall decode the first frame only and ignore the
     animation information.

     'picture_width' is the picture width in pixels. The value 0 is
     not allowed.

     'picture_height' is the picture height in pixels. The value 0 is
     not allowed.

     'picture_data_length' is the picture data length in bytes. The
     special value of zero indicates that the picture data goes up to
     the end of the file.

     'extension_data_length' is the extension data length in bytes.

     'extension_data()' is the extension data.

     'extension_tag' is the extension tag. The following values are defined:

       1: EXIF data.

       2: ICC profile (see [4])

       3: XMP (see [5])

       4: Thumbnail (the thumbnail shall be a lower resolution version
       of the image and stored in BPG format).

       5: Animation control data.

     The decoder shall ignore the tags it does not support.

     'extension_tag_length' is the length in bytes of the extension tag.

     'loop_count' gives the number of times the animation shall be
     played. The value of 0 means infinite.
     
     'frame_period_num' and 'frame_period_den' encode the default
     delay between each frame as frame_period_num/frame_period_den
     seconds. The value of 0 for 'frame_period_num' or
     'frame_period_den' is forbidden.
     
     'hevc_header_length' is the length in bytes of the following data
     up to and including 'trailing_bits'.
     
     'log2_min_luma_coding_block_size_minus3',
     'log2_diff_max_min_luma_coding_block_size',
     'log2_min_transform_block_size_minus2',
     'log2_diff_max_min_transform_block_size',
     'max_transform_hierarchy_depth_intra',
     'sample_adaptive_offset_enabled_flag', 'pcm_enabled_flag',
     'pcm_sample_bit_depth_luma_minus1',
     'pcm_sample_bit_depth_chroma_minus1',
     'log2_min_pcm_luma_coding_block_size_minus3',
     'log2_diff_max_min_pcm_luma_coding_block_size',
     'pcm_loop_filter_disabled_flag',
     'strong_intra_smoothing_enabled_flag', 'sps_extension_flag',
     'sps_extension_present_flag', 'sps_range_extension_flag',
     'transform_skip_rotation_enabled_flag',
     'transform_skip_context_enabled_flag',
     'implicit_rdpcm_enabled_flag', 'explicit_rdpcm_enabled_flag',
     'extended_precision_processing_flag',
     'intra_smoothing_disabled_flag',
     'high_precision_offsets_enabled_flag',
     'persistent_rice_adaptation_enabled_flag',
     'cabac_bypass_alignment_enabled_flag' are
     the corresponding fields of the HEVC SPS syntax element.
         
     'trailing_bits' has a value of 0 and has a length from 0 to 7
     bits so that the next data is byte aligned.

     'hevc_data()' contains the corresponding HEVC picture data,
     excluding the first NAL start code (i.e. the first 0x00 0x00 0x01
     or 0x00 0x00 0x00 0x01 bytes). The VPS and SPS NALs shall not be
     included in the HEVC picture data. The decoder can recover the
     necessary fields from the header by making the following
     assumptions:

     - vps_video_parameter_set_id = 0
     - sps_video_parameter_set_id = 0
     - sps_max_sub_layers = 1
     - sps_seq_parameter_set_id = 0
     - chroma_format_idc: for picture data: 
         chroma_format_idc = pixel_format
       for alpha data: 
         chroma_format_idc = 0.
     - separate_colour_plane_flag = 0
     - pic_width_in_luma_samples = ceil(picture_width/cb_size) * cb_size
     - pic_height_in_luma_samples = ceil(picture_height/cb_size) * cb_size
       with cb_size = 1 << log2_min_luma_coding_block_size
     - bit_depth_luma_minus8 = bit_depth_minus_8
     - bit_depth_chroma_minus8 = bit_depth_minus_8
     - max_transform_hierarchy_depth_inter = max_transform_hierarchy_depth_intra
     - scaling_list_enabled_flag = 0
     - log2_max_pic_order_cnt_lsb_minus4 = 4
     - amp_enabled_flag = 1
     - sps_temporal_mvp_enabled_flag = 1
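
     The rounding used for pic_width_in_luma_samples and
     pic_height_in_luma_samples in the list above can be sketched as
     follows. The function name is hypothetical, and cb_size is
     derived here from the coded field
     log2_min_luma_coding_block_size_minus3 by adding 3 to it.

```python
def coded_luma_size(picture_width, picture_height,
                    log2_min_luma_coding_block_size_minus3):
    """Round the picture size up to a multiple of the minimum coding block."""
    cb_size = 1 << (log2_min_luma_coding_block_size_minus3 + 3)

    def round_up(n):
        # Integer form of ceil(n / cb_size) * cb_size.
        return -(-n // cb_size) * cb_size

    return round_up(picture_width), round_up(picture_height)
```

     For example, a 542x8 picture with 8x8 minimum coding blocks is
     coded as 544x8 luma samples.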
     

     Alpha data encoding:

     - If alpha data is present, all the corresponding NALs have
       nuh_layer_id = 1. NALs for color data shall have nuh_layer_id =
       0.
     - Alpha data shall use the same tile sizes as color data and
       shall have the same entropy_coding_sync_enabled_flag value as
       color data.
     - Alpha slices shall use the same number of coding units as color
       slices and should be interleaved with color slices. Alpha NALs
       shall come before the corresponding color NALs.

     Animation encoding:

     - The optional prefix SEI with payloadType = 257 (defined in
       frame_duration_sei()) specifies that the image must be repeated
       'frame_duration' times. 'frame_duration' shall not be zero. If
       the frame duration SEI is not present for a given frame,
       frame_duration = 1 shall be assumed by the decoder. If alpha
       data is present, the frame duration SEI shall be present only
       for the color data.
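
     A possible way to turn these values into a display time is
     sketched below. It assumes that repeating an image
     'frame_duration' times means showing it for 'frame_duration'
     default frame periods; this reading, and the function name, are
     assumptions made for the example.

```python
from fractions import Fraction


def frame_display_time(frame_period_num, frame_period_den, frame_duration=1):
    """Return the display time of one frame in seconds as a Fraction."""
    if frame_period_num == 0 or frame_period_den == 0:
        raise ValueError("frame_period_num and frame_period_den must not be 0")
    if frame_duration == 0:
        raise ValueError("frame_duration shall not be zero")
    # frame_duration defaults to 1 when the SEI is absent.
    return frame_duration * Fraction(frame_period_num, frame_period_den)
```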
     
3.3) HEVC Profile
-----------------

Conforming HEVC bit streams shall conform to the Main 4:4:4 16 Still
Picture, Level 8.5 of the HEVC specification with the following
modifications.

- separate_colour_plane_flag shall be 0 when present.

- bit_depth_luma_minus8 <= 6

- bit_depth_chroma_minus8 = bit_depth_luma_minus8

- explicit_rdpcm_enabled_flag = 0 (does not matter for intra frames)

- extended_precision_processing_flag = 0

- cabac_bypass_alignment_enabled_flag = 0

- high_precision_offsets_enabled_flag = 0 (does not matter for intra frames)

- If the encoded image is larger than the size indicated by
picture_width and picture_height, the lower right part of the decoded
image shall be cropped. If a horizontal (resp. vertical) decimation by
two is done for the chroma and the width (resp. height) is n pixels,
ceil(n/2) pixels must be kept as the resulting chroma information.
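
The cropping rule above gives the following plane sizes. This sketch
takes the subsampling factors from the 'pixel_format' table in section
3.2; the names CHROMA_SUBSAMPLING and plane_sizes are hypothetical.

```python
# pixel_format -> (horizontal, vertical) chroma decimation factors.
# None means there are no chroma planes (grayscale).
CHROMA_SUBSAMPLING = {
    0: None,    # Grayscale
    1: (2, 2),  # 4:2:0, JPEG chroma position
    2: (2, 1),  # 4:2:2, JPEG chroma position
    3: (1, 1),  # 4:4:4
    4: (2, 2),  # 4:2:0, MPEG2 chroma position
    5: (2, 1),  # 4:2:2, MPEG2 chroma position
}


def plane_sizes(picture_width, picture_height, pixel_format):
    """Return ((luma_w, luma_h), (chroma_w, chroma_h) or None) after cropping."""
    luma = (picture_width, picture_height)
    sub = CHROMA_SUBSAMPLING[pixel_format]
    if sub is None:
        return luma, None
    sx, sy = sub
    # ceil(n / 2) samples are kept in each decimated direction.
    chroma = (-(-picture_width // sx), -(-picture_height // sy))
    return luma, chroma
```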

When animations are present, the next frames shall be encoded with the
following changes:

- P slices are allowed (but B slices are not allowed).

- Only the previous picture can be used as reference (hence a DPB size
  of 2 pictures).

4) Design choices
-----------------

(This section is informative)

- Our design principle was to keep the format as simple as possible
  while taking the HEVC codec as a basis. Our main metric to evaluate
  the simplicity was the size of a software decoder which outputs 32
  bit RGBA pixel data.

- Pixel formats: we wanted to be able to convert JPEG images to BPG
  with as little loss as possible. So supporting the same color space
  (BT 601 YCbCr) with the same range (full range) and most of the
  allowed JPEG chroma formats (4:4:4, 4:2:2, 4:2:0 or grayscale) was
  mandatory to avoid going back to RGB or doing a subsampling or
  interpolation.

- Alpha support: alpha support is mandatory. We chose to use a
  separate HEVC monochrome plane to handle it instead of another
  format to simplify the decoder. The color is either
  non-premultiplied or premultiplied. Premultiplied alpha usually
  gives better compression. Non-premultiplied alpha is supported in
  case no loss is needed on the color components. In order to allow
  progressive display, the alpha and color data are interleaved (the
  nuh_layer_id NAL field is 0 for color data and 1 for alpha
  data). The alpha and color slices should contain the same number of
  coding units and each alpha slice should come before the
  corresponding color slice. Since alpha slices are usually smaller
  than color slices, it allows a progressive display even if there is
  a single slice.

- Color spaces: In addition to YCbCr, RGB is supported for the high
  quality or lossless cases. YCgCo is supported because it may give
  slightly better results than YCbCr for high quality images. CMYK is
  supported so that JPEGs containing this color space can be
  converted. The alpha plane is used to store the W (1-K) plane. The
  data is stored with inverted components (1-X) so that the conversion
  to RGB is simplified. The support of the BT 709 and BT 2020 (non
  constant luminance) YCbCr encodings and of the limited range color
  values were added to reduce the losses when converting video frames.

- Bit depth: we decided to support the HEVC bit depths 8 to 14. The
  added complexity is small and it allows supporting high quality
  pictures from cameras.

- Picture file format: keeping a completely standard HEVC stream would
  have meant a more difficult parsing for the picture header which is
  a problem for the various image utilities to get the basic picture
  information (pixel format, width, height). So we added a small
  header before the HEVC bit stream. The picture header is byte
  oriented so it is easy to parse.

- HEVC bit stream: the standard HEVC headers (the VPS and SPS NALs)
  give an overhead of about 60 bytes for no added value in the case of
  picture compression. Since the alpha plane uses a different HEVC bit
  stream, it also adds the same overhead again. So we removed the VPS
  and SPS NALs and added a very small header with the equivalent
  information (typically 4 bytes). We also removed the first NAL start
  code which is not useful. It is still possible to reconstruct a
  standard HEVC stream to feed an unmodified hardware decoder if needed.

- Extensions: the metadata are stored at the beginning of the file so
  that they can be read at the same time as the header. Since metadata
  tend to evolve faster than the image formats, we left room for
  extension by using a (tag, length) representation. The decoder can
  easily skip all the metadata because their length is explicitly
  stored in the image header.

- Animations: they are interesting compared to WebM or MP4 short
  videos for the following reasons:
    * transparency is supported
    * lossless encoding is supported
    * the decoding resources are smaller than with a generic video
      player because only two frames need to be stored (DPB size = 2).
    * the animations are expected to be small so the decoder can cache
      all the decoded frames in memory.
    * the animation can be decoded as a still image if the decoder
      does not support animations.
  Compared to the other animated image formats (GIF, APNG, WebP), the
  compression ratio is usually much higher because of the HEVC inter
  frame prediction.

5) References
-------------

[1] High efficiency video coding (HEVC) version 2 (ITU-T Recommendation H.265)

[2] JPEG File Interchange Format version 1.02 ( http://www.w3.org/Graphics/JPEG/jfif3.pdf )

[3] EXIF version 2.2 (JEITA CP-3451)

[4] The International Color Consortium ( http://www.color.org/ )

[5] Extensible Metadata Platform (XMP) http://www.adobe.com/devnet/xmp.html

[6] sRGB color space, IEC 61966-2-1