RFC2440



6. Radix-64 Conversions

   As stated in the introduction, OpenPGP's underlying native
   representation for objects is a stream of arbitrary octets, and some
   systems desire these objects to be immune to damage caused by
   character set translation, data conversions, etc.

   In principle, any printable encoding scheme that met the requirements
   of the unsafe channel would suffice, since it would not change the
   underlying binary bit streams of the native OpenPGP data structures.
   The OpenPGP standard specifies one such printable encoding scheme to
   ensure interoperability.

   OpenPGP's Radix-64 encoding is composed of two parts: a base64
   encoding of the binary data, and a checksum.  The base64 encoding is
   identical to the MIME base64 content-transfer-encoding [RFC2231,
   Section 6.8]. An OpenPGP implementation MAY use ASCII Armor to
   protect the raw binary data.

   The checksum is a 24-bit CRC converted to four characters of radix-64
   encoding by the same MIME base64 transformation, preceded by an
   equals sign (=).  The CRC is computed by using the generator 0x864CFB
   and an initialization of 0xB704CE.  The accumulation is done on the
   data before it is converted to radix-64, rather than on the converted
   data.  A sample implementation of this algorithm is in the next
   section.

   The checksum with its leading equal sign MAY appear on the first line
   after the Base64 encoded data.

   Rationale for CRC-24: The size of 24 bits fits evenly into printable
   base64.  The nonzero initialization can detect more errors than a
   zero initialization.
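
   The conversion of a finished CRC value into the checksum characters
   described above can be sketched as follows.  This is only an
   illustration: the function name format_armor_checksum and the array
   name R64 are assumptions made for this example and do not appear in
   this specification.

```c
/* Radix-64 alphabet, in the order given by the table in section 6.3. */
static const char R64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical helper: writes '=' followed by four radix-64 characters
 * and a terminating NUL into out, which must hold at least 6 bytes. */
void format_armor_checksum(unsigned long crc, char out[6])
{
    crc &= 0xffffffUL;                 /* keep only the 24 CRC bits */
    out[0] = '=';                      /* leading equals sign */
    out[1] = R64[(crc >> 18) & 0x3f];  /* most significant 6 bits first */
    out[2] = R64[(crc >> 12) & 0x3f];
    out[3] = R64[(crc >> 6) & 0x3f];
    out[4] = R64[crc & 0x3f];
    out[5] = '\0';
}
```

   With this sketch, the value 0x9E350D is written as "=njUN", and the
   initialization value 0xB704CE (the CRC of zero octets of data) is
   written as "=twTO".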

6.1. An Implementation of the CRC-24 in "C"

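       #include <stddef.h>  /* size_t */

       #define CRC24_INIT 0xb704ceL
       /* Generator 0x864CFB with the implicit bit 24 set. */
       #define CRC24_POLY 0x1864cfbL

       typedef long crc24;

       /* Returns the CRC-24 of len octets in the low 24 bits. */
       crc24 crc_octets(unsigned char *octets, size_t len)
       {
           crc24 crc = CRC24_INIT;
           int i;

           while (len--) {
               /* Fold the next octet into the top byte of the CRC. */
               crc ^= ((crc24)*octets++) << 16;
               for (i = 0; i < 8; i++) {
                   crc <<= 1;
                   /* Reduce by the generator when bit 24 is set. */
                   if (crc & 0x1000000)
                       crc ^= CRC24_POLY;
               }
           }
           return crc & 0xffffffL;
       }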
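
   An implementation that reads the binary data in pieces may prefer an
   incremental form of the same computation.  The following is a
   minimal sketch under that assumption; the names crc24_update and
   crc24_final are hypothetical and are not part of this specification.

```c
#include <stddef.h>

#define CRC24_INIT 0xb704ceUL
#define CRC24_POLY 0x1864cfbUL

/* Hypothetical incremental variant: start with crc = CRC24_INIT and
 * call this once per chunk of data, passing the previous result. */
unsigned long crc24_update(unsigned long crc,
                           const unsigned char *octets, size_t len)
{
    int i;

    while (len--) {
        crc ^= (unsigned long)(*octets++) << 16;
        for (i = 0; i < 8; i++) {
            crc <<= 1;
            if (crc & 0x1000000UL)   /* bit 24 set: reduce by generator */
                crc ^= CRC24_POLY;
        }
    }
    return crc;
}

/* Hypothetical finishing step: keep only the low 24 bits. */
unsigned long crc24_final(unsigned long crc)
{
    return crc & 0xffffffUL;
}
```

   Feeding the ASCII string "123456789" in two chunks ("1234", then
   "56789") should yield 0x21CF02, the commonly published check value
   for this CRC.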

6.2. Forming ASCII Armor

   When OpenPGP encodes data into ASCII Armor, it puts specific headers
   around the data, so OpenPGP can reconstruct the data later. OpenPGP
   informs the user what kind of data is encoded in the ASCII armor
   through the use of the headers.

   Concatenating the following data creates ASCII Armor:

     - An Armor Header Line, appropriate for the type of data

     - Armor Headers

     - A blank (zero-length, or containing only whitespace) line

     - The ASCII-Armored data

     - An Armor Checksum

     - The Armor Tail, which depends on the Armor Header Line.

   An Armor Header Line consists of the appropriate header line text
   surrounded by five (5) dashes ('-', 0x2D) on either side of the
   header line text.  The header line text is chosen based upon the type
   of data that is being encoded in Armor, and how it is being encoded.
   Header line texts include the following strings:

   BEGIN PGP MESSAGE
       Used for signed, encrypted, or compressed files.

   BEGIN PGP PUBLIC KEY BLOCK
       Used for armoring public keys.

   BEGIN PGP PRIVATE KEY BLOCK
       Used for armoring private keys.

   BEGIN PGP MESSAGE, PART X/Y
       Used for multi-part messages, where the armor is split amongst Y
       parts, and this is the Xth part out of Y.

   BEGIN PGP MESSAGE, PART X
       Used for multi-part messages, where this is the Xth part of an
       unspecified number of parts. Requires the MessageID Armor Header
       to be used.

   BEGIN PGP SIGNATURE
       Used for detached signatures, OpenPGP/MIME signatures, and
       signatures following clearsigned messages. Note that PGP 2.x
       uses BEGIN PGP MESSAGE for detached signatures.
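
   Forming an Armor Header Line from one of the texts above can be
   sketched as follows.  The helper name format_armor_line and its
   parameters are assumptions made for this example.  Because the
   Armor Tail Line differs only in the leading word, the same helper is
   used for both by passing "BEGIN" or "END".

```c
#include <stdio.h>

/* Hypothetical helper: writes "-----<word> <type>-----" into out.
 * word is "BEGIN" or "END"; type is the rest of the header line text,
 * for example "PGP MESSAGE".  Returns 0 on success, or -1 if the
 * result does not fit into out_size bytes. */
int format_armor_line(const char *word, const char *type,
                      char *out, size_t out_size)
{
    /* Five dashes on either side of the header line text. */
    int n = snprintf(out, out_size, "-----%s %s-----", word, type);

    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

   For example, format_armor_line("BEGIN", "PGP MESSAGE", buf, sizeof
   buf) leaves "-----BEGIN PGP MESSAGE-----" in buf.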

   The Armor Headers are pairs of strings that can give the user or the
   receiving OpenPGP implementation some information about how to decode
   or use the message.  The Armor Headers are a part of the armor, not a
   part of the message, and hence are not protected by any signatures
   applied to the message.

   The format of an Armor Header is that of a key-value pair.  A colon
   (':' 0x3A) and a single space (0x20) separate the key and value.
   OpenPGP should consider improperly formatted Armor Headers to be
   corruption of the ASCII Armor.  Unknown keys should be reported to
   the user, but OpenPGP should continue to process the message.
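
   Splitting one Armor Header into its key and value might look like
   the sketch below.  The function name parse_armor_header, its
   parameters, and its return convention are assumptions made for this
   example.  How unknown keys are reported to the user is left to the
   caller.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical parser for one Armor Header such as
 * "Version: OpenPrivacy 0.99" (without the line ending).
 * Returns 0 and fills key and value on success.  Returns -1 when the
 * colon-space separator is missing, the key is empty, or a buffer is
 * too small; the caller may treat that as corrupted armor. */
int parse_armor_header(const char *line,
                       char *key, size_t key_size,
                       char *value, size_t value_size)
{
    const char *sep = strstr(line, ": ");   /* first colon + space */
    size_t key_len, value_len;

    if (sep == NULL || sep == line)
        return -1;

    key_len = (size_t)(sep - line);
    value_len = strlen(sep + 2);
    if (key_len >= key_size || value_len >= value_size)
        return -1;

    memcpy(key, line, key_len);
    key[key_len] = '\0';
    memcpy(value, sep + 2, value_len + 1);  /* copies the NUL as well */
    return 0;
}
```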

   Currently defined Armor Header Keys are:

     - "Version", that states the OpenPGP Version used to encode the
       message.

     - "Comment", a user-defined comment.

     - "MessageID", a 32-character string of printable characters.  The
       string must be the same for all parts of a multi-part message
       that uses the "PART X" Armor Header.  MessageID strings should be
       unique enough that the recipient of the mail can associate all
       the parts of a message with each other. A good checksum or
       cryptographic hash function is sufficient.

     - "Hash", a comma-separated list of hash algorithms used in this
       message. This is used only in clear-signed messages.

     - "Charset", a description of the character set that the plaintext
       is in. Please note that OpenPGP defines text to be in UTF-8 by
       default. An implementation will get best results by translating
       into and out of UTF-8. However, there are many instances where
       this is easier said than done. Also, there are communities of
       users who have no need for UTF-8 because they are all happy with
       a character set like ISO Latin-5 or a Japanese character set. In
       such instances, an implementation MAY override the UTF-8 default
       by using this header key. An implementation MAY implement this
       key and any translations it cares to; an implementation MAY
       ignore it and assume all text is UTF-8.

       The MessageID SHOULD NOT appear unless it is in a multi-part
       message. If it appears at all, it MUST be computed from the
       finished (encrypted, signed, etc.) message in a deterministic
       fashion, rather than contain a purely random value.  This is to
       allow the legitimate recipient to determine that the MessageID
       cannot serve as a covert means of leaking cryptographic key
       information.

   The Armor Tail Line is composed in the same manner as the Armor
   Header Line, except the string "BEGIN" is replaced by the string
   "END".

6.3. Encoding Binary in Radix-64

   The encoding process represents 24-bit groups of input bits as output
   strings of 4 encoded characters. Proceeding from left to right, a
   24-bit input group is formed by concatenating three 8-bit input
   groups. These 24 bits are then treated as four concatenated 6-bit
   groups, each of which is translated into a single digit in the
   Radix-64 alphabet. When encoding a bit stream with the Radix-64
   encoding, the bit stream must be presumed to be ordered with the
   most-significant-bit first. That is, the first bit in the stream will
   be the high-order bit in the first 8-bit octet, and the eighth bit
   will be the low-order bit in the first 8-bit octet, and so on.


	 +--first octet--+-second octet--+--third octet--+
	 |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
	 +-----------+---+-------+-------+---+-----------+
	 |5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|
	 +--1.index--+--2.index--+--3.index--+--4.index--+

   Each 6-bit group is used as an index into an array of 64 printable
   characters from the table below. The character referenced by the
   index is placed in the output string.

     Value Encoding  Value Encoding  Value Encoding  Value Encoding
	 0 A		17 R		34 i		51 z
	 1 B		18 S		35 j		52 0
	 2 C		19 T		36 k		53 1
	 3 D		20 U		37 l		54 2
	 4 E		21 V		38 m		55 3
	 5 F		22 W		39 n		56 4
	 6 G		23 X		40 o		57 5
	 7 H		24 Y		41 p		58 6
	 8 I		25 Z		42 q		59 7
	 9 J		26 a		43 r		60 8
	10 K		27 b		44 s		61 9
	11 L		28 c		45 t		62 +
	12 M		29 d		46 u		63 /
	13 N		30 e		47 v
	14 O		31 f		48 w	     (pad) =
	15 P		32 g		49 x
	16 Q		33 h		50 y

   The encoded output stream must be represented in lines of no more
   than 76 characters each.
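
   Breaking an already encoded string into lines that respect this
   limit can be sketched as follows.  The names wrap_radix64 and
   ARMOR_LINE_MAX are assumptions made for this example, as is the
   choice of a bare '\n' as the line ending.

```c
#include <stddef.h>

#define ARMOR_LINE_MAX 76   /* line length limit stated above */

/* Hypothetical helper: copies the NUL-terminated encoded string in to
 * out, adding '\n' after every ARMOR_LINE_MAX characters and after a
 * final partial line.  Returns 0 on success, or -1 if out_size is too
 * small for the wrapped text plus its terminating NUL. */
int wrap_radix64(const char *in, char *out, size_t out_size)
{
    size_t n = 0;     /* characters written so far */
    size_t col = 0;   /* characters on the current line */

    for (; *in != '\0'; in++) {
        if (n + 2 >= out_size)       /* room for char, '\n' and NUL */
            return -1;
        out[n++] = *in;
        if (++col == ARMOR_LINE_MAX) {
            out[n++] = '\n';
            col = 0;
        }
    }
    if (col != 0)                    /* terminate the last short line */
        out[n++] = '\n';
    if (n >= out_size)
        return -1;
    out[n] = '\0';
    return 0;
}
```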

   Special processing is performed if fewer than 24 bits are available
   at the end of the data being encoded. There are three possibilities:

    1. The last data group has 24 bits (3 octets). No special
       processing is needed.

    2. The last data group has 16 bits (2 octets). The first two 6-bit
       groups are processed as above. The third (incomplete) data group
       has two zero-value bits added to it, and is processed as above.
       A pad character (=) is added to the output.

    3. The last data group has 8 bits (1 octet). The first 6-bit group
       is processed as above. The second (incomplete) data group has
       four zero-value bits added to it, and is processed as above. Two
       pad characters (=) are added to the output.
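
   The grouping and the three end-of-data cases above can be sketched
   as follows.  The function name radix64_encode and the array name
   R64 are assumptions made for this example.  The sketch does not
   insert line breaks.

```c
#include <stddef.h>

/* Radix-64 alphabet, in the order given by the table above. */
static const char R64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical encoder: out must hold 4 * ((len + 2) / 3) + 1 bytes.
 * Returns the number of characters written, excluding the NUL. */
size_t radix64_encode(const unsigned char *in, size_t len, char *out)
{
    size_t n = 0;
    unsigned long g;   /* current 24-bit input group */

    while (len >= 3) {               /* case 1: full 24-bit groups */
        g = ((unsigned long)in[0] << 16) |
            ((unsigned long)in[1] << 8) | in[2];
        out[n++] = R64[(g >> 18) & 0x3f];
        out[n++] = R64[(g >> 12) & 0x3f];
        out[n++] = R64[(g >> 6) & 0x3f];
        out[n++] = R64[g & 0x3f];
        in += 3;
        len -= 3;
    }
    if (len == 2) {                  /* case 2: 16 bits, two zero bits */
        g = ((unsigned long)in[0] << 16) | ((unsigned long)in[1] << 8);
        out[n++] = R64[(g >> 18) & 0x3f];
        out[n++] = R64[(g >> 12) & 0x3f];
        out[n++] = R64[(g >> 6) & 0x3f];
        out[n++] = '=';
    } else if (len == 1) {           /* case 3: 8 bits, four zero bits */
        g = (unsigned long)in[0] << 16;
        out[n++] = R64[(g >> 18) & 0x3f];
        out[n++] = R64[(g >> 12) & 0x3f];
        out[n++] = '=';
        out[n++] = '=';
    }
    out[n] = '\0';
    return n;
}
```

   Applied to the octets 0x14 0xfb 0x9c 0x03 0xd9 0x7e, this sketch
   produces "FPucA9l+".  With the last octet removed it produces
   "FPucA9k=", and with the last two removed it produces "FPucAw==".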


6.4. Decoding Radix-64

   Any characters outside of the base64 alphabet are ignored in Radix-64
   data. Decoding software must ignore all line breaks or other
   characters not found in the table above.

   In Radix-64 data, characters other than those in the table, line
   breaks, and other white space probably indicate a transmission error,
   about which a warning message or even a message rejection might be
   appropriate under some circumstances.

   Because it is used only for padding at the end of the data, the
   occurrence of any "=" characters may be taken as evidence that the
   end of the data has been reached (without truncation in transit). No
   such assurance is possible, however, when the number of octets
   transmitted was a multiple of three and no "=" characters are
   present.
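
   A decoder that follows these rules can be sketched as below.  The
   function name radix64_decode is an assumption made for this
   example.  The sketch silently skips characters outside the alphabet
   and stops at the first "=", so it issues no warnings and does not
   verify the checksum.

```c
#include <stddef.h>
#include <string.h>

/* Radix-64 alphabet, in the order given by the table in section 6.3. */
static const char R64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical decoder for a NUL-terminated string.  out must hold at
 * least 3 * strlen(in) / 4 + 1 bytes.  Returns the number of octets
 * written. */
size_t radix64_decode(const char *in, unsigned char *out)
{
    unsigned long acc = 0;   /* bit accumulator */
    int bits = 0;            /* number of pending bits in acc */
    size_t n = 0;

    for (; *in != '\0' && *in != '='; in++) {
        const char *p = strchr(R64, *in);

        if (p == NULL)
            continue;        /* line break or other ignored character */
        acc = (acc << 6) | (unsigned long)(p - R64);
        bits += 6;
        if (bits >= 8) {     /* a complete octet is available */
            bits -= 8;
            out[n++] = (unsigned char)((acc >> bits) & 0xff);
        }
    }
    return n;                /* leftover padding bits are dropped */
}
```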

6.5. Examples of Radix-64

       Input data:  0x14fb9c03d97e
       Hex:     1   4    f   b    9   c     | 0   3    d   9    7   e
       8-bit:   00010100 11111011 10011100  | 00000011 11011001 01111110
       6-bit:   000101 001111 101110 011100 | 000000 111101 100101 111110
       Decimal: 5      15     46     28       0      61     37     62
       Output:  F      P      u      c        A      9      l      +

       Input data:  0x14fb9c03d9
       Hex:	1   4	 f   b	  9   c     | 0   3    d   9
       8-bit:	00010100 11111011 10011100  | 00000011 11011001
						       pad with 00
       6-bit:	000101 001111 101110 011100 | 000000 111101 100100
       Decimal: 5      15     46     28       0      61     36
							  pad with =
       Output:	F      P      u      c	      A      9	    k	   =

       Input data:  0x14fb9c03
       Hex:	1   4	 f   b	  9   c     | 0   3
       8-bit:	00010100 11111011 10011100  | 00000011
					      pad with 0000
       6-bit:	000101 001111 101110 011100 | 000000 110000
       Decimal: 5      15     46     28       0      48
						   pad with =	   =
       Output:	F      P      u      c	      A      w	    =	   =


6.6. Example of an ASCII Armored Message


  -----BEGIN PGP MESSAGE-----
  Version: OpenPrivacy 0.99

  yDgBO22WxBHv7O8X7O/jygAEzol56iUKiXmV+XmpCtmpqQUKiQrFqclFqUDBovzS
  vBSFjNSiVHsuAA==
  =njUN
  -----END PGP MESSAGE-----

   Note that this example is indented by two spaces.


HTML conversion and comments on this RFC are Copyright (c) 1998 Werner Koch, Remscheider Str. 22, 40215 Düsseldorf, Germany. Verbatim copying and distribution is permitted in any medium, provided this notice is preserved. See here for copyright information on the RFC itself.

Updated: 1999-09-30 wkoch