[ Contents ]
6. Radix-64 Conversions As stated in the introduction, OpenPGP's underlying native representation for objects is a stream of arbitrary octets, and some systems desire these objects to be immune to damage caused by character set translation, data conversions, etc. In principle, any printable encoding scheme that met the requirements of the unsafe channel would suffice, since it would not change the underlying binary bit streams of the native OpenPGP data structures. The OpenPGP standard specifies one such printable encoding scheme to ensure interoperability. OpenPGP's Radix-64 encoding is composed of two parts: a base64 encoding of the binary data, and a checksum. The base64 encoding is identical to the MIME base64 content-transfer-encoding [RFC2231, Section 6.8]. An OpenPGP implementation MAY use ASCII Armor to protect the raw binary data. The checksum is a 24-bit CRC converted to four characters of radix-64 encoding by the same MIME base64 transformation, preceded by an equals sign (=). The CRC is computed by using the generator 0x864CFB and an initialization of 0xB704CE. The accumulation is done on the data before it is converted to radix-64, rather than on the converted data. A sample implementation of this algorithm is in the next section. The checksum with its leading equal sign MAY appear on the first line after the Base64 encoded data. Rationale for CRC-24: The size of 24 bits fits evenly into printable base64. The nonzero initialization can detect more errors than a zero initialization. 6.1. An Implementation of the CRC-24 in "C" #define CRC24_INIT 0xb704ceL #define CRC24_POLY 0x1864cfbL typedef long crc24; crc24 crc_octets(unsigned char *octets, size_t len) { crc24 crc = CRC24_INIT; int i; while (len--) { crc ^= (*octets++) << 16; for (i = 0; i < 8; i++) { crc <<= 1; if (crc & 0x1000000) crc ^= CRC24_POLY; } } return crc & 0xffffffL; } 6.2. Forming ASCII Armor When OpenPGP encodes data into ASCII Armor, it puts specific headers around the data, so OpenPGP can reconstruct the data later. OpenPGP informs the user what kind of data is encoded in the ASCII armor through the use of the headers. Concatenating the following data creates ASCII Armor: - An Armor Header Line, appropriate for the type of data - Armor Headers - A blank (zero-length, or containing only whitespace) line - The ASCII-Armored data - An Armor Checksum - The Armor Tail, which depends on the Armor Header Line. An Armor Header Line consists of the appropriate header line text surrounded by five (5) dashes ('-', 0x2D) on either side of the header line text. The header line text is chosen based upon the type of data that is being encoded in Armor, and how it is being encoded. Header line texts include the following strings: BEGIN PGP MESSAGE Used for signed, encrypted, or compressed files. BEGIN PGP PUBLIC KEY BLOCK Used for armoring public keys BEGIN PGP PRIVATE KEY BLOCK Used for armoring private keys BEGIN PGP MESSAGE, PART X/Y Used for multi-part messages, where the armor is split amongst Y parts, and this is the Xth part out of Y. BEGIN PGP MESSAGE, PART X Used for multi-part messages, where this is the Xth part of an unspecified number of parts. Requires the MESSAGE-ID Armor Header to be used. BEGIN PGP SIGNATURE Used for detached signatures, OpenPGP/MIME signatures, and natures following clearsigned messages. Note that PGP 2.x s BEGIN PGP MESSAGE for detached signatures. The Armor Headers are pairs of strings that can give the user or the receiving OpenPGP implementation some information about how to decode or use the message. The Armor Headers are a part of the armor, not a part of the message, and hence are not protected by any signatures applied to the message. The format of an Armor Header is that of a key-value pair. A colon (':' 0x38) and a single space (0x20) separate the key and value. OpenPGP should consider improperly formatted Armor Headers to be corruption of the ASCII Armor. Unknown keys should be reported to the user, but OpenPGP should continue to process the message. Currently defined Armor Header Keys are: - "Version", that states the OpenPGP Version used to encode the message. - "Comment", a user-defined comment. - "MessageID", a 32-character string of printable characters. The string must be the same for all parts of a multi-part message that uses the "PART X" Armor Header. MessageID strings should be unique enough that the recipient of the mail can associate all the parts of a message with each other. A good checksum or cryptographic hash function is sufficient. - "Hash", a comma-separated list of hash algorithms used in this message. This is used only in clear-signed messages. - "Charset", a description of the character set that the plaintext is in. Please note that OpenPGP defines text to be in UTF-8 by default. An implementation will get best results by translating into and out of UTF-8. However, there are many instances where this is easier said than done. Also, there are communities of users who have no need for UTF-8 because they are all happy with a character set like ISO Latin-5 or a Japanese character set. In such instances, an implementation MAY override the UTF-8 default by using this header key. An implementation MAY implement this key and any translations it cares to; an implementation MAY ignore it and assume all text is UTF-8. The MessageID SHOULD NOT appear unless it is in a multi-part message. If it appears at all, it MUST be computed from the finished (encrypted, signed, etc.) message in a deterministic fashion, rather than contain a purely random value. This is to allow the legitimate recipient to determine that the MessageID cannot serve as a covert means of leaking cryptographic key information. The Armor Tail Line is composed in the same manner as the Armor Header Line, except the string "BEGIN" is replaced by the string "END." 6.3. Encoding Binary in Radix-64 The encoding process represents 24-bit groups of input bits as output strings of 4 encoded characters. Proceeding from left to right, a 24-bit input group is formed by concatenating three 8-bit input groups. These 24 bits are then treated as four concatenated 6-bit groups, each of which is translated into a single digit in the Radix-64 alphabet. When encoding a bit stream with the Radix-64 encoding, the bit stream must be presumed to be ordered with the most-significant-bit first. That is, the first bit in the stream will be the high-order bit in the first 8-bit octet, and the eighth bit will be the low-order bit in the first 8-bit octet, and so on. +--first octet--+-second octet--+--third octet--+ |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0| +-----------+---+-------+-------+---+-----------+ |5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0| +--1.index--+--2.index--+--3.index--+--4.index--+ Each 6-bit group is used as an index into an array of 64 printable characters from the table below. The character referenced by the index is placed in the output string. Value Encoding Value Encoding Value Encoding Value Encoding 0 A 17 R 34 i 51 z 1 B 18 S 35 j 52 0 2 C 19 T 36 k 53 1 3 D 20 U 37 l 54 2 4 E 21 V 38 m 55 3 5 F 22 W 39 n 56 4 6 G 23 X 40 o 57 5 7 H 24 Y 41 p 58 6 8 I 25 Z 42 q 59 7 9 J 26 a 43 r 60 8 10 K 27 b 44 s 61 9 11 L 28 c 45 t 62 + 12 M 29 d 46 u 63 / 13 N 30 e 47 v 14 O 31 f 48 w (pad) = 15 P 32 g 49 x 16 Q 33 h 50 y The encoded output stream must be represented in lines of no more than 76 characters each. Special processing is performed if fewer than 24 bits are available at the end of the data being encoded. There are three possibilities: 1. The last data group has 24 bits (3 octets). No special processing is needed. 2. The last data group has 16 bits (2 octets). The first two 6-bit groups are processed as above. The third (incomplete) data group has two zero-value bits added to it, and is processed as above. A pad character (=) is added to the output. 3. The last data group has 8 bits (1 octet). The first 6-bit group is processed as above. The second (incomplete) data group has four zero-value bits added to it, and is processed as above. Two pad characters (=) are added to the output. 6.4. Decoding Radix-64 Any characters outside of the base64 alphabet are ignored in Radix-64 data. Decoding software must ignore all line breaks or other characters not found in the table above. In Radix-64 data, characters other than those in the table, line breaks, and other white space probably indicate a transmission error, about which a warning message or even a message rejection might be appropriate under some circumstances. Because it is used only for padding at the end of the data, the occurrence of any "=" characters may be taken as evidence that the end of the data has been reached (without truncation in transit). No such assurance is possible, however, when the number of octets transmitted was a multiple of three and no "=" characters are present. 6.5. Examples of Radix-64 Input data: 0x14fb9c03d97e Hex: 1 4 f b 9 c | 0 3 d 9 7 e 8-bit: 00010100 11111011 10011100 | 00000011 11011001 11111110 6-bit: 000101 001111 101110 011100 | 000000 111101 100111 111110 Decimal: 5 15 46 28 0 61 37 62 Output: F P u c A 9 l + Input data: 0x14fb9c03d9 Hex: 1 4 f b 9 c | 0 3 d 9 8-bit: 00010100 11111011 10011100 | 00000011 11011001 pad with 00 6-bit: 000101 001111 101110 011100 | 000000 111101 100100 Decimal: 5 15 46 28 0 61 36 pad with = Output: F P u c A 9 k = Input data: 0x14fb9c03 Hex: 1 4 f b 9 c | 0 3 8-bit: 00010100 11111011 10011100 | 00000011 pad with 0000 6-bit: 000101 001111 101110 011100 | 000000 110000 Decimal: 5 15 46 28 0 48 pad with = = Output: F P u c A w = = 6.6. Example of an ASCII Armored Message -----BEGIN PGP MESSAGE----- Version: OpenPrivacy 0.99 yDgBO22WxBHv7O8X7O/jygAEzol56iUKiXmV+XmpCtmpqQUKiQrFqclFqUDBovzS vBSFjNSiVHsuAA== =njUN -----END PGP MESSAGE----- Note that this example is indented by two spaces.
Updated: 1999-09-30 wkoch