diff options
Diffstat (limited to 'doc/base64.txt')
-rw-r--r-- | doc/base64.txt | 107 |
1 files changed, 0 insertions, 107 deletions
diff --git a/doc/base64.txt b/doc/base64.txt deleted file mode 100644 index 48eafab..0000000 --- a/doc/base64.txt +++ /dev/null @@ -1,107 +0,0 @@ - The Base64 Content-Transfer-Encoding is designed to represent - arbitrary sequences of octets in a form that need not be humanly - readable. The encoding and decoding algorithms are simple, but the - encoded data are consistently only about 33 percent larger than the - unencoded data. This encoding is virtually identical to the one used - in Privacy Enhanced Mail (PEM) applications, as defined in RFC 1421. - The base64 encoding is adapted from RFC 1421, with one change: base64 - eliminates the "*" mechanism for embedded clear text. - - A 65-character subset of US-ASCII is used, enabling 6 bits to be - represented per printable character. (The extra 65th character, "=", - is used to signify a special processing function.) - - NOTE: This subset has the important property that it is - represented identically in all versions of ISO 646, including US - ASCII, and all characters in the subset are also represented - identically in all versions of EBCDIC. Other popular encodings, - such as the encoding used by the uuencode utility and the base85 - encoding specified as part of Level 2 PostScript, do not share - these properties, and thus do not fulfill the portability - requirements a binary transport encoding for mail must meet. - - The encoding process represents 24-bit groups of input bits as output - strings of 4 encoded characters. Proceeding from left to right, a - 24-bit input group is formed by concatenating 3 8-bit input groups. - These 24 bits are then treated as 4 concatenated 6-bit groups, each - of which is translated into a single digit in the base64 alphabet. - When encoding a bit stream via the base64 encoding, the bit stream - must be presumed to be ordered with the most-significant-bit first. - - That is, the first bit in the stream will be the high-order bit in - the first byte, and the eighth bit will be the low-order bit in the - first byte, and so on. - - Each 6-bit group is used as an index into an array of 64 printable - characters. The character referenced by the index is placed in the - output string. These characters, identified in Table 1, below, are - selected so as to be universally representable, and the set excludes - characters with particular significance to SMTP (e.g., ".", CR, LF) - and to the encapsulation boundaries defined in this document (e.g., - "-"). - - Table 1: The Base64 Alphabet - - Value Encoding Value Encoding Value Encoding Value Encoding - 0 A 17 R 34 i 51 z - 1 B 18 S 35 j 52 0 - 2 C 19 T 36 k 53 1 - 3 D 20 U 37 l 54 2 - 4 E 21 V 38 m 55 3 - 5 F 22 W 39 n 56 4 - 6 G 23 X 40 o 57 5 - 7 H 24 Y 41 p 58 6 - 8 I 25 Z 42 q 59 7 - 9 J 26 a 43 r 60 8 - 10 K 27 b 44 s 61 9 - 11 L 28 c 45 t 62 + - 12 M 29 d 46 u 63 / - 13 N 30 e 47 v - 14 O 31 f 48 w (pad) = - 15 P 32 g 49 x - 16 Q 33 h 50 y - The output stream (encoded bytes) must be represented in lines of no - more than 76 characters each. All line breaks or other characters - not found in Table 1 must be ignored by decoding software. In base64 - data, characters other than those in Table 1, line breaks, and other - white space probably indicate a transmission error, about which a - warning message or even a message rejection might be appropriate - under some circumstances. - - Special processing is performed if fewer than 24 bits are available - at the end of the data being encoded. A full encoding quantum is - always completed at the end of a body. When fewer than 24 input bits - are available in an input group, zero bits are added (on the right) - to form an integral number of 6-bit groups. Padding at the end of - the data is performed using the '=' character. Since all base64 - input is an integral number of octets, only the following cases can -arise: (1) the final quantum of encoding input is an integral - multiple of 24 bits; here, the final unit of encoded output will be - an integral multiple of 4 characters with no "=" padding, (2) the - final quantum of encoding input is exactly 8 bits; here, the final - unit of encoded output will be two characters followed by two "=" - padding characters, or (3) the final quantum of encoding input is - exactly 16 bits; here, the final unit of encoded output will be three - characters followed by one "=" padding character. - - Because it is used only for padding at the end of the data, the - occurrence of any '=' characters may be taken as evidence that the - end of the data has been reached (without truncation in transit). No - such assurance is possible, however, when the number of octets - transmitted was a multiple of three. - - Any characters outside of the base64 alphabet are to be ignored in - base64-encoded data. The same applies to any illegal sequence of - characters in the base64 encoding, such as "=====" - - Care must be taken to use the proper octets for line breaks if base64 - encoding is applied directly to text material that has not been - converted to canonical form. In particular, text line breaks must be - converted into CRLF sequences prior to base64 encoding. The important - thing to note is that this may be done directly by the encoder rather - than in a prior canonicalization step in some implementations. - - NOTE: There is no need to worry about quoting apparent - encapsulation boundaries within base64-encoded parts of multipart - entities because no hyphen characters are used in the base64 - encoding. |