aboutsummaryrefslogtreecommitdiff
path: root/doc/base64.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/base64.txt')
-rw-r--r--doc/base64.txt107
1 files changed, 107 insertions, 0 deletions
diff --git a/doc/base64.txt b/doc/base64.txt
new file mode 100644
index 00000000..48eafabe
--- /dev/null
+++ b/doc/base64.txt
@@ -0,0 +1,107 @@
+ The Base64 Content-Transfer-Encoding is designed to represent
+ arbitrary sequences of octets in a form that need not be humanly
+ readable. The encoding and decoding algorithms are simple, but the
+ encoded data are consistently only about 33 percent larger than the
+ unencoded data. This encoding is virtually identical to the one used
+ in Privacy Enhanced Mail (PEM) applications, as defined in RFC 1421.
+ The base64 encoding is adapted from RFC 1421, with one change: base64
+ eliminates the "*" mechanism for embedded clear text.
+
+ A 65-character subset of US-ASCII is used, enabling 6 bits to be
+ represented per printable character. (The extra 65th character, "=",
+ is used to signify a special processing function.)
+
+ NOTE: This subset has the important property that it is
+ represented identically in all versions of ISO 646, including US
+ ASCII, and all characters in the subset are also represented
+ identically in all versions of EBCDIC. Other popular encodings,
+ such as the encoding used by the uuencode utility and the base85
+ encoding specified as part of Level 2 PostScript, do not share
+ these properties, and thus do not fulfill the portability
+ requirements a binary transport encoding for mail must meet.
+
+ The encoding process represents 24-bit groups of input bits as output
+ strings of 4 encoded characters. Proceeding from left to right, a
+ 24-bit input group is formed by concatenating 3 8-bit input groups.
+ These 24 bits are then treated as 4 concatenated 6-bit groups, each
+ of which is translated into a single digit in the base64 alphabet.
+ When encoding a bit stream via the base64 encoding, the bit stream
+ must be presumed to be ordered with the most-significant-bit first.
+
+ That is, the first bit in the stream will be the high-order bit in
+ the first byte, and the eighth bit will be the low-order bit in the
+ first byte, and so on.
+
+ Each 6-bit group is used as an index into an array of 64 printable
+ characters. The character referenced by the index is placed in the
+ output string. These characters, identified in Table 1, below, are
+ selected so as to be universally representable, and the set excludes
+ characters with particular significance to SMTP (e.g., ".", CR, LF)
+ and to the encapsulation boundaries defined in this document (e.g.,
+ "-").
+
+ Table 1: The Base64 Alphabet
+
+ Value Encoding Value Encoding Value Encoding Value Encoding
+ 0 A 17 R 34 i 51 z
+ 1 B 18 S 35 j 52 0
+ 2 C 19 T 36 k 53 1
+ 3 D 20 U 37 l 54 2
+ 4 E 21 V 38 m 55 3
+ 5 F 22 W 39 n 56 4
+ 6 G 23 X 40 o 57 5
+ 7 H 24 Y 41 p 58 6
+ 8 I 25 Z 42 q 59 7
+ 9 J 26 a 43 r 60 8
+ 10 K 27 b 44 s 61 9
+ 11 L 28 c 45 t 62 +
+ 12 M 29 d 46 u 63 /
+ 13 N 30 e 47 v
+ 14 O 31 f 48 w (pad) =
+ 15 P 32 g 49 x
+ 16 Q 33 h 50 y
+ The output stream (encoded bytes) must be represented in lines of no
+ more than 76 characters each. All line breaks or other characters
+ not found in Table 1 must be ignored by decoding software. In base64
+ data, characters other than those in Table 1, line breaks, and other
+ white space probably indicate a transmission error, about which a
+ warning message or even a message rejection might be appropriate
+ under some circumstances.
+
+ Special processing is performed if fewer than 24 bits are available
+ at the end of the data being encoded. A full encoding quantum is
+ always completed at the end of a body. When fewer than 24 input bits
+ are available in an input group, zero bits are added (on the right)
+ to form an integral number of 6-bit groups. Padding at the end of
+ the data is performed using the '=' character. Since all base64
+ input is an integral number of octets, only the following cases can
+arise: (1) the final quantum of encoding input is an integral
+ multiple of 24 bits; here, the final unit of encoded output will be
+ an integral multiple of 4 characters with no "=" padding, (2) the
+ final quantum of encoding input is exactly 8 bits; here, the final
+ unit of encoded output will be two characters followed by two "="
+ padding characters, or (3) the final quantum of encoding input is
+ exactly 16 bits; here, the final unit of encoded output will be three
+ characters followed by one "=" padding character.
+
+ Because it is used only for padding at the end of the data, the
+ occurrence of any '=' characters may be taken as evidence that the
+ end of the data has been reached (without truncation in transit). No
+ such assurance is possible, however, when the number of octets
+ transmitted was a multiple of three.
+
+ Any characters outside of the base64 alphabet are to be ignored in
+ base64-encoded data. The same applies to any illegal sequence of
+ characters in the base64 encoding, such as "====="
+
+ Care must be taken to use the proper octets for line breaks if base64
+ encoding is applied directly to text material that has not been
+ converted to canonical form. In particular, text line breaks must be
+ converted into CRLF sequences prior to base64 encoding. The important
+ thing to note is that this may be done directly by the encoder rather
+ than in a prior canonicalization step in some implementations.
+
+ NOTE: There is no need to worry about quoting apparent
+ encapsulation boundaries within base64-encoded parts of multipart
+ entities because no hyphen characters are used in the base64
+ encoding.