Character code conversion involves conversion between the encoding used inside Emacs and some other encoding. Emacs supports many different encodings, in that it can convert to and from them. For example, it can convert text to or from encodings such as Latin 1, Latin 2, Latin 3, Latin 4, Latin 5, and several variants of ISO 2022. In some cases, Emacs supports several alternative encodings for the same characters; for example, there are three coding systems for the Cyrillic (Russian) alphabet: ISO, Alternativnyj, and KOI8.
Most coding systems specify a particular character code for conversion, but some of them leave this unspecified--to be chosen heuristically based on the data.
End of line conversion handles three different conventions used on various systems for representing end of line in files. The Unix convention is to use the linefeed character (also called newline). The DOS convention is to use the two character sequence, carriage-return linefeed, at the end of a line. The Mac convention is to use just carriage-return.
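The three conventions can be seen by encoding the same text with each end-of-line variant of one coding system. This is a minimal sketch: `my-eol-encodings' is a hypothetical helper introduced only for illustration, and it assumes the `latin-1-unix', `latin-1-dos' and `latin-1-mac' coding systems named in this section are defined in the running Emacs.

```elisp
;; `my-eol-encodings' is a hypothetical helper, not part of Emacs.
(defun my-eol-encodings (text)
  "Return TEXT encoded with the Unix, DOS and Mac variants of latin-1."
  (list (encode-coding-string text 'latin-1-unix)   ; linefeed kept as is
        (encode-coding-string text 'latin-1-dos)    ; linefeed -> CR LF
        (encode-coding-string text 'latin-1-mac)))  ; linefeed -> CR

;; Encode a two-line string under each convention.
(my-eol-encodings "one\ntwo")

;; Decoding reverses the conversion and restores the linefeed.
(decode-coding-string "one\r\ntwo" 'latin-1-dos)
```

Inside Emacs the line separator is always a linefeed; only the external representation differs.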
Base coding systems such as `latin-1' leave the end-of-line conversion unspecified, to be chosen based on the data. Variant coding systems such as `latin-1-unix', `latin-1-dos' and `latin-1-mac' specify the end-of-line conversion explicitly as well. Most base coding systems have three corresponding variants whose names are formed by adding `-unix', `-dos' and `-mac'.
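The naming rule can be sketched in a few lines. `my-eol-variants' is a hypothetical helper, not an Emacs function, and the use of `coding-system-eol-type' to inspect the result is an assumption about what the running Emacs provides.

```elisp
;; `my-eol-variants' is a hypothetical helper, not part of Emacs.
(defun my-eol-variants (base)
  "Return the names of the three end-of-line variants of BASE as symbols."
  (mapcar (lambda (suffix) (intern (format "%s-%s" base suffix)))
          '("unix" "dos" "mac")))

;; Form the variant names for latin-1.
(my-eol-variants 'latin-1)

;; Each variant reports its end-of-line convention as an integer:
;; 0 for linefeed, 1 for carriage-return linefeed, 2 for carriage-return.
(mapcar #'coding-system-eol-type (my-eol-variants 'latin-1))

;; A base coding system leaves the choice open and reports a vector
;; of its variants instead of an integer.
(vectorp (coding-system-eol-type 'latin-1))
```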
The coding system `raw-text' is special in that it prevents character code conversion, and causes the buffer visited with that coding system to be a unibyte buffer. It does not specify the end-of-line conversion, allowing that to be determined as usual by the data, and has the usual three variants which specify the end-of-line conversion. `no-conversion' is equivalent to `raw-text-unix': it specifies no conversion of either character codes or end-of-line.
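A small check of this equivalence, offered as a sketch rather than a definitive test: it assumes `coding-system-eol-type' is available, and `my-no-conversion-check' is a hypothetical name used only here.

```elisp
;; `my-no-conversion-check' is a hypothetical helper, not part of Emacs.
(defun my-no-conversion-check ()
  "Return end-of-line data for raw-text-unix and no-conversion."
  (list
   ;; Both report 0, the linefeed convention, so line ends pass through.
   (coding-system-eol-type 'raw-text-unix)
   (coding-system-eol-type 'no-conversion)
   ;; Plain raw-text leaves the convention open and reports a vector.
   (vectorp (coding-system-eol-type 'raw-text))
   ;; Encoding with no-conversion leaves a CR LF sequence untouched.
   (encode-coding-string "a\r\nb" 'no-conversion)))

(my-no-conversion-check)
```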
The coding system `emacs-mule' specifies that the data is represented in the internal Emacs encoding. This is like `raw-text' in that no code conversion happens, but different in that the result is multibyte data.
Coding systems have properties, which `coding-system-get' returns; one of them is `mime-charset'. That property's value is the name used in MIME for the character coding which this coding system can read and write. Examples:
     (coding-system-get 'iso-latin-1 'mime-charset)
          => iso-8859-1
     (coding-system-get 'iso-2022-cn 'mime-charset)
          => iso-2022-cn
     (coding-system-get 'cyrillic-koi8 'mime-charset)
          => koi8-r
The value of the `mime-charset' property is also defined as an alias for the coding system.
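Because the MIME name is itself an alias, it can be passed wherever a coding system is expected. The following sketch looks up the property and then uses the result directly; `my-mime-alias' is a hypothetical helper, and the use of `coding-system-p' as the validity check is an assumption about the running Emacs.

```elisp
;; `my-mime-alias' is a hypothetical helper, not part of Emacs.
(defun my-mime-alias (coding-system)
  "Return the MIME name of CODING-SYSTEM and whether it names a coding system."
  (let ((mime (coding-system-get coding-system 'mime-charset)))
    (list mime (and (coding-system-p mime) t))))

;; Look up the MIME name of cyrillic-koi8 and confirm it is usable.
(my-mime-alias 'cyrillic-koi8)

;; The alias can be used directly for conversion.
(encode-coding-string "abc" 'iso-8859-1)
```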