This set of functions follows the traditional cycle of using a resource: open-use-close. The interface consists of three functions, each of which implements one step.
Before the interfaces are described, it is necessary to introduce a
data type. Like other open-use-close interfaces, the functions
introduced here work with handles, and the iconv.h header defines a
special type for the handles used.
Data Type: iconv_t
This data type is an abstract type defined in iconv.h. The user
must not assume anything about the definition of this type; it must be
completely opaque.
Objects of this type can be assigned handles for conversions using the
iconv functions described in this section.
The first step is the function to create a handle.
Function: iconv_t iconv_open (const char *tocode, const char *fromcode)
The iconv_open function must be called before starting a
conversion. Its two parameters determine the destination (tocode) and
source (fromcode) character sets for the conversion; if the
implementation can perform such a conversion, the function returns a
handle.
If the wanted conversion is not available, the iconv_open function
returns (iconv_t) -1 and sets errno; the value EINVAL indicates that
conversion from fromcode to tocode is not supported.
It is not possible to use the same descriptor in different threads to
perform independent conversions. The data structures associated with the
descriptor include information about the conversion state, which must
not be corrupted by using the same descriptor for different conversions.
The GNU C library implementation of iconv_open …
The iconv implementation can associate large data structures with the
handle returned by iconv_open. Therefore, it is crucial to free all the
resources once all conversions are carried out and the conversion is no
longer needed.
Function: int iconv_close (iconv_t cd)
The iconv_close function frees all resources associated with the
handle cd, which must have been returned by a successful call to
the iconv_open function.
If the function call was successful, the return value is 0.
Otherwise it is -1 and errno is set to indicate the error, for example
EBADF if the conversion descriptor is invalid.
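The open and close steps can be sketched as follows, assuming a POSIX system that provides iconv.h. The helpers open_conversion and close_conversion are hypothetical names, not part of any library. They show how to tell an unsupported character set pair (errno set to EINVAL) apart from other iconv_open failures, and how to check the result of iconv_close, while keeping errno intact for the caller.

```c
#include <errno.h>
#include <iconv.h>
#include <stdio.h>

/* Hypothetical helper: opens a descriptor converting fromcode to tocode.
   Returns (iconv_t) -1 on failure with errno preserved; EINVAL means the
   implementation does not support this pair of character sets. */
static iconv_t open_conversion(const char *tocode, const char *fromcode)
{
    iconv_t cd = iconv_open(tocode, fromcode);
    if (cd == (iconv_t) -1) {
        int saved = errno;          /* fprintf/perror may change errno */
        if (saved == EINVAL)
            fprintf(stderr, "conversion from %s to %s is not supported\n",
                    fromcode, tocode);
        else
            perror("iconv_open");
        errno = saved;
    }
    return cd;
}

/* Hypothetical helper: releases all resources of cd and reports a
   failing iconv_close. Returns 0 on success, -1 with errno set. */
static int close_conversion(iconv_t cd)
{
    if (iconv_close(cd) != 0) {
        int saved = errno;
        perror("iconv_close");
        errno = saved;
        return -1;
    }
    return 0;
}
```

Because a descriptor carries conversion state, a program would call open_conversion once per independent conversion (for instance open_conversion ("ISO-8859-1", "UTF-8") to turn UTF-8 text into ISO-8859-1), use the descriptor with iconv, and pass it to close_conversion when done. Which character set names are accepted depends on the implementation.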
The standard defines only one actual conversion function. It therefore has the most general interface: it allows conversion from one buffer to another. Conversion from a file to a buffer, or vice versa, or even from file to file can be implemented on top of it.
Function: size_t iconv (iconv_t cd, char **inbuf, size_t *inbytesleft, char **outbuf, size_t *outbytesleft)
The iconv function converts the text in the input buffer
according to the rules associated with the descriptor cd and
stores the result in the output buffer. It is possible to call the
function several times in a row for successive pieces of the same text,
since for stateful character sets the necessary state information is
kept in the data structures associated with the descriptor.
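A single call can be sketched as below, assuming the caller supplies an output buffer of known size. The helper convert_once is a hypothetical name. It shows how iconv moves the two buffer pointers forward and decreases the two byte counters, so the number of bytes produced is the original output size minus the room left. The input pointer is cast to char * only because the prototype takes char ** for the input buffer.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: one call to iconv that converts inlen bytes at in
   into the fixed buffer out of outlen bytes. Returns the number of bytes
   written, or (size_t) -1 with errno as set by iconv (for example E2BIG
   if out is too small). */
static size_t convert_once(iconv_t cd, const char *in, size_t inlen,
                           char *out, size_t outlen)
{
    char *inptr = (char *) in;   /* the prototype takes char **, not const */
    size_t inleft = inlen;
    char *outptr = out;
    size_t outleft = outlen;

    /* iconv advances inptr/outptr and decreases inleft/outleft by the
       number of bytes consumed and produced. */
    if (iconv(cd, &inptr, &inleft, &outptr, &outleft) == (size_t) -1)
        return (size_t) -1;

    /* On success all input was consumed (inleft == 0). */
    return outlen - outleft;
}
```

With a descriptor from iconv_open ("ISO-8859-1", "UTF-8"), the six UTF-8 bytes of "héllo" become five ISO-8859-1 bytes, because the two-byte sequence for é maps to the single byte 0xE9.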
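The input buffer is specified by *inbuf, which points to its first byte,
and *inbytesleft, which holds the number of bytes in it. The output
buffer is specified in a similar way, by *outbuf and *outbytesleft.
If inbuf is a null pointer, the iconv function puts the conversion state
of cd back into the initial state. For stateful encodings such a call may
store in the output buffer the byte sequence needed to return to the
initial shift state; for stateless encodings it has no effect.
The conversion stops for one of three reasons. The first is that all
characters from the input buffer are converted. This can actually mean
two things: either all bytes from the input buffer are consumed, or there
are some bytes at the end of the buffer that might form a complete
character but the input is incomplete. The second reason for a stop is
that the output buffer is full. The third reason is that the input
contains invalid characters. In all of these cases, the buffer pointers
after the last successful conversion, for the input and output buffers,
are stored in *inbuf and *outbuf, and the available room in each buffer
is stored in *inbytesleft and *outbytesleft.
Since the character sets selected in the iconv_open call can be almost
arbitrary, the input may also contain valid characters that have no
identical representation in the output character set; how such
characters are handled depends on the implementation.
If all input from the input buffer is successfully converted and stored
in the output buffer, the function returns the number of non-reversible
conversions performed. In all other cases the return value is
(size_t) -1 and errno is set to one of the following values:
EILSEQ: The conversion stopped because of an invalid byte sequence in the input. After the call, *inbuf points at the first byte of the invalid sequence.
E2BIG: The conversion stopped because it ran out of space in the output buffer.
EINVAL: The conversion stopped because of an incomplete byte sequence at the end of the input buffer.
EBADF: The cd argument is invalid.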
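Because E2BIG only means that the output buffer is full, a caller can enlarge the buffer and call iconv again from where it stopped: *inbuf and *inbytesleft already describe the unconverted rest of the input. The sketch below uses a hypothetical helper, convert_all, that doubles a malloc'd buffer on E2BIG and, once all input is consumed, makes a final call with a null inbuf so that a stateful target encoding can emit its return-to-initial-state sequence. The doubling strategy is an arbitrary choice for illustration; other errors (EILSEQ, EINVAL) are simply handed back to the caller.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>

/* Hypothetical helper: converts inlen bytes at in with cd into a newly
   malloc'd buffer that grows on E2BIG. On success returns the buffer
   (the caller frees it) and stores its length in *outlen; on failure
   returns NULL with errno set (for example EILSEQ or EINVAL). */
static char *convert_all(iconv_t cd, const char *in, size_t inlen,
                         size_t *outlen)
{
    size_t cap = inlen > 0 ? inlen : 16;   /* initial guess only */
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    char *inptr = (char *) in;
    size_t inleft = inlen;
    char *outptr = buf;
    size_t outleft = cap;
    int flushing = 0;   /* set once all input has been consumed */

    for (;;) {
        size_t r = flushing
            ? iconv(cd, NULL, NULL, &outptr, &outleft)   /* reset state */
            : iconv(cd, &inptr, &inleft, &outptr, &outleft);
        if (r != (size_t) -1) {
            if (flushing)
                break;
            flushing = 1;
            continue;
        }
        if (errno != E2BIG) {           /* EILSEQ, EINVAL, EBADF, ... */
            int saved = errno;
            free(buf);
            errno = saved;
            return NULL;
        }
        /* Output buffer full: double it. inptr/inleft already describe
           the unconverted rest, so the next call resumes from there. */
        size_t used = (size_t) (outptr - buf);
        char *bigger = realloc(buf, cap * 2);
        if (bigger == NULL) {
            free(buf);
            errno = ENOMEM;
            return NULL;
        }
        buf = bigger;
        cap *= 2;
        outptr = buf + used;
        outleft = cap - used;
    }

    *outlen = (size_t) (outptr - buf);
    return buf;
}
```

For example, converting 100 ISO-8859-1 bytes of é to UTF-8 needs 200 output bytes, so the initial 100-byte guess triggers one E2BIG and one doubling.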
The definition of the iconv function is quite good overall and
provides flexible functionality. The only problems lie in the boundary
cases: incomplete byte sequences at the end of the input buffer, and
invalid input. A third problem, which is not really a design problem,
is the way conversions are selected. The standard says nothing about
legitimate names or a minimal set of available conversions. We will see
below how this negatively impacts other implementations.
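One way to cope with the two boundary cases is sketched below for the file-to-buffer case, using a hypothetical helper convert_file and several simplifying assumptions. Input is read into a small fixed buffer; bytes left over after EINVAL are moved to the front and completed by the next read; on EILSEQ exactly one byte is skipped and replaced by '?'. That replacement is a policy of this sketch, not something iconv does. With the GNU C library, EILSEQ may also be reported for a valid character that cannot be represented in the target set; this sketch then skips that character one byte at a time, emitting a '?' for each byte. Running out of output space is treated as a hard error for brevity.

```c
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (a sketch, not a library function): converts the
   contents of fp with cd into out, which holds outcap bytes. Returns the
   number of bytes written, or (size_t) -1 with errno set on a hard error. */
static size_t convert_file(iconv_t cd, FILE *fp, char *out, size_t outcap)
{
    char inbuf[16];      /* small on purpose; assumes no character needs
                            more than 16 input bytes */
    size_t have = 0;     /* unconverted bytes at the start of inbuf */
    char *outptr = out;
    size_t outleft = outcap;
    int eof = 0;

    while (!eof || have > 0) {
        if (!eof) {
            size_t n = fread(inbuf + have, 1, sizeof inbuf - have, fp);
            if (n == 0)
                eof = 1;     /* read errors are treated like EOF here */
            have += n;
        }

        char *inptr = inbuf;
        size_t inleft = have;
        if (iconv(cd, &inptr, &inleft, &outptr, &outleft) == (size_t) -1) {
            if (errno == EINVAL && !eof) {
                /* Incomplete sequence at the end of the buffer: keep it
                   so the next fread can supply the missing bytes. */
            } else if (errno == EILSEQ || errno == EINVAL) {
                /* Invalid byte (EILSEQ), or a sequence cut off by end of
                   file (EINVAL): emit '?' and skip one byte or the tail. */
                size_t skip = (errno == EILSEQ) ? 1 : inleft;
                if (outleft == 0) {
                    errno = E2BIG;
                    return (size_t) -1;
                }
                *outptr++ = '?';
                outleft--;
                inptr += skip;
                inleft -= skip;
            } else {
                return (size_t) -1;   /* E2BIG or another hard error */
            }
        }
        memmove(inbuf, inptr, inleft);   /* keep the unconverted rest */
        have = inleft;
    }

    /* Return cd to its initial state; a stateful target may append a
       shift sequence here. */
    if (iconv(cd, NULL, NULL, &outptr, &outleft) == (size_t) -1)
        return (size_t) -1;
    return outcap - outleft;
}
```

With 15 ASCII bytes followed by é in UTF-8, the first 16-byte read ends in the middle of é, so iconv reports EINVAL and the leftover byte is completed by the next read instead of being treated as an error.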