Several character set and collation system variables relate to a client's interaction with the server. Some of these have been mentioned in earlier sections:
The server character set and collation can be determined
from the values of the
character_set_server
and
collation_server
system
variables.
The character set and collation of the default database can
be determined from the values of the
character_set_database
and
collation_database
system
variables.
Additional character set and collation system variables are involved in handling traffic for the connection between a client and the server. Every client has connection-related character set and collation system variables.
Consider what a “connection” is: It is what you make when you connect to the server. The client sends SQL statements, such as queries, over the connection to the server. The server sends responses, such as result sets or error messages, over the connection back to the client. This leads to several questions about character set and collation handling for client connections, each of which can be answered in terms of system variables:
What character set is the statement in when it leaves the client?
The server takes the
character_set_client
system
variable to be the character set in which statements are
sent by the client.
What character set should the server translate a statement to after receiving it?
For this, the server uses the
character_set_connection
and collation_connection
system variables. It converts statements sent by the client
from character_set_client
to character_set_connection
(except for string literals that have an introducer such as
_latin1
or _utf8
).
collation_connection
is
important for comparisons of literal strings. For
comparisons of strings with column values,
collation_connection
does
not matter because columns have their own collation, which
has a higher collation precedence.
What character set should the server translate to before shipping result sets or error messages back to the client?
The character_set_results
system variable indicates the character set in which the
server returns query results to the client. This includes
result data such as column values, and result metadata such
as column names and error messages.
Clients can fine-tune the settings for these variables, or depend on the defaults (in which case, you can skip the rest of this section). If you do not use the defaults, you must change the character settings for each connection to the server.
There are two statements that affect the connection-related character set variables as a group:
SET NAMES '
charset_name
'
[COLLATE
'collation_name
']
SET NAMES
indicates what character set
the client will use to send SQL statements to the server.
Thus, SET NAMES 'cp1251'
tells the
server, “future incoming messages from this client are
in character set cp1251
.” It also
specifies the character set that the server should use for
sending results back to the client. (For example, it
indicates what character set to use for column values if you
use a SELECT
statement.)
A SET NAMES
'
statement is
equivalent to these three statements:
x
'
SET character_set_client =x
; SET character_set_results =x
; SET character_set_connection =x
;
Setting each of these character set variables also sets its
corresponding collation variable to the default correlation
for the character set. For example, setting
character_set_connection
to
x
also sets
collation_connection
to the
default collation for x
. It is
not necessary to set that collation explicitly. To specify a
particular collation for the character sets, use the
optional COLLATE
clause:
SET NAMES 'charset_name
' COLLATE 'collation_name
'
SET CHARACTER SET
charset_name
SET CHARACTER SET
is similar to
SET NAMES
but sets
character_set_connection
and collation_connection
to
character_set_database
and
collation_database
. A
SET CHARACTER SET
statement is
equivalent to these three statements:
x
SET character_set_client =x
; SET character_set_results =x
; SET collation_connection = @@collation_database;
Setting
collation_connection
also
sets
character_set_connection
to
the character set associated with the collation (equivalent
to executing SET character_set_connection =
@@character_set_database
). It is not necessary to
set
character_set_connection
explicitly.
ucs2
, utf16
, and
utf32
cannot be used as a client character
set, which means that they do not work for SET
NAMES
or SET CHARACTER SET
.
The MySQL client programs mysql
,
mysqladmin
, mysqlcheck
,
mysqlimport
, and mysqlshow
determine the default character set to use as follows:
In the absence of other information, the programs use the
compiled-in default character set, usually
latin1
.
The programs can autodetect which character set to use based
on the operating system setting, such as the value of the
LANG
or LC_ALL
locale
environment variable on Unix systems or the code page
setting on Windows systems. For systems on which the locale
is available from the OS, the client uses it to set the
default character set rather than using the compiled-in
default. For example, setting LANG
to
ru_RU.KOI8-R
causes the
koi8r
character set to be used. Thus,
users can configure the locale in their environment for use
by MySQL clients.
The OS character set is mapped to the closest MySQL
character set if there is no exact match. If the client does
not support the matching character set, it uses the
compiled-in default. For example, ucs2
is
not supported as a connection character set.
C applications that wish to use character set autodetection
based on the OS setting can invoke the following
mysql_options()
call before
connecting to the server:
mysql_options(mysql, MYSQL_SET_CHARSET_NAME, MYSQL_AUTODETECT_CHARSET_NAME);
The programs support a
--default-character-set
option, which enables users to specify the character set
explicitly to override whatever default the client otherwise
determines.
When a client connects to the server, it sends the name of the
character set that it wants to use. The server uses the name to
set the character_set_client
,
character_set_results
, and
character_set_connection
system
variables. In effect, the server performs a SET
NAMES
operation using the character set name.
With the mysql client, if you want to use a
character set different from the default, you could explicitly
execute SET NAMES
every time you start up.
However, to accomplish the same result more easily, you can add
the --default-character-set
option
setting to your mysql command line or in your
option file. For example, the following option file setting
changes the three connection-related character set variables set
to koi8r
each time you invoke
mysql:
[mysql] default-character-set=koi8r
If you are using the mysql client with
auto-reconnect enabled (which is not recommended), it is
preferable to use the charset
command rather
than SET NAMES
. For example:
mysql> charset utf8
Charset changed
The charset
command issues a SET
NAMES
statement, and also changes the default
character set that mysql uses when it
reconnects after the connection has dropped.
Example: Suppose that column1
is defined as
CHAR(5) CHARACTER SET latin2
. If you do not
say SET NAMES
or SET CHARACTER
SET
, then for SELECT column1 FROM
t
, the server sends back all the values for
column1
using the character set that the
client specified when it connected. On the other hand, if you
say SET NAMES 'latin1'
or SET
CHARACTER SET latin1
before issuing the
SELECT
statement, the server
converts the latin2
values to
latin1
just before sending results back.
Conversion may be lossy if there are characters that are not in
both character sets.
If you do not want the server to perform any conversion of
result sets or error messages, set
character_set_results
to
NULL
or binary
:
SET character_set_results = NULL;
To see the values of the character set and collation system variables that apply to your connection, use these statements:
SHOW VARIABLES LIKE 'character_set%'; SHOW VARIABLES LIKE 'collation%';
You must also consider the environment within which your MySQL applications execute. See Section 9.1.5, “Configuring the Character Set and Collation for Applications”.
For more information about character sets and error messages, see Section 9.1.6, “Character Set for Error Messages”.
User Comments
This example are usable for russian users who want to have windows-1251 encoding on the site and koi8-r encoding into the database:
set CHARACTER SET cp1251_koi8
If you are wondering why -despite all UTF8 settings- you still don't get non-ASCII characters right, it might be the case that:
1. you are using mysqlimport to insert data from files with utf8 characters;
AND
2. you have created a Database with character set latin1 (this is the default!) and not with character set utf8.
3. You have created a Table with character set utf8.
Even if you use the --default-character-set=utf8 option for mysqlimport this doesn't work! Mysqlimport is only using the value of 'character_set_database' as character set and in this case it is 'latin1'.
This example shows how to configure mysqld server to use pure utf8 for server's character set and collation instead of default latin1. This will help to correctly store non-latin character data in db (cyrillic сharacters for example and russian language in particularly).
Go to the [mysqld] section in my.cnf and add two strings:
collation_server=utf8_unicode_ci
character_set_server=utf8
You can also add
skip-character-set-client-handshake
to enforce using of utf8 encoding in db.
I tried all the above hints but still couldn´t get it to work (using mySql and PHP 5 server)... ´till I found and tried also this:
add to the .htaccess file this single line:
AddDefaultCharset UTF-8
Still not sure why this was important to the whole, but now it works as planned! Thanks!
I've spent quite a lot of time trying to make MySQL 4.1 and PHP working with my tables in cp1251 charset.
Adding on my local PC the following line to the my.ini solved the problem but unfortunately I can't do the same on my hosting provider space
default-character-set=cp1251
So after trying quite a lot of combinations finally I've found that I need to add only one query set just after connecting to the database in my php script
mysql_query ('SET NAMES CP1251');
Please note if you run another query like
mysql_query ('SET CHARACTER SET CP1251');
just after first one it will reset character_set_connection option and all your symbols most likely will be converted to the question marks as it happens for me many times before.
In my process of adjusting the charset variables to correctly display the Chinese characters,I found client and server not only interact with each other, but also interact with windows OS,although I don't know how and when. When I use java application to write Chinese characters in utf8 charset to the database, even if I set all charset variables to utf8, the characters can not display correctly. Only after I set some charset to gb2312, the standard national Chinese charset, the display is right. I guess that is because the default charset of Windows OS in Chinese version is gb2312, and in between the interaction of client and server, Windows also make some charset translations in the process.
Thus when you still fail to display the correct characters after trying various combinations of charset variables, try the code below:
SET NAMES the_default_charset_of_your_local_OS
Hope this can do some help!
I have prepared a little hand-drawn diagram on this (while debugging problems with xplanner) which might be of some use. It's been scanned into a 850KB PDF:
http://public.m-plify.net/mysql/MySQL_Charset_Handling.pdf
Working in Windows environment, I had a problem submitting unicode data from ASP pages to the MySQL server while everything was set to utf8 .
It turns out the problem was that my ODBC driver was version 3.5.1 and that's what caused the problem. Installing version 5.1 solved the problem.
http://dev.mysql.com/downloads/connector/odbc/
Add your own comment.