phpbar.de logo

Mailinglisten-Archive

Re: y==ü im Schwedischen?
Archiv Mailingliste mysql-de

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: y==ü im Schwedischen?



Tobias Ratschiller schrieb:
> Huch! *Erschreck*:
> SQL-query INSERT INTO test (id, gg, ff, ee) VALUES ('mür', '', '', '')
> produced an error.
> MySQL said: Duplicate entry 'mür' for key 1
> 
> Das ist ein Fall für Monty.

Oder zumindest für eine verbesserte Dokumentation ...

Inzwischen hab' ich genauere Tests gemacht und durchblicke das jetzt
etwas besser.  Auch mit meiner Configure-Option "--with-charset=german1"
werden viele Zeichen bzgl. Sortierung und Duplikaterkennung als gleich
angesehen, nämlich:
     +------+------+------+------+
     | t1   | c1   | t2   | c2   |
     +------+------+------+------+
     |    0 |      |  160 |      |
     |   97 | a    |  224 | à    |
     |   97 | a    |  225 | á    |
     |   97 | a    |  226 | â    |
     |   97 | a    |  227 | ã    |
     |   97 | a    |  228 | ä    |
     |   97 | a    |  229 | å    |
     |   97 | a    |  230 | æ    |
     |   99 | c    |  231 | ç    |
     |  101 | e    |  232 | è    |
     |  101 | e    |  233 | é    |
     |  101 | e    |  234 | ê    |
     |  101 | e    |  235 | ë    |
     |  105 | i    |  236 | ì    |
     |  105 | i    |  237 | í    |
     |  105 | i    |  238 | î    |
     |  105 | i    |  239 | ï    |
     |  100 | d    |  240 | ð    |
     |  110 | n    |  241 | ñ    |
     |  111 | o    |  242 | ò    |
     |  111 | o    |  243 | ó    |
     |  111 | o    |  244 | ô    |
     |  111 | o    |  245 | õ    |
     |  111 | o    |  246 | ö    |
     |  117 | u    |  249 | ù    |
     |  117 | u    |  250 | ú    |
     |  117 | u    |  251 | û    |
     |  117 | u    |  252 | ü    |
     |  121 | y    |  253 | ý    |
     +------+------+------+------+
Auch Groß- und Kleinschreibung wird nicht unterschieden, aber das habe
der Übersicht halber aus der Tabelle weggelassen.

Ich habe zum Test eine 2-spaltige Tabelle mit allen möglichen
Zeichenpaaren angelegt und aus dieser die Paare herausgeholt, die
MySQL als gleich ansieht.  Mit dem angehängten perl-Skript könnt
Ihr den SQL-Code zum Test erzeugen und es selbst ausprobieren.

   perl dupltest.pl | mysql test


Nach intensiven kaffeegestütze Handbuchrecherchen fand ich aber doch
eine Lösung:  für Textspalten gibt's das Attribut "BINARY"!

Hier gleich die passenden Passagen zitiert (Manual zu V3.22.19b):

| 1.4 The main features of MySQL
| [...]
|  * All data are saved in ISO-8859-1 Latin1 format. All comparisons for
|    normal string columns are case insensitive.
|  * Sorting is done according to the ISO-8859-1 Latin1 character set
|    (the Swedish way at the moment). It is possible to change this in
|    the source by adding new sort order arrays. To see an example of
|    very advanced sorting, look at the Czech sorting code. MySQL
|    supports many different character sets that can be specified at
|    compile time.

| 5.1 MySQL extensions to ANSI SQL92
| [...]
|  * All string comparisons are case insensitive by default, with
|    sort ordering determined by the current character set (ISO-8859-1
|    Latin1 by default). If you don't like this, you should declare your
|    columns with the BINARY attribute, which causes comparisons to be
|    done according to the ASCII order used on the MySQL server host.

| 4.7.3 Typical configure options
| [...]
|  * By default, MySQL uses the ISO-8859-1 (Latin1) character set. [...]
|    CHARSET may be one of big5, czech, danish, dec8, dos, german1,
|    hebrew, hp8, hungarian, koi8_ru, ru, latin1, latin2, sjis, swe7,
|    tis620, ujis, usa7 or win1251. [...]  Note that if you want to
|    change the character set, you must do a make distclean between
|    configurations!

| 9.1.1 The character set used for data and sorting
| 
| By default, MySQL uses the ISO-8859-1 (Latin1) character set. This is
| the character set used in the USA and western Europe.
|   The character set determines what characters are allowed in names
| and how things are sorted by the ORDER BY and GROUP BY clauses of the
| SELECT statement.
|   You can change the character set at compile time by using the
| --with-charset=charset option to configure. [...]

| 16.13 16.13 Case sensitivity in searches
| 
| By default, MySQL searches are case-insensitive (although there are
| some character sets that are never case insensitive, such as
| czech). That means that if you search with col_name LIKE 'a%', you
| will get all column values that start with A or a. If you want to make
| this search case-sensitive, use something like INDEX(col_name, "A")=0
| to check a prefix. Or use STRCMP(col_name, "A") = 0 if the column
| value must be exactly "A".
| 
| Simple comparison operations (>=, >, = , < , <=, sorting and grouping)
| are based on each character's "sort value". Characters with the same
| sort value (like E, e and 'e) are treated as the same character!
| 
| LIKE comparisons are done on the uppercase value of each character
| (E == e but E <> 'e)
| 
| If you want a column always to be treated in case-sensitive fashion,
| declare it as BINARY.

Offensichtlich ein Fall für RTFM. :-)
Aber da die Hinweise im Manual sooo verstreut liegen, verzeiht Ihr
mir hoffentlich?

Daß sich die Sortierreihenfolge aber auch auf die Duplicat-Erkennung
auswirkt, scheint nirgends erwähnt zu sein, so daß hier aus meiner
Sicht offen ist, ob das nur ein schlecht dokumentiertes Feature oder
ein Bug ist.

Ciao,
  Martin
-- 
Martin Ramsch <m.ramsch_(at)_computer.org> <URL: http://home.pages.de/~ramsch/ >
PGP KeyID=0xE8EF4F75 FiPr=52 44 5E F3 B0 B1 38 26  E4 EC 80 58 7B 31 3A D7

dupltest.pl


Home | Main Index | Thread Index

php::bar PHP Wiki   -   Listenarchive