Mailinglisten-Archive |
Tobias Ratschiller schrieb:
> Huch! *Erschreck*:
> SQL-query INSERT INTO test (id, gg, ff, ee) VALUES ('mür', '', '', '')
> produced an error.
> MySQL said: Duplicate entry 'mür' for key 1
>
> Das ist ein Fall für Monty.
Oder zumindest für eine verbesserte Dokumentation ...
Inzwischen hab' ich genauere Tests gemacht und durchblicke das jetzt
etwas besser. Auch mit meiner Configure-Option "--with-charset=german1"
werden viele Zeichen bzgl. Sortierung und Duplikaterkennung als gleich
angesehen, nämlich:
+------+------+------+------+
| t1 | c1 | t2 | c2 |
+------+------+------+------+
| 0 | | 160 | |
| 97 | a | 224 | à |
| 97 | a | 225 | á |
| 97 | a | 226 | â |
| 97 | a | 227 | ã |
| 97 | a | 228 | ä |
| 97 | a | 229 | å |
| 97 | a | 230 | æ |
| 99 | c | 231 | ç |
| 101 | e | 232 | è |
| 101 | e | 233 | é |
| 101 | e | 234 | ê |
| 101 | e | 235 | ë |
| 105 | i | 236 | ì |
| 105 | i | 237 | í |
| 105 | i | 238 | î |
| 105 | i | 239 | ï |
| 100 | d | 240 | ð |
| 110 | n | 241 | ñ |
| 111 | o | 242 | ò |
| 111 | o | 243 | ó |
| 111 | o | 244 | ô |
| 111 | o | 245 | õ |
| 111 | o | 246 | ö |
| 117 | u | 249 | ù |
| 117 | u | 250 | ú |
| 117 | u | 251 | û |
| 117 | u | 252 | ü |
| 121 | y | 253 | ý |
+------+------+------+------+
Auch Groß- und Kleinschreibung wird nicht unterschieden, aber das habe
der Übersicht halber aus der Tabelle weggelassen.
Ich habe zum Test eine 2-spaltige Tabelle mit allen möglichen
Zeichenpaaren angelegt und aus dieser die Paare herausgeholt, die
MySQL als gleich ansieht. Mit dem angehängten perl-Skript könnt
Ihr den SQL-Code zum Test erzeugen und es selbst ausprobieren.
perl dupltest.pl | mysql test
Nach intensiven kaffeegestütze Handbuchrecherchen fand ich aber doch
eine Lösung: für Textspalten gibt's das Attribut "BINARY"!
Hier gleich die passenden Passagen zitiert (Manual zu V3.22.19b):
| 1.4 The main features of MySQL
| [...]
| * All data are saved in ISO-8859-1 Latin1 format. All comparisons for
| normal string columns are case insensitive.
| * Sorting is done according to the ISO-8859-1 Latin1 character set
| (the Swedish way at the moment). It is possible to change this in
| the source by adding new sort order arrays. To see an example of
| very advanced sorting, look at the Czech sorting code. MySQL
| supports many different character sets that can be specified at
| compile time.
| 5.1 MySQL extensions to ANSI SQL92
| [...]
| * All string comparisons are case insensitive by default, with
| sort ordering determined by the current character set (ISO-8859-1
| Latin1 by default). If you don't like this, you should declare your
| columns with the BINARY attribute, which causes comparisons to be
| done according to the ASCII order used on the MySQL server host.
| 4.7.3 Typical configure options
| [...]
| * By default, MySQL uses the ISO-8859-1 (Latin1) character set. [...]
| CHARSET may be one of big5, czech, danish, dec8, dos, german1,
| hebrew, hp8, hungarian, koi8_ru, ru, latin1, latin2, sjis, swe7,
| tis620, ujis, usa7 or win1251. [...] Note that if you want to
| change the character set, you must do a make distclean between
| configurations!
| 9.1.1 The character set used for data and sorting
|
| By default, MySQL uses the ISO-8859-1 (Latin1) character set. This is
| the character set used in the USA and western Europe.
| The character set determines what characters are allowed in names
| and how things are sorted by the ORDER BY and GROUP BY clauses of the
| SELECT statement.
| You can change the character set at compile time by using the
| --with-charset=charset option to configure. [...]
| 16.13 16.13 Case sensitivity in searches
|
| By default, MySQL searches are case-insensitive (although there are
| some character sets that are never case insensitive, such as
| czech). That means that if you search with col_name LIKE 'a%', you
| will get all column values that start with A or a. If you want to make
| this search case-sensitive, use something like INDEX(col_name, "A")=0
| to check a prefix. Or use STRCMP(col_name, "A") = 0 if the column
| value must be exactly "A".
|
| Simple comparison operations (>=, >, = , < , <=, sorting and grouping)
| are based on each character's "sort value". Characters with the same
| sort value (like E, e and 'e) are treated as the same character!
|
| LIKE comparisons are done on the uppercase value of each character
| (E == e but E <> 'e)
|
| If you want a column always to be treated in case-sensitive fashion,
| declare it as BINARY.
Offensichtlich ein Fall für RTFM. :-)
Aber da die Hinweise im Manual sooo verstreut liegen, verzeiht Ihr
mir hoffentlich?
Daß sich die Sortierreihenfolge aber auch auf die Duplicat-Erkennung
auswirkt, scheint nirgends erwähnt zu sein, so daß hier aus meiner
Sicht offen ist, ob das nur ein schlecht dokumentiertes Feature oder
ein Bug ist.
Ciao,
Martin
--
Martin Ramsch <m.ramsch_(at)_computer.org> <URL: http://home.pages.de/~ramsch/ >
PGP KeyID=0xE8EF4F75 FiPr=52 44 5E F3 B0 B1 38 26 E4 EC 80 58 7B 31 3A D7
php::bar PHP Wiki - Listenarchive