Mailinglisten-Archive |
Tobias Ratschiller schrieb: > Huch! *Erschreck*: > SQL-query INSERT INTO test (id, gg, ff, ee) VALUES ('mür', '', '', '') > produced an error. > MySQL said: Duplicate entry 'mür' for key 1 > > Das ist ein Fall für Monty. Oder zumindest für eine verbesserte Dokumentation ... Inzwischen hab' ich genauere Tests gemacht und durchblicke das jetzt etwas besser. Auch mit meiner Configure-Option "--with-charset=german1" werden viele Zeichen bzgl. Sortierung und Duplikaterkennung als gleich angesehen, nämlich: +------+------+------+------+ | t1 | c1 | t2 | c2 | +------+------+------+------+ | 0 | | 160 | | | 97 | a | 224 | à | | 97 | a | 225 | á | | 97 | a | 226 | â | | 97 | a | 227 | ã | | 97 | a | 228 | ä | | 97 | a | 229 | å | | 97 | a | 230 | æ | | 99 | c | 231 | ç | | 101 | e | 232 | è | | 101 | e | 233 | é | | 101 | e | 234 | ê | | 101 | e | 235 | ë | | 105 | i | 236 | ì | | 105 | i | 237 | í | | 105 | i | 238 | î | | 105 | i | 239 | ï | | 100 | d | 240 | ð | | 110 | n | 241 | ñ | | 111 | o | 242 | ò | | 111 | o | 243 | ó | | 111 | o | 244 | ô | | 111 | o | 245 | õ | | 111 | o | 246 | ö | | 117 | u | 249 | ù | | 117 | u | 250 | ú | | 117 | u | 251 | û | | 117 | u | 252 | ü | | 121 | y | 253 | ý | +------+------+------+------+ Auch Groß- und Kleinschreibung wird nicht unterschieden, aber das habe der Übersicht halber aus der Tabelle weggelassen. Ich habe zum Test eine 2-spaltige Tabelle mit allen möglichen Zeichenpaaren angelegt und aus dieser die Paare herausgeholt, die MySQL als gleich ansieht. Mit dem angehängten perl-Skript könnt Ihr den SQL-Code zum Test erzeugen und es selbst ausprobieren. perl dupltest.pl | mysql test Nach intensiven kaffeegestütze Handbuchrecherchen fand ich aber doch eine Lösung: für Textspalten gibt's das Attribut "BINARY"! Hier gleich die passenden Passagen zitiert (Manual zu V3.22.19b): | 1.4 The main features of MySQL | [...] | * All data are saved in ISO-8859-1 Latin1 format. All comparisons for | normal string columns are case insensitive. | * Sorting is done according to the ISO-8859-1 Latin1 character set | (the Swedish way at the moment). It is possible to change this in | the source by adding new sort order arrays. To see an example of | very advanced sorting, look at the Czech sorting code. MySQL | supports many different character sets that can be specified at | compile time. | 5.1 MySQL extensions to ANSI SQL92 | [...] | * All string comparisons are case insensitive by default, with | sort ordering determined by the current character set (ISO-8859-1 | Latin1 by default). If you don't like this, you should declare your | columns with the BINARY attribute, which causes comparisons to be | done according to the ASCII order used on the MySQL server host. | 4.7.3 Typical configure options | [...] | * By default, MySQL uses the ISO-8859-1 (Latin1) character set. [...] | CHARSET may be one of big5, czech, danish, dec8, dos, german1, | hebrew, hp8, hungarian, koi8_ru, ru, latin1, latin2, sjis, swe7, | tis620, ujis, usa7 or win1251. [...] Note that if you want to | change the character set, you must do a make distclean between | configurations! | 9.1.1 The character set used for data and sorting | | By default, MySQL uses the ISO-8859-1 (Latin1) character set. This is | the character set used in the USA and western Europe. | The character set determines what characters are allowed in names | and how things are sorted by the ORDER BY and GROUP BY clauses of the | SELECT statement. | You can change the character set at compile time by using the | --with-charset=charset option to configure. [...] | 16.13 16.13 Case sensitivity in searches | | By default, MySQL searches are case-insensitive (although there are | some character sets that are never case insensitive, such as | czech). That means that if you search with col_name LIKE 'a%', you | will get all column values that start with A or a. If you want to make | this search case-sensitive, use something like INDEX(col_name, "A")=0 | to check a prefix. Or use STRCMP(col_name, "A") = 0 if the column | value must be exactly "A". | | Simple comparison operations (>=, >, = , < , <=, sorting and grouping) | are based on each character's "sort value". Characters with the same | sort value (like E, e and 'e) are treated as the same character! | | LIKE comparisons are done on the uppercase value of each character | (E == e but E <> 'e) | | If you want a column always to be treated in case-sensitive fashion, | declare it as BINARY. Offensichtlich ein Fall für RTFM. :-) Aber da die Hinweise im Manual sooo verstreut liegen, verzeiht Ihr mir hoffentlich? Daß sich die Sortierreihenfolge aber auch auf die Duplicat-Erkennung auswirkt, scheint nirgends erwähnt zu sein, so daß hier aus meiner Sicht offen ist, ob das nur ein schlecht dokumentiertes Feature oder ein Bug ist. Ciao, Martin -- Martin Ramsch <m.ramsch_(at)_computer.org> <URL: http://home.pages.de/~ramsch/ > PGP KeyID=0xE8EF4F75 FiPr=52 44 5E F3 B0 B1 38 26 E4 EC 80 58 7B 31 3A D7
php::bar PHP Wiki - Listenarchive