Here is the content of article
Thank you! Natalka Roshak.
To get us started, here's a simple sample table with a few rows in it.
SQL> select * from invoice_demo order by invoice_id SQL> / INVOICE_ID SUPPLIER_NAME ---------- ------------------------------ 1000 Max Books 1001 MAX BOOKS 1002 max books 1003 janus pet supply
The default: Binary sort
The default sort order is always binary, meaning that the characters are sorted in the order of their character number in the character set (aka code point or encoded value). For example, compare the sort order on SUPPLIER_NAME with the character number of the first character in the field:SQL> select invoice_id, supplier_name, ascii(substr(supplier_name,1,1)) 2 from invoice_demo 3 order by supplier_name SQL> / INVOICE_ID SUPPLIER_NAME ASCII(SUBSTR(SUPPLIER_NAME,1,1)) ---------- -------------------- -------------------------------- 1001 MAX BOOKS 77 1000 Max Books 77 1003 janus pet supply 106 1002 max books 109 7 rows selected.
The old way: NLS_COMP=ANSI
In all versions since at least 9i, you can do a case insensitive sort by setting NLS_COMP to ANSI and choosing a case insensitive sort order. (The default setting for NLS_COMP is BINARY, which produces the binary sort we saw in the first query.)Choosing a case-insensitive sort order is easy. You can take any of the Oracle-defined linguistic sort orders - the default is BINARY - and append "_CI" to it. For example:
SQL> alter session set NLS_COMP='ANSI' ; Session altered. SQL> alter session set NLS_SORT='BINARY_CI' ; Session altered. SQL> select * from invoice_demo order by supplier_name ; INVOICE_ID SUPPLIER_NAME ---------- ------------------------------ 1003 janus pet supply 1001 MAX BOOKS 1000 Max Books 1002 max books
SQL> select * from invoice_demo 2 where supplier_name='MAX BOOKS'; INVOICE_ID SUPPLIER_NAME ---------- ------------------------------ 1000 Max Books 1001 MAX BOOKS 1002 max books SQL> select * from invoice_demo where supplier_name like 'M%'; INVOICE_ID SUPPLIER_NAME ---------- ------------------------------ 1000 Max Books 1001 MAX BOOKS
SQL> select min(supplier_name) from invoice_demo ; MIN(SUPPLIER_NAME) ------------------------------ MAX BOOKS SQL> select distinct (supplier_name) from invoice_demo 2 where substr(supplier_name,1,1)='M' ; SUPPLIER_NAME ------------------------------ Max Books max books MAX BOOKS
The 10gR2 solution: NLS_COMP=LINGUISTIC
10gR2 introduced a new value for NLS_COMP which extends the NLS_SORT sort order to cover all SQL sorts and comparisons. Here are the results of the same queries as above, with NLS_COMP set to LINGUISTIC.SQL> alter session set nls_comp='LINGUISTIC'; Session altered. SQL> select * from invoice_demo 2 where supplier_name like 'M%'; INVOICE_ID SUPPLIER_NAME ---------- ------------------------------ 1000 Max Books 1001 MAX BOOKS 1002 max books SQL> select min(supplier_name) from invoice_demo ; MIN(SUPPLIER_NAME) ------------------------------------------ janus pet supply SQL> select distinct (supplier_name) from invoice_demo 2 where substr(supplier_name,1,1)='M' ; SUPPLIER_NAME ------------------------------ Max Books
With NLS_COMP set to LINGUISTIC, every SQL sort and comparison uses the NLS_SORT linguistic sort order. So, if you have set your sort order to a case insensitive one by appending _CI, all sorts and comparisons in that session will be case insensitive. Very handy!
Mixing Case-Sensitive Searches with Case-Insensitive Ones
You might want to mix case-sensitive and case-insensitive behaviour in the same query. For example, I might want to group case-sensitively and sort case-insensitively. First I'll add a bit more data to the table to make this example clearer:SQL> select * from invoice_demo 2 where supplier_name like 'm%' 3 order by invoice_id INVOICE_ID SUPPLIER_NAME ---------- ------------------------------- 1000 Max Books 1001 MAX BOOKS 1002 max books 1010 MAX BOOKS 1011 max books
SQL> select invoice_id, supplier_name, 2 nlssort(supplier_name,'NLS_SORT=BINARY'), 3 nlssort(supplier_name,'NLS_SORT=BINARY_CI') 4 from invoice_demo 5 where supplier_name like 'm%' 6 order by invoice_id SQL> / INVOICE_ID SUPPLIER_NAME NLSSORT(SUPPLIER_NAME,'NLS_SORT=BIN NLSSORT(SUPPLIER_NAME,'NLS_SORT=BIN ---------- -------------------- ----------------------------------- ----------------------------------- 1000 Max Books 4D617820426F6F6B7300 6D617820626F6F6B7300 1001 MAX BOOKS 4D415820424F4F4B5300 6D617820626F6F6B7300 1002 max books 6D617820626F6F6B7300 6D617820626F6F6B7300 1010 MAX BOOKS 4D415820424F4F4B5300 6D617820626F6F6B7300 1011 max books 6D617820626F6F6B7300 6D617820626F6F6B7300 5 rows selected.
SQL> alter session set nls_comp='binary' ; Session altered. SQL> alter session set nls_sort='binary' ; Session altered. SQL> select supplier_name, count(*) 2 from invoice_demo 3 group by supplier_name 4 order by nlssort(supplier_name,'NLS_SORT=BINARY_CI') ; SUPPLIER_NAME COUNT(*) -------------------- ---------- janus pet supply 1 MAX BOOKS 2 max books 2 Max Books 1 4 rows selected.
Case Insensitive Indexes
A normal index uses the default sort order, by default, and so it's useless in a case-insensitive search - and Oracle won't use it. For large tables, the resulting full table scan can be quite a performance penalty. Fortunately, it's easy to create an index that uses a specific sort order. You simply create a function-based index that uses the NLSSORT function we saw above.SQL> create index ind_idemo_sname_ci 2 on invoice_demo (nlssort(supplier_name, 'NLS_SORT=BINARY_CI')) SQL> / Index created.
The main thing to watch out for here is that because this is a function-based index, it is subject to all the restrictions on function-based indexes. For example, Oracle will not use the index if it's theoretically possible for the result set to include NULLS in the indexed column. Oracle will only use the index if either (1) the column that the index is on is declared as NOT NULL, or (2) you have a where clause in the query that specifies that the function on the indexed column return NOT NULL, such as WHERE NLSSORT(SUPPLIER_NAME,'NLS_SORT=BINARY_CI') IS NOT NULL .
Accent Insensitivity
A related feature is what one might call "accent insensitivity," or the ability to ignore diacritics in the sort order. For example, you might want "à la mode" (note the accent over the "a") and "a la mode" (no accent) to sort to the same position. Just as you can append _CI to any of Oracle's linguistic sort orders, you can append _AI to get case insensitivity and accent insensitivity.For this example, we'll need some data with accented characters:
INVOICE_ID SUPPLIER_NAME ---------- -------------------- 2002 École Lyonnaise 2000 ecole lyonnaise 2001 école lyonnaise
SQL> alter session set nls_sort='BINARY'; Session altered. SQL> select supplier_name from invoice_demo order by supplier_name ; SUPPLIER_NAME -------------------- MAX BOOKS MAX BOOKS Max Books ecole lyonnaise janus pet supply max books max books École Lyonnaise école lyonnaise 9 rows selected.
SQL> alter session set nls_sort='BINARY_CI' ; Session altered. SQL> select supplier_name 2 from invoice_demo order by supplier_name ; SUPPLIER_NAME -------------------- ecole lyonnaise janus pet supply Max Books max books max books MAX BOOKS MAX BOOKS École Lyonnaise école lyonnaise 9 rows selected.
SQL> alter session set nls_sort='BINARY_AI' ; Session altered. SQL> select supplier_name 2 from invoice_demo order by supplier_name ; SUPPLIER_NAME -------------------- ecole lyonnaise École Lyonnaise école lyonnaise janus pet supply max books Max Books max books MAX BOOKS MAX BOOKS 9 rows selected.
Other sort orders
The new NLS_COMP=LINGUISTIC setting doesn't just apply to case-insensitive and accent-insensitive binary searches. Any of Oracle's linguistic sort orders can be used with it. For example, you can choose NLS_SORT=XSPANISH to force all SQL to use traditional Spanish sort and comparison semantics, in which "ch" and "ll" are both treated as one character - so "llaves" does not meet the condition "like 'l%' ", for example. And you can apply the _CI and _AI suffixes to any linguistic sort order, eg. NLS_SORT=XSPANISH_CI .For more information
Refer to Chapter 5 of the Oracle 10gR2 Database Globalization Support Guide.About the author
Natalka Roshak is a senior Oracle and Sybase database administrator, analyst, and architect. She is based in Kingston, Ontario, and consults across North America. More of her scripts and tips can be found in her online DBA toolkit at http://toolkit.rdbms-insight.com/.Refer: http://www.orafaq.com/node/999
No comments:
Post a Comment