Advance Text Searching

From MidrangeWiki
Jump to: navigation, search


The Problem

  • You've got a 100,000 row DB table with a 50 byte character column containing a product description...
  • You've got a 1 billion row DB table with a 5000 byte variable character column containing customer notes...
  • You've got a table with a 4MB CLOB column containing a .TXT,.RTF, ect. document...
  • You've got a .TXT, .RTF, ect. document stored externally in a stream file...

And your users need to find records/documents that contain the word "mouse".

For at least problems 1-3 above, you could try a standard SQL search:

SELECT myKey, myColumn
  FROM mytable
 WHERE myColumn like '%mouse%'

But your users will complain that it's too slow, that it doesn't find records containing "mice", and by default it won't find "Mouse" or "MOUSE", ect.

The reason the search is slow, is that the RDBMS will need to scan every record in your table. Creating an index, which is the usual solution to full tables scans won't be of much help; as the system would still need to at minimum scan the entire index to find records that contain the word "mouse" anyplace in the column.

Trying to use RPG's %SCAN() op-code, regular expressions or other technique that uses byte-level matching is going to have the same results.


The Solution

If you're running v6.1, v7.1 (or higher) of IBM i, you need to install OmniFind. An optional, no-charge, component of the OS at those releases.

If you're running v5r4 or earlier, there's an optional, chargeable component you can install: 5722-DE1 DB2 Text Extender.


OmniFind vs. DB2 Text Extender

OmniFind and the DB2 Text Extender are similar but different products. In fact, the DB2 Text Extender product is still available at v6.1 and v7.1. Both products work in a similar manner when it comes to advanced text searching:

  • They both build a "text index" for a given column/table in a stream file of the IFS.
  • They both use SQL user defined functions to invoke the search functionality
  • They both add triggers to the table/columns being enabled for searching in order to capture changes to the table to maintain their indexes.

Besides being free, OmniFind offers some additional advantages over the DB2 Text Extender product:

  • OmniFind is the current direction of IBM and you can expect future enhancements to the product.
  • OmniFind uses SQL stored procedures to build the text indexes; whereas the DB2 Text Extender product uses it's own command shell that runs in PASE.
  • OmniFind doesn't add additional columns to the table being searched; whereas the DB2 Text Extender product does.