Saturday, December 24, 2011

How do I do a Fuzzy Address Search with PL / SQL (for Oracle)?

I would like to search the vendor site table for address but it needs to be a fuzzy lookup. It has to find these to be match:


123 FAKE St.


123.Fake St


123 fake St


123 Fake Street


1 2 3 Fake Street


123 N Fake Street


123 Fake Ave


123 Fake Drive


123 North Fake Circle


Even possible some misspellings like


123 Fke St would be nice as well.|||oracle text can have any of a number of indexing rules. I'd suggest you look into OT definition classifications (context, ctxcat, ctxrule) rather than rely on straight SQL solutions. you can build some seriously powerful solutions if you know how to index your text correctly|||What you want to do is provided by the Oracle Spatial Geocoder, a feature of Oracle Spatial. It handles international addresses and will return detailed indications on the quality of your input as well as correct any mistakes (spelling, street types, house numbers, postal codes, ...).


To use it (besides getting the proper product licenses from Oracle) you will need a street address data set, available out of the box from Navteq.





See http://www.oracle.com/technology/product鈥?/a> for details. The "Partner Data" link on the right takes you to a page where you can download a full dataset for San Francisco, CA for you to test and evaluate.





You can also try http://elocation.oracle.com for an online service that illustrates it. as well as http://elocation.oracle.com/geocoder/ which allows you to enter and address and see what happens. Note however that those services are for demonstration only: you will not be allowed to use them in commercial deployments.








There are also commercial services available that do address cleanup. I don't have the pointers handy just now.|||There are some PL/SQL operators that will get you close, but perhaps not 100% of the way. However, by building a query using AND and OR operators you can get darn close.





For example,





SELECT address


FROM vendor_site


WHERE UPPER(address) LIKE '%FAK%';





will get you everything but the misspelled ("fke") entry.





Look also at the BETWEEN operator.

No comments:

Post a Comment