Building an indexing service for a database using Lucene
.NET Framework Version: 1.1, 2.0
The success of a web application depends heavily on how accurately and how
quickly users can search your content or products across the vast range of
options your company provides. If users can't quickly find what they are
looking for, they get frustrated and go to other sites where they can do what
they came to do, or buy what they came to buy, on your site. We are all very
familiar with the good old approach of writing a complex SQL query to perform a
search against the database and then hoping for the best. This means that you
need to decide at schema design time which fields of your table to index in
order to optimize search at retrieval time. You then need to decide whether to
turn on full-text indexing for your database server. If you are a big company
with your own hosting and database server, these decisions are easy to control.
But if you host your site on a shared server and use a shared database server,
you are at the mercy of the hosting provider for things such as turning on
full-text indexing or allocating space for the database. Shared hosting also
brings a bigger problem: the connection pool. New content or products are not
added to a site every minute. It is content search that is performed by
thousands or millions of users on your site. This means that your database
server needs a connection pool large enough to serve the application. On shared
hosting you cannot trust other applications to behave properly and close unused
connections. We had a very real experience of this on one of our shared-hosted
sites. Every 10-15 minutes our application would throw exceptions about running
out of available connections in the pool. After some investigation we found
that one application on the web server was not closing its open database
connections and ended up starving all the other applications.
That's when we thought of a solution that does not depend on the availability
of database connections and performs search on an external index. We developed
a library that uses Lucene to perform the indexing and search. The .NET version
of Lucene is also available from the Apache site. We made some modifications to
it to fix some bugs and improve its memory management. Our product, Lucene4DB,
does not take any credit for the wonderful indexing and search provided by the
Lucene library. We have made a modest attempt to provide a wrapper on top of it
to facilitate writing SQL queries and search queries, and using the underlying
library in an intuitive and easy fashion.
We have developed a sample application that shows how this library can be used
to build a multithreaded indexing service that indexes multiple tables
simultaneously. The example uses the PUBS database and indexes its employee and
titles tables.
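The idea of indexing several tables at once can be sketched in a few lines. The following is only an illustration in Python, not the Lucene4DB sample application or the Lucene API: the rows are made-up stand-ins for data read from the employee and titles tables, and `build_index` and `index_all` are hypothetical helper names. Each table is handed to its own worker thread, which builds a simple term-to-row-id index.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Made-up rows standing in for data fetched once from the database.
TABLES = {
    "employee": [
        {"id": "E1", "text": "first sample employee"},
        {"id": "E2", "text": "second sample employee"},
    ],
    "titles": [
        {"id": "T1", "text": "sample database guide"},
        {"id": "T2", "text": "sample cooking book"},
    ],
}


def build_index(rows):
    """Build a term -> set of row ids mapping for one table."""
    index = defaultdict(set)
    for row in rows:
        for term in row["text"].lower().split():
            index[term].add(row["id"])
    return dict(index)


def index_all(tables):
    """Index every table on its own worker thread and collect the results."""
    with ThreadPoolExecutor(max_workers=len(tables)) as pool:
        futures = {
            name: pool.submit(build_index, rows)
            for name, rows in tables.items()
        }
        # result() blocks until that table's worker has finished.
        return {name: future.result() for name, future in futures.items()}


indexes = index_all(TABLES)
```

Because each worker only touches its own table's rows and its own index, no locking is needed in this sketch. A real service would also have to decide how workers share or avoid sharing a database connection while reading the rows.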
Here are some of the advantages that we experienced with the approach of using
an external index rather than the database index.
- We did not have to rely on the availability of database connections or depend
  on connection pool configuration
- We were able to create multiple indexes on the same table but with different
  fields and settings
- We were able to reset the index at run time
- We were able to reconfigure the index fields at run time
- We were able to add new records to the index at run time without the need for
  a reindex
- We were able to delete records from the index at run time
- Most importantly, the search results were accurate and search was blazingly
  fast.
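To make the run-time operations listed above concrete, here is a minimal toy sketch in Python of an index that lives outside the database. It is not Lucene or the Lucene4DB wrapper: the class `ExternalIndex`, its methods, and the sample record are all hypothetical, and matching is plain lowercase word lookup with no ranking or analysis.

```python
from collections import defaultdict


class ExternalIndex:
    """Toy in-memory inverted index over selected fields of a record."""

    def __init__(self, fields):
        self.fields = list(fields)          # which record fields get indexed
        self._postings = defaultdict(set)   # term -> ids of matching records
        self._docs = {}                     # record id -> terms indexed for it

    def add(self, record_id, record):
        """Add or replace one record without rebuilding the whole index."""
        self.delete(record_id)
        terms = set()
        for field in self.fields:
            terms.update(str(record.get(field, "")).lower().split())
        for term in terms:
            self._postings[term].add(record_id)
        self._docs[record_id] = terms

    def delete(self, record_id):
        """Remove one record; unknown ids are ignored."""
        for term in self._docs.pop(record_id, set()):
            self._postings[term].discard(record_id)
            if not self._postings[term]:
                del self._postings[term]

    def reset(self):
        """Empty the index at run time."""
        self._postings.clear()
        self._docs.clear()

    def search(self, term):
        """Return the ids of records containing the given word."""
        return set(self._postings.get(term.lower(), set()))


# Two indexes over the same made-up record, configured with different fields.
record = {"title": "Sample Cooking Book", "notes": "quick recipes"}
by_title = ExternalIndex(["title"])
by_title_and_notes = ExternalIndex(["title", "notes"])
for index in (by_title, by_title_and_notes):
    index.add("T1", record)
```

With this setup, searching for a word that appears only in `notes` finds the record in `by_title_and_notes` but not in `by_title`, and none of the operations opens a database connection once the record has been read.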