Lucene in Action: Meet Lucene Pt. 1

1.4.2 Searching an index

Searching in Lucene is as fast and simple as indexing, and it is remarkably powerful, as chapters 3 and 5 will show you. For now, let's look at Searcher, a command-line program that we'll use to search the index created by Indexer. (Keep in mind that our Searcher serves only to demonstrate the use of Lucene's search API. Your search application could also take the form of a web or desktop application with a GUI, an EJB, and so on.)

In the previous section, we indexed a directory of text files. The index, in this example, resides in a directory of its own on the file system. We instructed Indexer to create a Lucene index in a build/index directory, relative to the directory from which we invoked Indexer. As you saw in listing 1.1, this index contains the indexed files and their absolute paths. Now we need to use Lucene to search that index in order to find files that contain a specific piece of text. For instance, we may want to find all files that contain the keyword java or lucene, or we may want to find files that include the phrase "system requirements."

Using Searcher to implement a search

The Searcher program complements Indexer and provides command-line searching capability. Listing 1.2 shows Searcher in its entirety. It takes two command-line arguments: the path to the index created by Indexer and the query to run against that index.

Listing 1.2 Searcher: searches a Lucene index for a query passed as an argument

Searcher, like its Indexer sibling, has only a few lines of code dealing with Lucene. A couple of special things occur in the search method:

(1) We use Lucene’s IndexSearcher and FSDirectory classes to open our index for searching.

(2) We use QueryParser to parse a human-readable query into Lucene’s Query class.

(3) Searching returns hits in the form of a Hits object.

(4) Note that the Hits object contains only references to the underlying documents. In other words, instead of being loaded immediately upon search, matches are loaded from the index in a lazy fashion, only when requested with the hits.doc(int) call.
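The lazy loading described in point (4) can be pictured with a small plain-Java sketch. This is not Lucene's Hits class or its implementation: LazyHits, its loader function, and loadCount are hypothetical names invented for illustration, and the "index" is just an array of strings. The sketch only shows the idea that a search result can hold document references and fetch a document the first time doc(int) asks for it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical illustration of lazy hit loading; NOT Lucene's Hits class. */
final class LazyHits {
    private final int[] docIds;                 // references to matching documents
    private final IntFunction<String> loader;   // assumed way to fetch a document by id
    private final Map<Integer, String> cache = new HashMap<>();
    private int loads = 0;                      // how many documents were actually fetched

    LazyHits(int[] docIds, IntFunction<String> loader) {
        this.docIds = docIds.clone();
        this.loader = loader;
    }

    /** Number of matches; answering this requires no document loading. */
    int length() {
        return docIds.length;
    }

    /** Loads the n-th match on first request and reuses it afterwards. */
    String doc(int n) {
        int id = docIds[n];
        String document = cache.get(id);
        if (document == null) {
            document = loader.apply(id);
            cache.put(id, document);
            loads++;
        }
        return document;
    }

    int loadCount() {
        return loads;
    }
}

final class LazyHitsDemo {
    public static void main(String[] args) {
        // Stand-in for stored documents; the file names are made up.
        String[] store = {"a.txt", "b.txt", "c.txt", "d.txt"};
        LazyHits hits = new LazyHits(new int[] {3, 1}, id -> store[id]);

        System.out.println(hits.length() + " hits, documents loaded: " + hits.loadCount());
        System.out.println("first hit: " + hits.doc(0));
        System.out.println("documents loaded: " + hits.loadCount());
    }
}
```

In this sketch, counting the hits touches no stored document, and only the hits that are actually requested are fetched, which is the behavior the text attributes to the Hits object.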

Running Searcher

Let’s run Searcher and find some documents in our index using the query 'lucene':

The output shows that 6 of the 13 documents we indexed with Indexer contain the word lucene and that the search took a meager 66 milliseconds. Because Indexer stores files’ absolute paths in the index, Searcher can print them out. It’s worth noting that storing the file path as a field was our decision and appropriate in this case, but from Lucene’s perspective it’s arbitrary metadata attached to indexed documents.

Of course, you can use more sophisticated queries, such as 'lucene AND doug' or 'lucene AND NOT slow' or '+lucene +book', and so on. Chapters 3, 5, and 6 cover the different aspects of searching, including Lucene’s query syntax.
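To make the meaning of these Boolean queries concrete, here is a plain-Java sketch of what "required" and "prohibited" terms mean for a single piece of text. It is only an illustration of the semantics: BooleanMatchSketch and its methods are hypothetical names, the word splitting is a crude assumption, and Lucene itself evaluates queries against its index rather than scanning text this way.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of Boolean query semantics; not how Lucene evaluates queries. */
final class BooleanMatchSketch {

    /** Splits text into lowercase words (a crude stand-in for real text analysis). */
    static Set<String> terms(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\W+")));
    }

    /** True if the text contains every required term and none of the prohibited ones. */
    static boolean matches(String text, Set<String> required, Set<String> prohibited) {
        Set<String> found = terms(text);
        if (!found.containsAll(required)) {
            return false;
        }
        for (String term : prohibited) {
            if (found.contains(term)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 'lucene AND NOT slow': lucene is required, slow is prohibited.
        Set<String> required = new HashSet<>(Arrays.asList("lucene"));
        Set<String> prohibited = new HashSet<>(Arrays.asList("slow"));
        System.out.println(matches("Lucene is fast", required, prohibited));
        System.out.println(matches("Lucene can be slow", required, prohibited));
    }
}
```

Under the same reading, 'lucene AND doug' and '+lucene +book' both correspond to two required terms and no prohibited ones.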

Using the xargs utility

The Searcher class is a simplistic demo of Lucene’s search features. As such, it only dumps matches to the standard output. However, Searcher has one more trick up its sleeve. Imagine that you need to find files that contain a certain keyword or phrase, and then you want to process the matching files in some way. To keep things simple, let’s imagine that you want to list each matching file using the ls UNIX command, perhaps to see the file size, permission bits, or owner. By having matching document paths written unadorned to the standard output, and having the statistical output written to standard error, you can use the nifty UNIX xargs utility to process the matched files, as shown here:

In this example, we chose the Boolean query 'lucene AND NOT slow', which finds all files that contain the word lucene and don’t contain the word slow. This query took 131 milliseconds and found 6 matching files. We piped Searcher’s output to the xargs command, which in turn used the ls -l command to list each matching file. In a similar fashion, the matched files could be copied, concatenated, emailed, or dumped to standard output.
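The reason this pipeline works is the split between the two output streams: a pipe carries only standard output, so xargs sees the file paths and nothing else. The following plain-Java sketch shows that split in isolation. MatchReporter, its report method, the wording of the summary line, and the sample paths are all assumptions made for illustration; they are not taken from Searcher.

```java
import java.io.PrintStream;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: paths to one stream, statistics to another. */
final class MatchReporter {

    /**
     * Writes the summary line to err and one unadorned path per line to out,
     * so that a downstream tool reading out receives only file paths.
     */
    static void report(List<String> paths, long millis, String query,
                       PrintStream out, PrintStream err) {
        err.println("Found " + paths.size() + " document(s) (in " + millis
                + " milliseconds) that matched query '" + query + "':");
        for (String path : paths) {
            out.println(path);
        }
    }

    public static void main(String[] args) {
        // Made-up matches; a real program would take these from the search results.
        List<String> paths = Arrays.asList("/tmp/a.txt", "/tmp/b.txt");
        report(paths, 12, "lucene AND NOT slow", System.out, System.err);
    }
}
```

If the output of this program is piped into xargs ls -l, the summary line still appears on the terminal through standard error, while only the two paths reach ls.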

Our example indexing and searching applications demonstrate Lucene in a lot of its glory. Its API usage is simple and unobtrusive. The bulk of the code (and this applies to all applications interacting with Lucene) is plumbing relating to the business purpose—in this case, Indexer’s file system crawler that looks for text files and Searcher’s code that prints matched filenames based on a query to the standard output. But don’t let this fact, or the conciseness of the examples, tempt you into complacence: There is a lot going on under the covers of Lucene, and we’ve used quite a few best practices that come from experience. To effectively leverage Lucene, it’s important to understand more about how it works and how to extend it when the need arises. The remainder of this book is dedicated to giving you these missing pieces.

Written by Otis Gospodnetic and Erik Hatcher and reproduced from "Lucene in Action" by permission of Manning Publications Co. ISBN 1932394281, copyright 2004. All rights reserved. See http://www.manning.com for more information.

Created: March 27, 2003
Revised: January 24, 2005

URL: http://webreference.com/programming/lucene/1