Bug 33702 - Improve Lucene integration
Summary: Improve Lucene integration
Alias: None
Product: Lenya
Classification: Unclassified
Component: Lucene Integration
Version: 2.0
Hardware: PC Windows XP
Importance: P2 enhancement
Target Milestone: 2.0.1
Assignee: Lenya Developers
Depends on:
Blocks: 26012
Reported: 2005-02-22 23:07 UTC by Gregor J. Rothfuss
Modified: 2010-07-20 09:18 UTC (History)
Description Gregor J. Rothfuss 2005-02-22 23:07:59 UTC
The Lucene integration should expose more details about the search, and it
should be easier to add new fields to the index via a GUI.

Luke (http://www.getopt.org/luke/) might be helpful.
Comment 1 Gregor J. Rothfuss 2005-03-20 21:14:25 UTC
In the same vein, replace our crawler with Nutch.
Comment 2 Gregor J. Rothfuss 2005-06-02 04:51:30 UTC
solprovider wrote on the mailing list:

On 6/1/05, Gregor J. Rothfuss <gregor@apache.org> wrote:

> Michael Wechner wrote:
>> connect to a Search engine API in order to allow incremental indexing
>> of content being changed within Lenya.
>> Also refer to http://opensearch.a9.com/
> agreed. we can bundle this into:
> * port search to usecase framework
> * switch crawler to nutch
> * update index incrementally

Um, I already made search a usecase.  That was necessary to maintain
visitor language information.

This code is available at:
The original, and comments about using it on Linux, are at:

While the crawler is good if you want to index non-Lenya websites, it
is a poor design for Lenya.  It feels like someone got lazy and used the
DefaultIndexer for HTML rather than using the ConfigurableIndexer for
true integration with Lenya, or maybe there are historical reasons
such as the ConfigurableIndexer not being available when Lucene was
first integrated.  I expect the website crawler to disappear, rather
than be a priority improvement.

My project is moving into production.  Incremental indexing and
integration with the CMS GUI are priorities.  I want the index to be
updated x minutes after updates are completed, delaying if another
save happens before it starts.  The indexer needs to be started from a
scheduler.  I need to integrate with the scheduler, and from the
number of issues about the scheduler in the mailing lists, may need to
revise it.
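
A minimal sketch of the "update x minutes after the last save" behaviour described above, assuming a plain java.util.concurrent scheduler rather than the Lenya scheduler. The class name DebouncedIndexer, the method documentSaved() and the reindexTask callback are hypothetical names made up for illustration; the actual indexing call is left to the caller.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of Lenya): runs a reindex task once a quiet
 * period has elapsed after the most recent save.
 */
final class DebouncedIndexer {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final Runnable reindexTask; // assumption: wraps the real indexer call
    private final long delay;
    private final TimeUnit unit;
    private ScheduledFuture<?> pending; // guarded by "this"

    DebouncedIndexer(Runnable reindexTask, long delay, TimeUnit unit) {
        this.reindexTask = reindexTask;
        this.delay = delay;
        this.unit = unit;
    }

    /** Call after every save; restarts the countdown if the run has not started yet. */
    synchronized void documentSaved() {
        if (pending != null) {
            // Has no effect if the previous run already started or finished;
            // in that case the save below simply schedules a fresh run.
            pending.cancel(false);
        }
        pending = scheduler.schedule(reindexTask, delay, unit);
    }

    /** Stops the scheduler; a pending run that has not started is dropped. */
    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

With a delay of, say, 5 minutes, several saves in quick succession would lead to a single indexing run 5 minutes after the last one. A save that arrives while a run is already in progress does not interrupt it; it queues another run after the delay.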
Comment 3 Florent ANDRE 2010-07-20 09:18:57 UTC
Added as a feature for the Lenya 3.0 version: http://wiki.apache.org/lenya/Lenya 3.0