33702 – Improve Lucene integration

Summary: Improve Lucene integration

Status:	RESOLVED FIXED

Alias:	None

Product:	Lenya
Classification:	Unclassified
Component:	Lucene Integration
Version:	2.0
Hardware:	PC Windows XP

Importance:	P2 enhancement
Target Milestone:	2.0.1
Assignee:	Lenya Developers

URL:
Keywords:

Depends on:
Blocks:	26012

Reported:	2005-02-22 23:07 UTC by Gregor J. Rothfuss
Modified:	2010-07-20 09:18 UTC
CC List:	0 users

Description Gregor J. Rothfuss 2005-02-22 23:07:59 UTC

The Lucene integration should expose more details about the search, and it
should be easier to add new fields to the index via a GUI.

http://www.getopt.org/luke/ might be helpful

Comment 1 Gregor J. Rothfuss 2005-03-20 21:14:25 UTC

In the same vein, replace our crawler with Nutch.

Comment 2 Gregor J. Rothfuss 2005-06-02 04:51:30 UTC

solprovider wrote on the mailing list:

On 6/1/05, Gregor J. Rothfuss <gregor@apache.org> wrote:
> Michael Wechner wrote:
> > connect to a Search engine API in order to allow incremental indexing
> > of content being changed within Lenya.
> > Also refer to http://opensearch.a9.com/
>
> agreed. we can bundle this into:
> * port search to usecase framework
> * switch crawler to nutch
> * update index incrementally

Um, I already made search a usecase.  That was necessary to maintain
visitor language information.

This code is available at:
http://lenya.apache.org/1_2_x/how-to/search.html
The original, along with comments about using it on Linux, is at:
http://solprovider.com/lenya/search

While the crawler is good if you want to index non-Lenya websites, it
is a poor design for Lenya.  It feels like someone got lazy and used the
DefaultIndexer for HTML rather than using the ConfigurableIndexer for
true integration with Lenya, or maybe there are historical reasons
such as the ConfigurableIndexer not being available when Lucene was
first integrated.  I expect the website crawler to disappear, rather
than be a priority improvement.

My project is moving into production.  Incremental indexing and
integration with the CMS GUI are priorities.  I want the index to be
updated x minutes after updates are completed, delaying if another
save happens before it starts.  The indexer needs to be started from a
scheduler.  I need to integrate with the scheduler, and from the
number of issues about the scheduler in the mailing lists, may need to
revise it.
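
The delayed update described above (reindex x minutes after the last save,
restarting the wait if another save arrives first) is essentially a debounce
timer. The following is a minimal sketch in Python, not Lenya code:
DebouncedIndexer, notify_save and the reindex callback are hypothetical names,
and a real integration would presumably hook into Lenya's scheduler rather
than a thread timer.

```python
import threading


class DebouncedIndexer:
    """Run `reindex` once, `delay` seconds after the most recent save."""

    def __init__(self, reindex, delay):
        self._reindex = reindex  # hypothetical callback that updates the index
        self._delay = delay      # quiet period in seconds (the "x minutes")
        self._timer = None       # the countdown currently pending, if any
        self._lock = threading.Lock()

    def notify_save(self):
        """Call on every save; restarts the countdown if one is pending."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._delay, self._run)
            self._timer.daemon = True
            self._timer.start()

    def _run(self):
        with self._lock:
            # A later save may have replaced this timer just as it fired;
            # in that case the newer countdown is the one that should index.
            if self._timer is not threading.current_thread():
                return
            self._timer = None
        self._reindex()
```

Each save calls notify_save(), so a burst of edits results in a single index
update once the quiet period has passed. The delay is given in seconds here;
a value such as 300 would correspond to five minutes.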

Comment 3 Florent ANDRE 2010-07-20 09:18:57 UTC

Added as a feature for the Lenya 3 version: http://wiki.apache.org/lenya/Lenya 3.0