The Lucene integration should expose more details about the search, and it should be easier to add new fields to the index via a GUI. Luke (http://www.getopt.org/luke/) might be helpful.
In the same vein, replace our crawler with Nutch.
solprovider wrote on the mailing list:

On 6/1/05, Gregor J. Rothfuss <gregor@apache.org> wrote:
> Michael Wechner wrote:
>> connect to a Search engine API in order to allow incremental indexing
>> of content being changed within Lenya.
>> Also refer to http://opensearch.a9.com/
>
> agreed. we can bundle this into:
> * port search to usecase framework
> * switch crawler to nutch
> * update index incrementally

Um, I already made search a usecase. That was necessary to maintain visitor language information. This code is available at:
http://lenya.apache.org/1_2_x/how-to/search.html
The original, and comments about using on Linux are at:
http://solprovider.com/lenya/search

While the crawler is good if you want to index non-Lenya websites, it is poor design for Lenya. It feels like someone got lazy and used the DefaultIndexer for HTML rather than use the ConfigurableIndexer for true integration with Lenya, or maybe there are historical reasons such as the ConfigurableIndexer not being available when Lucene was first integrated. I expect the website crawler to disappear, rather than be a priority improvement.

My project is moving into production. Incremental indexing and integration with the CMS GUI are priorities. I want the index to be updated x minutes after updates are completed, delaying if another save happens before it starts. The indexer needs to be started from a scheduler. I need to integrate with the scheduler, and from the number of issues about the scheduler in the mailing lists, may need to revise it.
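
The "update x minutes after the last save, delaying if another save happens first" behaviour described above is essentially a debounce. A minimal sketch of that idea, using only java.util.concurrent rather than the Lenya scheduler, might look like the following. The class name DebouncedIndexer, the method contentSaved() and the indexTask callback are hypothetical names for illustration and are not part of Lenya or Lucene; the actual index update is left to the caller.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not a Lenya or Lucene class): runs an index update
// a fixed delay after the most recent save, restarting the wait on every save.
class DebouncedIndexer {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final Runnable indexTask; // assumed to perform the incremental index update
    private final long delay;
    private final TimeUnit unit;
    private ScheduledFuture<?> pending; // the run that has been scheduled most recently

    DebouncedIndexer(Runnable indexTask, long delay, TimeUnit unit) {
        this.indexTask = indexTask;
        this.delay = delay;
        this.unit = unit;
    }

    // Call this whenever a document is saved in the CMS.
    synchronized void contentSaved() {
        if (pending != null) {
            // Cancels a run that has not started yet; a run that is already
            // executing is not interrupted and finishes normally.
            pending.cancel(false);
        }
        pending = scheduler.schedule(indexTask, delay, unit);
    }

    // Stops the background thread; pending runs are discarded.
    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

With a delay of, say, 5 minutes, a burst of saves would result in one index run 5 minutes after the last save in the burst. Integrating this with the existing Lenya scheduler instead of a private executor is a separate question that this sketch does not address.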
Added as a feature for the Lenya 3 version: http://wiki.apache.org/lenya/Lenya 3.0