Bug 41700 - Website import feature
Summary: Website import feature
Status: NEW
Alias: None
Product: Lenya
Classification: Unclassified
Component: Miscellaneous
Version: unspecified
Hardware: Other other
Importance: P2 enhancement
Target Milestone: 2.0.1
Assignee: Lenya Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-02-26 01:23 UTC by Andreas Hartmann
Modified: 2007-07-16 03:02 UTC
0 users



Description Andreas Hartmann 2007-02-26 01:23:08 UTC
It would be nice to have a feature to import existing websites:

- ask for an XPath to extract the body content
- ask for an XPath to extract the nav title (/html/head/title, //h1, ...)
- crawl the existing site
- extract page content using JTidy
- use the URL space or custom mechanism to build the site tree
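
The extraction and tree-building steps listed above could look roughly like the following. This is only an illustrative sketch in Python, not Lenya code: the function names `extract` and `build_site_tree` are hypothetical, the input is assumed to be already tidied into well-formed, namespace-free markup (the job JTidy would do in the proposal), and the standard-library ElementTree used here supports only a subset of XPath with paths relative to the root element (so `head/title` instead of `/html/head/title`).

```python
from urllib.parse import urlparse
import xml.etree.ElementTree as ET


def extract(xhtml, body_xpath, title_xpaths):
    """Return (nav_title, body_element) from already-tidied, well-formed markup.

    title_xpaths is an ordered list of candidates; the first one that
    matches an element with non-empty text wins.
    """
    root = ET.fromstring(xhtml)
    title = None
    for xpath in title_xpaths:
        node = root.find(xpath)
        if node is not None and (node.text or "").strip():
            title = node.text.strip()
            break
    return title, root.find(body_xpath)


def build_site_tree(urls):
    """Nest pages by URL path segment: /docs/faq.html -> tree['docs']['faq.html']."""
    tree = {}
    for url in urls:
        node = tree
        # Skip empty segments produced by leading or trailing slashes.
        for segment in filter(None, urlparse(url).path.split("/")):
            node = node.setdefault(segment, {})
    return tree
```

For example, calling `extract(page, ".//div[@id='content']", ["head/title", ".//h1"])` would take the nav title from the `<title>` element and fall back to the first `<h1>` only if the title is missing or empty. Deriving the tree purely from URL paths is the simplest reading of "use the URL space"; a custom mechanism would replace `build_site_tree`.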