Bug 60975 - Error converting doc with excel correspondence to html
Summary: Error converting doc with excel correspondence to html
Status: NEW
Alias: None
Product: POI
Classification: Unclassified
Component: HWPF
Version: 3.16-dev
Hardware: PC All
Importance: P2 critical
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-04-12 11:22 UTC by ricardo.martin.aguirre.sanchez
Modified: 2017-06-16 20:24 UTC

Description ricardo.martin.aguirre.sanchez 2017-04-12 11:22:40 UTC
Hi,
I am trying to convert a .doc document to HTML.
The document is the result of a Word mail merge ("correspondence") with data from an Excel table: from Word I use the mail merge option, which brings in values from an Excel table. The values show up correctly in Word, but internally Word stores them as MERGEFIELD {FIELD} VALUE.
The problem is that if one of these merged words or sentences contains a line break (ENTER), converting the document to HTML with wordToHtmlConverter.processDocument(doc) duplicates the words that come after the line break.
Example:
In the .doc document:
Phrase brought from
Excel

After the processDocument method:
Phrase brought from excel
Excel
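For context, the text stream of a binary .doc file marks each field with three control characters: 0x13 (field begin), 0x14 (separator between the field instruction and its result) and 0x15 (field end). The following is only a minimal sketch of how visible text could be derived from such a stream so that a field result is emitted once, even when it contains a line break. It is not POI's implementation; the class name FieldTextSketch and the method visibleText are hypothetical, and the MERGEFIELD name "Phrase" in the usage note is made up.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper, not part of Apache POI.
final class FieldTextSketch {
    // Field delimiter characters used in the text stream of binary .doc files.
    static final char FIELD_BEGIN = '\u0013';
    static final char FIELD_SEPARATOR = '\u0014';
    static final char FIELD_END = '\u0015';

    // Returns the text a reader would see: field instructions are dropped,
    // field results are kept exactly once.
    static String visibleText(String raw) {
        StringBuilder out = new StringBuilder();
        // One entry per open field; true while we are still in its instruction part.
        Deque<Boolean> openFields = new ArrayDeque<>();
        int hidden = 0; // number of open fields currently in their instruction part

        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == FIELD_BEGIN) {
                openFields.push(Boolean.TRUE);
                hidden++;
            } else if (c == FIELD_SEPARATOR) {
                // Switch the innermost field from instruction to result.
                if (!openFields.isEmpty() && openFields.peek()) {
                    openFields.pop();
                    openFields.push(Boolean.FALSE);
                    hidden--;
                }
            } else if (c == FIELD_END) {
                // A field without a separator is still in its instruction part here.
                if (!openFields.isEmpty() && openFields.pop()) {
                    hidden--;
                }
            } else if (hidden == 0) {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With an input such as "\u0013 MERGEFIELD Phrase \u0014Phrase brought from\rExcel\u0015", the sketch keeps only "Phrase brought from\rExcel", which matches what Word displays in the example above.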

The method involved is processDocument, in AbstractWordConverter (package org.apache.poi.hwpf.converter, poi-scratchpad-3.8-beta4.jar).

To rule out that the problem had already been fixed in a later release, I upgraded one version at a time up to the latest, 3.16, but the bug persists.

My code:

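import java.io.FileInputStream;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.w3c.dom.Document;

// docFile is the java.io.File of the merged .doc document
FileInputStream finStream = new FileInputStream(docFile.getAbsolutePath());
HWPFDocument doc = new HWPFDocument(finStream);
WordExtractor wordExtract = new WordExtractor(doc); // not used by the conversion below
Document newDocument = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(newDocument);
wordToHtmlConverter.processDocument(doc); // the duplication appears in the DOM built here

// Serialize the resulting DOM as an HTML string
StringWriter stringWriter = new StringWriter();
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
transformer.setOutputProperty(OutputKeys.METHOD, "html");
transformer.transform(new DOMSource(wordToHtmlConverter.getDocument()), new StreamResult(stringWriter));

String html = stringWriter.toString();
finStream.close();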
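
One possible way to check the resulting html string for the duplication is to count how often a merged word occurs in it. This is only a minimal sketch; the class DuplicateCheckSketch and the method countOccurrences are hypothetical helpers and not part of POI.

```java
// Hypothetical helper, not part of Apache POI.
final class DuplicateCheckSketch {
    // Counts non-overlapping, case-insensitive occurrences of needle in haystack.
    static int countOccurrences(String haystack, String needle) {
        if (needle.isEmpty()) {
            return 0;
        }
        String h = haystack.toLowerCase();
        String n = needle.toLowerCase();
        int count = 0;
        int from = h.indexOf(n);
        while (from >= 0) {
            count++;
            from = h.indexOf(n, from + n.length());
        }
        return count;
    }
}
```

For the example above, countOccurrences(html, "Excel") would be expected to return 1 for a correct conversion; a higher count points to the duplicated text.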

Thanks.