Bug 60975 - Error converting doc with excel correspondence to html
Summary: Error converting doc with excel correspondence to html
Status: NEW
Alias: None
Product: POI
Classification: Unclassified
Component: HWPF
Version: 3.16-dev
Hardware: PC All
Importance: P2 critical
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2017-04-12 11:22 UTC by ricardo.martin.aguirre.sanchez
Modified: 2017-06-16 20:24 UTC
Description ricardo.martin.aguirre.sanchez 2017-04-12 11:22:40 UTC
I am trying to convert a .doc document to HTML.
The document is the result of a Word mail merge ("correspondence") with data from an Excel table: from Word I use the mail merge option to pull in values from an Excel table. The merged values display correctly in Word, but internally Word stores each one as a field of the form MERGEFIELD {FIELD} VALUE.
The problem is that if one of these merged words or sentences contains a line break (ENTER), converting the document with wordToHtmlConverter.processDocument(doc) duplicates the words that come after the ENTER.
In the .doc document:
Phrase brought from

After the processDocument method:
Phrase brought from excel
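
For reference, here is a minimal sketch of the field layout described above. It assumes the usual Word binary convention that a field is stored in the text stream as a begin mark (0x13), the field instruction, a separator mark (0x14), the field result, and an end mark (0x15). The class name, the method name and the field name "Phrase" are made up for illustration and are not part of POI. The sketch only shows what a reader should see when the field result contains an ENTER: the text after the ENTER appears once.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class FieldTextSketch {
    // Control characters that delimit a field in the Word text stream.
    static final char FIELD_BEGIN = '\u0013';
    static final char FIELD_SEPARATOR = '\u0014';
    static final char FIELD_END = '\u0015';

    /** Drops field instructions and keeps each field result exactly once. */
    static String visibleText(String raw) {
        StringBuilder out = new StringBuilder();
        // One entry per open field: true while still inside its instruction part.
        Deque<Boolean> inInstruction = new ArrayDeque<>();
        int hidden = 0; // number of open fields that are still in their instruction part
        for (char c : raw.toCharArray()) {
            if (c == FIELD_BEGIN) {
                inInstruction.push(true);
                hidden++;
            } else if (c == FIELD_SEPARATOR) {
                if (!inInstruction.isEmpty() && inInstruction.peek()) {
                    inInstruction.pop();
                    inInstruction.push(false);
                    hidden--;
                }
            } else if (c == FIELD_END) {
                if (!inInstruction.isEmpty() && inInstruction.pop()) {
                    hidden--;
                }
            } else if (hidden == 0) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical merged field whose result contains a paragraph mark (ENTER).
        String raw = FIELD_BEGIN + " MERGEFIELD Phrase " + FIELD_SEPARATOR
                + "Phrase brought from\rexcel" + FIELD_END;
        System.out.println(visibleText(raw).replace('\r', '\n'));
    }
}
```

If the converter handled the field this way, "excel" would appear a single time in the output, on its own line.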


To rule out that this had already been fixed in a later release, I upgraded one version at a time up to the latest, 3.16, but the bug persists.

My code:

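import java.io.FileInputStream;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.w3c.dom.Document;

            // Load the mail-merged .doc file
            FileInputStream finStream = new FileInputStream(docFile.getAbsolutePath());
            HWPFDocument doc = new HWPFDocument(finStream);
            Document newDocument = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(newDocument);
            // Conversion step mentioned in the description; the duplication shows up in its output
            wordToHtmlConverter.processDocument(doc);

            // Serialize the resulting DOM to an HTML string
            StringWriter stringWriter = new StringWriter();
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
            transformer.setOutputProperty(OutputKeys.METHOD, "html");
            transformer.transform(new DOMSource(wordToHtmlConverter.getDocument()), new StreamResult(stringWriter));

            String html = stringWriter.toString();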
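
One way to narrow this down is to run the same Transformer settings on a hand-built DOM, without POI involved. The sketch below uses only the JDK. The class name, the helper names and the two-paragraph document are made up for illustration and do not come from the failing file. If the text is written once here, the serialization step is probably not the cause and the duplication most likely happens earlier, while the document is being converted.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class SerializationCheck {
    /** Builds html/body with two paragraphs, standing in for the converter output. */
    static Document twoParagraphs(String first, String second) {
        try {
            Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element html = document.createElement("html");
            Element body = document.createElement("body");
            document.appendChild(html);
            html.appendChild(body);
            for (String text : new String[] {first, second}) {
                Element p = document.createElement("p");
                p.setTextContent(text);
                body.appendChild(p);
            }
            return document;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Serializes a DOM with the same output properties as the code above. */
    static String serialize(Document document) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
            transformer.setOutputProperty(OutputKeys.METHOD, "html");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Counts non-overlapping occurrences of needle in haystack. */
    static int count(String haystack, String needle) {
        int n = 0;
        for (int i = haystack.indexOf(needle); i >= 0; i = haystack.indexOf(needle, i + needle.length())) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String html = serialize(twoParagraphs("Phrase brought from", "excel"));
        System.out.println(html);
        System.out.println("occurrences of \"excel\": " + count(html, "excel"));
    }
}
```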