Bug 51604 - replace text corrupts doc file
Status: REOPENED
Alias: None
Product: POI
Classification: Unclassified
Component: HWPF
Version: 3.8-dev
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List

Reported: 2011-08-03 02:43 UTC by sanmoy
Modified: 2017-01-21 19:07 UTC
CC List: 0 users

Attachments
test input file for hwpf (26.00 KB, application/msword), 2011-08-04 00:30 UTC, sanmoy
test output file for hwpf (20.00 KB, application/msword), 2011-08-11 05:26 UTC, sanmoy
comparison result (40.00 KB, image/png), 2011-08-11 05:27 UTC, sanmoy
error-snap-shot (22.17 KB, image/png), 2011-08-11 09:25 UTC, sanmoy

sanmoy 2011-08-03 02:43:00 UTC
I wrote this simple piece of code:

    FileInputStream fileInputStream = new FileInputStream(new File("C:\\in.doc"));
    FileOutputStream fileOutputStream = new FileOutputStream(new File("C:\\out.doc"));
    HWPFDocument hwpfDocument = new HWPFDocument(fileInputStream);
    Range range = hwpfDocument.getRange();
    int numParagraph = range.numParagraphs();
    // Replace the full text of every character run in every paragraph
    for (int i = 0; i < numParagraph; i++) {
        Paragraph paragraph = range.getParagraph(i);
        int numCharRuns = paragraph.numCharacterRuns();
        for (int j = 0; j < numCharRuns; j++) {
            CharacterRun charRun = paragraph.getCharacterRun(j);
            String text = charRun.text();
            charRun.replaceText(text, "added");
        }
    }
    hwpfDocument.write(fileOutputStream);

After execution, the output file is corrupted (I tried to open it with Office 2007); the input file opens properly. The input file is very basic, with 4-5 simple lines. No exception is thrown. I wrote similar code using XWPFDocument, and it works fine.

sanmoy 2011-08-03 02:54:44 UTC
One addition: fileOutputStream.close() is present in the original code; I left it out by mistake when copying the code into this report.

Please note that the following code works perfectly:

    XWPFDocument xwpfDocument = new XWPFDocument(inputStream);
    List<XWPFParagraph> paragraphs = xwpfDocument.getParagraphs();
    // Overwrite the text of every run in every paragraph
    for (XWPFParagraph xParagraph : paragraphs) {
        for (XWPFRun xwpfRun : xParagraph.getRuns()) {
            xwpfRun.setText("replace", 0);
        }
    }
    xwpfDocument.write(outputStream);
    outputStream.close();

Yegor Kozlov 2011-08-03 06:58:05 UTC
Which version of POI? Please try with the latest build from trunk; there have been quite a lot of updates recently. If the problem is still there, please attach the problematic file. Without the input .doc file we can't do much to help you.

Yegor

sanmoy 2011-08-04 00:30:42 UTC
Created attachment 27346: test input file for hwpf

I have tested with the 6-Jul-2011 nightly build, poi-bin-3.8-beta3-20110606.tar.

sanmoy 2011-08-04 00:40:19 UTC
Now I have tested with yesterday's nightly build, poi-3.8-beta4-20110803, from http://encore.torchbox.com/poi-svn-build/

Jacob 2011-08-05 16:15:49 UTC
Very interested in this bug. Having the same issue. We can't use XWPF since our contract requires us to maintain the original version.

Sergey Vladimirov 2011-08-09 05:26:05 UTC
Fixed in r1155211 (3.8-beta4-20110810 or later). Please test.

sanmoy 2011-08-11 05:26:24 UTC
Created attachment 27369: test output file for hwpf

I have tested with 3.8-beta4-20110810, but the defect still exists. I have now added a check so that the text is replaced only when the run contains the string "Header" (please see the input doc):

    if (text.contains("Header"))
        charRun.replaceText(text, "added");

I compared the output file with the input using Beyond Compare (a file comparison tool). It shows that the text has been replaced properly, but the document format is corrupted. I don't know much about the doc format, so I cannot comment further. I have attached the output file; I hope it helps.

sanmoy 2011-08-11 05:27:38 UTC
Created attachment 27370: comparison result

Sergey Vladimirov 2011-08-11 08:23:29 UTC
Sanmoy,

I don't see any difference between in.doc and out.doc, except for the "Header" -> "added" change. What exactly is broken?

Sergey

sanmoy 2011-08-11 09:25:46 UTC
Created attachment 27371: error-snap-shot

Please try to open the output file with MS Office 2003 or 2007; it reports that the file is corrupted.

Sergey Vladimirov 2011-08-11 18:43:51 UTC
Sanmoy,

I fixed a couple of issues (FIB and stylesheet processing) that may be the reason the file could not be opened by Microsoft Office. Please try with the next nightly build or the trunk version.

The resulting file still does not pass the binary file validation tool, though :(

Sergey

Sergey Vladimirov 2011-08-16 10:20:13 UTC
Sanmoy,

Please check the latest trunk version, or 3.8-beta4 or later.
sanmoy 2011-08-20 10:50:33 UTC
The defect has been fixed. Thanks.

sanmoy 2011-09-04 16:49:18 UTC
    /**
     * Replace (all instances of) a piece of text with another...
     *
     * @param pPlaceHolder
     *            The text to be replaced (e.g., "\${organization}")
     * @param pValue
     *            The replacement text (e.g., "Apache Software Foundation")
     */
    public void replaceText(String pPlaceHolder, String pValue)

The replaceText API does not work if the String pValue contains the String pPlaceHolder. For example, if pPlaceHolder="abcd" and pValue="abcd", "abcdef" or "12abcdef", this code goes into an infinite loop.

To reproduce, modify the original test code to

    charRun.replaceText(text, text);

that is, try to replace the original value with itself. It does not work; it falls into an infinite loop.

For your convenience, I am copying the original code again. Please test it with the attached files.

    FileInputStream fileInputStream = new FileInputStream(new File("C:\\in.doc"));
    FileOutputStream fileOutputStream = new FileOutputStream(new File("C:\\out.doc"));
    HWPFDocument hwpfDocument = new HWPFDocument(fileInputStream);
    Range range = hwpfDocument.getRange();
    int numParagraph = range.numParagraphs();
    for (int i = 0; i < numParagraph; i++) {
        Paragraph paragraph = range.getParagraph(i);
        int numCharRuns = paragraph.numCharacterRuns();
        for (int j = 0; j < numCharRuns; j++) {
            CharacterRun charRun = paragraph.getCharacterRun(j);
            String text = charRun.text();
            // Replacing a run's text with itself triggers the infinite loop
            charRun.replaceText(text, text);
        }
    }
    hwpfDocument.write(fileOutputStream);
    fileOutputStream.close();

I have tested with the latest nightly build, 3.8-beta5-20110904.

I debugged the POI code and found the problem in the following logic:

    String text = text();
    int offset = text.indexOf(pPlaceHolder);

text() returns the text with the replacement already applied. If the replaced value contains the original String, the search finds the placeholder again on every pass, so offset is always >= 0 (and keeps increasing when the value adds text in front of the placeholder), and the loop never ends:

    public void replaceText(String pPlaceHolder, String pValue) {
        boolean keepLooking = true;
        while (keepLooking) {
            String text = text();
            int offset = text.indexOf(pPlaceHolder);
            if (offset >= 0)
                replaceText(pPlaceHolder, pValue, offset);
            else
                keepLooking = false;
        }
    }

applanc 2012-02-14 17:13:38 UTC
Hi,

I have tested poi-bin-3.8-beta5-20111217.tar.gz with an MS Office 2003 input file, using

    charRun.replaceText("XYZ", "ABC");

and the output file is corrupted. Without the replacement the output is OK, so I guess the problem comes from the replacement logic.

With poi-bin-3.8-beta4 it also worked if the length of pValue was equal to the length of pPlaceHolder; otherwise the output file was corrupted as well.

Any ideas? Thanks
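
Regarding the infinite loop reported in the 2011-09-04 comment: the quoted replaceText loop searches from the start of the text on every pass, so a replacement value that contains the placeholder is matched again. One possible way around this is to resume the search just past the value that was inserted. The following is a minimal plain-Java sketch of that idea. It works on a StringBuilder rather than on a POI Range, and the class ReplaceSketch and the method replaceAll are hypothetical names used for illustration; they are not part of POI and this is not POI's actual fix.

```java
// Hypothetical illustration only: a StringBuilder stands in for the Range text.
final class ReplaceSketch {

    /**
     * Replaces every occurrence of placeHolder with value, resuming each
     * search after the text that was just inserted, so a value that contains
     * the placeholder is never matched a second time.
     */
    static String replaceAll(String source, String placeHolder, String value) {
        if (placeHolder.isEmpty()) {
            return source; // an empty placeholder would match at every position
        }
        StringBuilder text = new StringBuilder(source);
        int from = 0;
        while (true) {
            int offset = text.indexOf(placeHolder, from);
            if (offset < 0) {
                break; // no further occurrence after the last replacement
            }
            text.replace(offset, offset + placeHolder.length(), value);
            from = offset + value.length(); // skip past the inserted value
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // The value contains the placeholder, the case that loops forever above.
        // prints "12abcdef and 12abcdef"
        System.out.println(replaceAll("abcd and abcd", "abcd", "12abcdef"));
    }
}
```

This addresses only the looping behaviour. It says nothing about the file corruption reported when the replacement has a different length from the placeholder.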