MS Word 2003 XP2 (the only version I've tested so far) stores the CodePage value in the custom properties section as an untitled property. When DocumentSummaryInformation.getCustomProperties() reads the custom properties, it ignores untitled properties. As a result, the CodePage value is never read and getCodePage() returns -1. This also changes the CodePage when the data is written back out to a file, because the value is then set to the default, CP_UNICODE (1200).

To reproduce the problem:
1. Create a new document in Word with at least one custom property.
2. Check the CodePage value of the custom properties section (e.g. using the sample reader code). In my case it was 65001. The problem occurs whenever the value is not 1200.
3. Using the sample code that updates a custom property, update a value and write the data out to a file.
4. Check the CodePage value of the newly written document. It will have changed to 1200 (CP_UNICODE).

The workaround is as follows:
* First, read in the document using the POIFSReaderListener interface (as in the sample code) and cache the codePage value.
* After re-reading the document for the update, set the custom properties' codePage value to the cached value by building a CustomProperty and calling CustomProperties.set("PID_CODEPAGE", CustomProperty cp).
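For checking (steps 2 and 4) or caching (first workaround step) the code page without going through getCustomProperties(), one option is to read property ID 1 (PID_CODEPAGE) directly from the raw bytes of the custom properties section. The following is a minimal sketch, not POI's API. It assumes you already have the bytes of that one section (the custom properties are the second section of the "\005DocumentSummaryInformation" stream; extracting them is not shown). It follows the section layout from the [MS-OLEPS] property set specification: Size, NumProperties, then ID/offset pairs with offsets relative to the section start, and each value prefixed by a 2-byte type plus 2 bytes of padding. The helper sectionWithCodepage is hypothetical and exists only to build test input. Because the code page is stored as a 16-bit VT_I2, a value such as 65001 has to be read as unsigned.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class SectionCodepage {
    /** Property ID of the code page property in a property set section. */
    static final int PID_CODEPAGE = 1;
    /** Variant type tag for a 16-bit integer. */
    static final int VT_I2 = 2;

    /**
     * Returns the code page stored in one property set section, or -1 if the
     * section has no PID_CODEPAGE entry. The array must start at the section
     * header (Size, NumProperties, then ID/offset pairs).
     */
    static int readCodepage(byte[] section) {
        ByteBuffer buf = ByteBuffer.wrap(section).order(ByteOrder.LITTLE_ENDIAN);
        int numProperties = buf.getInt(4);
        for (int i = 0; i < numProperties; i++) {
            int entry = 8 + i * 8;
            int id = buf.getInt(entry);
            int offset = buf.getInt(entry + 4); // relative to the section start
            if (id == PID_CODEPAGE) {
                int type = buf.getShort(offset) & 0xFFFF;
                if (type != VT_I2) {
                    throw new IllegalArgumentException("unexpected type for PID_CODEPAGE: " + type);
                }
                // Stored as a signed 16-bit value, so mask to read 65001 correctly.
                return buf.getShort(offset + 4) & 0xFFFF;
            }
        }
        return -1;
    }

    /** Hypothetical demo helper: a section holding only PID_CODEPAGE. */
    static byte[] sectionWithCodepage(int codepage) {
        ByteBuffer buf = ByteBuffer.allocate(24).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(24);                 // Size of the section in bytes
        buf.putInt(1);                  // NumProperties
        buf.putInt(PID_CODEPAGE);       // property ID
        buf.putInt(16);                 // offset of the property value
        buf.putShort((short) VT_I2);    // type
        buf.putShort((short) 0);        // padding
        buf.putShort((short) codepage); // value (truncated to 16 bits)
        buf.putShort((short) 0);        // padding to a 4-byte boundary
        return buf.array();
    }

    public static void main(String[] args) {
        System.out.println(readCodepage(sectionWithCodepage(65001))); // 65001
        System.out.println(readCodepage(sectionWithCodepage(1200)));  // 1200
    }
}
```

In the workaround, the value returned for the original document is what would be cached and then re-applied to the custom properties before writing, so that a file created with 65001 is not silently rewritten as 1200.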
No update for a long time => closing this for now. If this is still a problem for you please verify with a recent version of POI and reopen this bug with more information, ideally a unit-test that can be used to reproduce the problem.