Bug 65739 - string terminator warning in CodePageString
Summary: string terminator warning in CodePageString
Status: NEW
Alias: None
Product: POI
Classification: Unclassified
Component: HPSF
Version: 5.0.x-dev
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2021-12-10 17:56 UTC by Tim Allison
Modified: 2021-12-10 21:26 UTC
Description Tim Allison 2021-12-10 17:56:26 UTC
While trying to migrate Apache Tika to POI 5.1.0, I noticed that I'm getting "org.apache.poi.hpsf.CodePageString String terminator (\0) for CodePageString property value occurred before the end of string" on quite a few of our unit test files.

If I'm seeing it this often in our unit test files, is it really a problem that should be logged or is there something we can do in our code so that it is no longer a problem?

One triggering file is this one: https://github.com/apache/tika/blob/main/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/resources/test-documents/testEXCEL_embeddedPDF_mac.xls
Comment 1 PJ Fanning 2021-12-10 21:15:06 UTC
I added r1895794 - I'm no expert on the H*** part of the POI lib but if this issue doesn't stop us from successfully reading the files, maybe best to quieten the logs.
Comment 2 Tim Allison 2021-12-10 21:26:28 UTC
It feels like the problem is that the part that reads the keys shouldn't be including the \u0000.  I, also, am not that familiar with this part of the code.
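To make the idea in Comment 2 concrete, here is a minimal standalone sketch of cutting a decoded property value at its first \0 and of telling a lone trailing terminator apart from one that occurs earlier. This is not POI's actual implementation: the class and method names (CodePageStringSketch, decodeUpToTerminator, hasEarlyTerminator) are hypothetical, and the assumption that the affected files carry extra NUL bytes after the terminator is a guess that this thread does not confirm.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of org.apache.poi.hpsf.
public class CodePageStringSketch {

    /** Decodes the raw bytes and cuts the result at the first NUL, if there is one. */
    static String decodeUpToTerminator(byte[] raw, Charset charset) {
        String decoded = new String(raw, charset);
        int terminator = decoded.indexOf('\0');
        return terminator < 0 ? decoded : decoded.substring(0, terminator);
    }

    /** Returns true if a NUL appears anywhere before the final character. */
    static boolean hasEarlyTerminator(byte[] raw, Charset charset) {
        String decoded = new String(raw, charset);
        int terminator = decoded.indexOf('\0');
        return terminator >= 0 && terminator < decoded.length() - 1;
    }

    public static void main(String[] args) {
        Charset cs = StandardCharsets.ISO_8859_1;

        // Assumed input shape: a value followed by several NUL bytes.
        byte[] padded = "Title\0\0\0".getBytes(cs);
        System.out.println(decodeUpToTerminator(padded, cs)); // Title
        System.out.println(hasEarlyTerminator(padded, cs));   // true

        // A value with exactly one trailing terminator.
        byte[] exact = "Title\0".getBytes(cs);
        System.out.println(hasEarlyTerminator(exact, cs));    // false
    }
}
```

In this sketch the first input would trip an "occurred before the end of string" style check even though the decoded value is intact, which is one way such a warning could show up on files that otherwise read fine.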