| Summary: | Broken paragraph to text mapping in some documents | | |
|---|---|---|---|
| Product: | POI | Reporter: | Maxim Valyanskiy <max.valjanski> |
| Component: | HWPF | Assignee: | POI Developers List <dev> |
| Status: | RESOLVED FIXED | | |
| Severity: | normal | | |
| Priority: | P2 | | |
| Version: | 3.6-dev | | |
| Target Milestone: | --- | | |
| Hardware: | PC | | |
| OS: | Linux | | |
| Attachments: | document | | |
Description
Maxim Valyanskiy
2009-10-28 07:00:02 UTC
Created attachment 24433
document
The paragraph offsets (FC) in the PAPX of this file are 2048 bytes larger than the real character data in the text pieces.

Hm. This file seems very wrong to me. Neither OpenOffice nor LibreOffice can display it correctly. In more detail, it has 2 TextPieces:

* TextPiece from 0 to 1199 (PieceDescriptor (pos: 2048; unicode))
* TextPiece from 1199 to 2377 (PieceDescriptor (pos: 4608; unicode))

but all CHPX refer to the second text piece:

* CHPX from 1024 to 1037 (in bytes 4096 to 4122)
* CHPX from 1037 to 1038 (in bytes 4122 to 4124)
* ...
* CHPX from 2142 to 2377 (in bytes 6494 to 11776)

as do the PAPX:

* PAPX from 1185 to 1199 (in bytes 4418 to 4478)
* PAPX from 2142 to 2377 (in bytes 6494 to 12102)

so it is just a bad file, AFAIK. Apart from that, there is a table without a single row or cell, i.e. there is a PAPX with inTable=true but no end-of-cell marks.

Sergey, can it be an "autosaved" file? I have seen some strange format violations in such files.

Maxim, no, it doesn't look quick-saved:

[FIB] ... .fComplex = false ... [/FIB]

Although it was quick-saved 15 times, it is currently marked as a fully saved file. Also, there are no additional grpprl(s) in the CPL section, i.e. there is no SPRM quick-save data.
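
The byte ranges quoted in this report can be cross-checked against the two text pieces with a little arithmetic. The following is a minimal sketch, not POI code: the class `FcRangeCheck` and its methods are hypothetical names made up for illustration, and it assumes that a unicode text piece occupies 2 bytes per character starting at its `pos`. It only counts how many bytes of each CHPX/PAPX range actually lie inside a text piece.

```java
// Hypothetical helper, not part of Apache POI: plain arithmetic on the numbers quoted in this report.
final class FcRangeCheck {

    /** Byte range [start, end) of a text piece holding characters [cpStart, cpEnd). */
    static int[] pieceBytes(int cpStart, int cpEnd, int filePos, boolean unicode) {
        int bytesPerChar = unicode ? 2 : 1; // assumption: 2 bytes per char for unicode pieces
        return new int[] { filePos, filePos + (cpEnd - cpStart) * bytesPerChar };
    }

    /** Number of bytes of [fcStart, fcEnd) that lie inside any of the given piece byte ranges. */
    static int coveredBytes(int fcStart, int fcEnd, int[][] pieces) {
        int covered = 0;
        for (int[] p : pieces) {
            int lo = Math.max(fcStart, p[0]);
            int hi = Math.min(fcEnd, p[1]);
            if (hi > lo) {
                covered += hi - lo;
            }
        }
        return covered;
    }

    /** The two text pieces reported for the attached document. */
    static int[][] reportedPieces() {
        return new int[][] {
            pieceBytes(0, 1199, 2048, true),
            pieceBytes(1199, 2377, 4608, true),
        };
    }

    public static void main(String[] args) {
        int[][] pieces = reportedPieces();
        int[][] runs = {
            { 4096, 4122 },   // CHPX 1024..1037
            { 6494, 11776 },  // CHPX 2142..2377
            { 4418, 4478 },   // PAPX 1185..1199
            { 6494, 12102 },  // PAPX 2142..2377
        };
        for (int[] r : runs) {
            System.out.printf("bytes %d..%d: %d of %d bytes fall inside a text piece%n",
                    r[0], r[1], coveredBytes(r[0], r[1], pieces), r[1] - r[0]);
        }
    }
}
```

Under these assumptions the last CHPX and the last PAPX each span several thousand bytes, yet only 470 of those bytes (235 characters at 2 bytes each) belong to a text piece, which is one way to see that the stored end offsets overshoot the character data.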