Summary: | Format of PICT records seems different to other metafile blips | ||
---|---|---|---|
Product: | POI | Reporter: | Trejkaz (pen name) <trejkaz> |
Component: | HSSF | Assignee: | POI Developers List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | normal | ||
Priority: | P2 | ||
Version: | 3.0-FINAL | ||
Target Milestone: | --- | ||
Hardware: | PC | ||
OS: | Windows Vista | ||
Attachments: |
Hex dump of PICT blip
A version of EscherMetafileBlip which correctly processes primary blip UID |
Description
Trejkaz (pen name)
2008-04-27 18:38:01 UTC
Okay, here's some more analysis. It isn't a raw PICT, but it isn't the same as the other blips either. However, it's remarkably similar, unless I have this all wrong. After the header, we have:

  57 32 7B 91 23 5D DB 36 7A DB FF 17 FE F3 A7 05
  C7 15 69 2D E5 89 A3 6F 66 03 D6 24 F7 DB 1D 13   (32 bytes unknown)
  72 A1 00 00                                       <-- cb (uncompressed size)
  00 00 00 00 00 00 00 00 A3 00 00 00 40 00 00 00   <-- rcBounds
  25 ED 1F 00 6A B1 0C 00                           <-- ptSize
  23 04 00 00                                       <-- cbSave (compressed size)
  00                                                <-- fCompression
  FE                                                <-- fFilter

cbSave using this scheme does exactly match the remaining data in the blip. So I take it this is the same as EMF/WMF but with 32 bytes of UID instead of 16?

Someone emailed me from the POI project saying they're looking into it.

(In reply to comment #2)
> Someone emailed me from the POI project saying they're looking into it.
>
I'm having trouble finding documentation for the Escher file stream. This is the best I have found, from this page:
http://www.microsoft.com/interop/docs/OfficeBinaryFormats.mspx
this file:
http://download.microsoft.com/download/0/B/E/0BE8BDD7-E5E8-422A-ABFD-4342ED7AD886/OfficeDrawing97-2007BinaryFormatSpecification.pdf

My understanding is that it is completely OK for POI contributors to use these documents. Does anyone know of a better resource describing the Escher file format? Perhaps we could update the POI source with a reference/URL to that document.

It looks like this particular record (recordId == RECORD_ID_PICT) is described on page 16 of the above document and, from what I can tell, the unknown binary data might be in zlib/deflate format. Hope this helps.

If we can trust the comments in that document, then:
1. EMF, WMF and PICT are the same after all.
2. Any of these may have a second UID after the first.
3. The means of determining this is blip_instance ^ blip_signature == 1, where both of these values appear to be nontrivial to compute (to me anyway.)
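The field layout in the dump above can be read with a few lines of Java. This is only a minimal sketch, not POI's EscherMetafileBlip: the class name, field names and the hex() helper are hypothetical, the caller is assumed to already know whether a second UID is present, and the first UID in the stream is assumed to be the regular one with the optional one following it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative holder for the fields in the dump; hypothetical, not a POI class. */
final class MetafileBlipHeaderSketch {
    byte[] uid = new byte[16];
    byte[] primaryUid;              // stays null when only one UID is stored
    int cb;                         // uncompressed size
    int[] rcBounds = new int[4];    // four 32-bit values, kept in stream order
    int[] ptSize = new int[2];      // two 32-bit values, kept in stream order
    int cbSave;                     // compressed size
    int fCompression;
    int fFilter;
    int dataOffset;                 // index in the input where the payload starts

    /** Reads the fields that follow the 8-byte record header, starting at offset. */
    static MetafileBlipHeaderSketch parse(byte[] data, int offset, boolean hasPrimaryUid) {
        ByteBuffer buf = ByteBuffer.wrap(data, offset, data.length - offset)
                .order(ByteOrder.LITTLE_ENDIAN);
        MetafileBlipHeaderSketch h = new MetafileBlipHeaderSketch();
        buf.get(h.uid);
        if (hasPrimaryUid) {
            // Assumption: the optional UID is the second block of 16 bytes.
            h.primaryUid = new byte[16];
            buf.get(h.primaryUid);
        }
        h.cb = buf.getInt();
        for (int i = 0; i < 4; i++) {
            h.rcBounds[i] = buf.getInt();
        }
        for (int i = 0; i < 2; i++) {
            h.ptSize[i] = buf.getInt();
        }
        h.cbSave = buf.getInt();
        h.fCompression = buf.get() & 0xFF;
        h.fFilter = buf.get() & 0xFF;
        h.dataOffset = buf.position();
        return h;
    }

    /** Hypothetical helper: turns a whitespace-separated hex string into bytes. */
    static byte[] hex(String s) {
        String digits = s.replaceAll("\\s+", "");
        byte[] out = new byte[digits.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(digits.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }
}
```

Fed the 66 bytes from the dump with hasPrimaryUid set to true, this yields cb = 0xA172 and cbSave = 0x423, matching the annotations above.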
Hi Guys,

Yegor has worked through these formats for me and he can tell you what is up. If I recall, the PICT format may require that you download Quicktime for Java from Apple. Also, Yegor had success with either WMF or EMF, but not the other.

Also, AFAIK the OSP should cover the use of the format spec, but that won't help with PICT. That is Apple's Quickdraw format as grown from what 24 years ago. Apple has always published the format. It wouldn't be too hard to format. I do have generative code in FORTRAN if there becomes a desire to generate.

Regards,
Dave

In terms of getting the actual PICT data into a renderable image, that's a separate problem IMO, and one which lies outside of POI. For this bug record, the problem is determining when to read the extra 16 bytes of UID. If someone can figure that out, then we'll have a way to get out the byte[] data, and some other library can read the PICT data, just like some other library reads the WMF and EMF.

(In reply to comment #4)
> If we can trust the comments in that document, then:
> 3. The means of determining this is
> blip_instance ^ blip_signature == 1, where both of these values appear to
> be nontrivial to compute (to me anyway.)

I guess you got this from page 17. The full text is:

"The primary UID is only saved to disk if (blip_instance ^ blip_signature == 1). Blip_instance is MSOFBH.inst and blip_signature is one of the values defined in MSOBI"

MSOFBH seems to be the common record header from page 8. I believe the POI class EscherRecordHeader corresponds to this:

  MSOFBH.ver,inst <=> EscherRecordHeader.options
  MSOFBH.fbt      <=> EscherRecordHeader.recordId
  MSOFBH.cbLength <=> EscherRecordHeader.remainingBytes

So the inst field probably corresponds to EscherRecord.getInstance().

The MSOBI enum is mentioned on page 15. It's not clear to me how to calculate blip_signature. The exclusive or operator giving a result of 1 is also a bit weird here. Note that none of the constants from MSOBI have the LSB set.
So perhaps the test for writing the extra UID is whether the LSB of EscherRecord.getInstance() is set. Perhaps the expression was written as such to emphasize that this rule only works when blip_signature == EscherRecord.getInstance() & 0x0FFE.

This is all speculation on my part. You might be best to verify the behaviour empirically. Two existing POI junits hit the method EscherMetafileBlip.fillFields() 4 times:

  TestHSSFPictureData.testPictures() line: 45 "SimpleWithImages.xls"
  TestOLE2Embeding.testEmbeding() line: 36 "ole2-embedding.xls"

So perhaps with these files and your current examples you can decipher Microsoft's cryptic description of the m_rgbUidPrimary field.

// 3. The means of determining this is
// blip_instance ^ blip_signature == 1, where both of these values appear to be
// nontrivial to compute (to me anyway.)

I figured out how to do this check. See what we have. Metafile signatures are defined in the spec as follows:

  typedef enum {
      msobiWMF = 0x216,   // Metafile header then compressed WMF
      msobiEMF = 0x3D4,   // Metafile header then compressed EMF
      msobiPICT = 0x542,  // Metafile header then compressed PICT
  } MSOBI;

In your test data, EscherMetafileBlip.Options = 0x5430, so the instance (the upper 12 bits of the options) is 0x543. According to the spec:

  0x543 ^ 0x542 == 1; // bingo! need to read extra 16 bytes

I attached my version of EscherMetafileBlip. Please exercise it against your test data and confirm it works OK. If it does, I will commit the fix.

Note, I reverted your previous fix. EscherMetafileBlip.field_2_cb always defines the correct metafile size.

Also, it would be good to have test data where blip_instance ^ blip_signature != 1. Please attach a sample if you find one.

Regards,
Yegor

Created attachment 21867 [details]
A version of EscherMetafileBlip which correctly processes primary blip UID
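The check discussed in this thread can be sketched in Java as follows. This is an illustration, not the attached patch: the class, constant and method names are hypothetical, the signature values are the MSOBI constants quoted from the spec, and the instance is assumed to be the upper 12 bits of the 16-bit options field.

```java
/** Hypothetical helper illustrating the primary-UID test; not part of POI. */
final class PrimaryUidCheckSketch {
    // MSOBI signatures as quoted from the spec in this thread.
    static final int SIGNATURE_WMF = 0x216;
    static final int SIGNATURE_EMF = 0x3D4;
    static final int SIGNATURE_PICT = 0x542;

    /** Extracts the 12-bit instance from the 16-bit options field (assumed layout). */
    static int instanceOf(int options) {
        return (options >> 4) & 0x0FFF;
    }

    /** True when the spec's condition blip_instance ^ blip_signature == 1 holds. */
    static boolean hasPrimaryUid(int options, int signature) {
        return (instanceOf(options) ^ signature) == 1;
    }
}
```

With the test data from this bug, hasPrimaryUid(0x5430, SIGNATURE_PICT) is true, so the extra 16 bytes would be read. Because none of the signatures has its lowest bit set, the condition holds exactly when the instance equals the signature with that bit switched on.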
>
> Also, AFAIK the OSP should cover the use of the format spec, but that won't
> help with PICT. That is Apple's Quickdraw format as grown from what 24 years
> ago. Apple has always published the format. It wouldn't be too hard to format.
> I do have generative code in FORTRAN if there becomes a desire to generate.
>
I don't think we will encounter legal issues with it.
We don't create or interpret metafiles. We only extract metafiles from existing documents or insert them into xls or ppt.
That version of EscherMetafileBlip fixes the problem for me. Also, I stepped through all our test files looking for a blip where the result was 0x00, but I couldn't find one.

Somewhat related to this, is it possible that suggestFileExtension() using the format mask directly is also slightly incorrect?

(In reply to comment #12)
> Somewhat related to this, is it possible that suggestFileExtension() using the
> format mask directly is also slightly incorrect?
>
Good catch. The correct version should use blip.getRecordId():

    public String suggestFileExtension() {
        switch (blip.getRecordId()) {
            case EscherMetafileBlip.RECORD_ID_WMF:
                return "wmf";
            case EscherMetafileBlip.RECORD_ID_EMF:
                return "emf";
            case EscherMetafileBlip.RECORD_ID_PICT:
                return "pict";
            case EscherBitmapBlip.RECORD_ID_PNG:
                return "png";
            case EscherBitmapBlip.RECORD_ID_JPEG:
                return "jpeg";
            case EscherBitmapBlip.RECORD_ID_DIB:
                return "dib";
            default:
                return "";
        }
    }

Yegor

Thanks for the patch. I committed my version and a unit test.

Yegor |