Bug 55026

Summary: Parse the parameter part in the ContentType definition
Product: POI
Reporter: Sebastien Schneider <sebastien.schneider>
Component: POI Overall
Assignee: POI Developers List <dev>
Status: RESOLVED FIXED
Severity: enhancement
CC: philippe, sebastien.schneider
Priority: P1
Version: 3.9-FINAL
Target Milestone: ---
Hardware: All
OS: All
Attachments:
  OPC file with Content_Types.xml containing parameters
  Patch for both files ContentType.java and TestContentType.java

Description Sebastien Schneider 2013-05-29 14:20:49 UTC
Hi POI team,

This enhancement request concerns ContentType support in the openxml4j part of the POI library.
In the current 3.9 version, a ContentType containing parameters causes a "malformed content type" exception when the OPC document is parsed.
Such a ContentType can take the form "application/xml;key1=value1;key2=value2".

There is already code to support this format in the ContentType class, but it is commented out!

Would it be possible to enable this ContentType format in a future version?

Thank you,
Sebastien.
Comment 1 Nick Burch 2013-05-29 14:35:28 UTC
Do you have a sample file that has parameters in it? And if so, could you please upload it, ideally along with a short unit test that shows you trying to load + read them?
Comment 2 Sebastien Schneider 2013-05-29 20:43:27 UTC
Created attachment 30341
OPC file with Content_Types.xml containing parameters

I attach a very simple OPC file with a "Content_Types.xml" which contains parameters:
ContentType="application/x-resqml+xml;version=2.0;type=obj_global2dCrs"

The only code needed to reproduce the problem is a call to OPCPackage.open, like this:
OPCPackage p = OPCPackage.open("opc_contenttype_test_wparams.opc", PackageAccess.READ);

This call throws the following exception:
org.apache.poi.openxml4j.exceptions.InvalidFormatException: The specified content type 'application/x-resqml+xml;version=2.0;type=obj_global1dCrs' is not compliant with RFC 2616: malformed content type.

I think this is because the code in /[Apache-SVN]/poi/trunk/src/ooxml/java/org/apache/poi/openxml4j/opc/internal/ContentType.java doesn't support this ContentType string format.

Thank you,
cheers,
Sebastien.
Comment 3 Nick Burch 2013-05-29 22:24:54 UTC
In r1487657 I have added your unit test, and stubbed out the unit tests we'll need

The next step is to review the ooxml spec, then write the unit tests for valid parameters based on the stubbed out bits

Finally, we can then try to enable the parameter logic

If you have some time to help on part #2, that'd be great!
Comment 4 Sebastien Schneider 2013-05-30 07:29:07 UTC
Unfortunately I won't have time to work on that now. I hope to be able to help a little bit next month ...


(In reply to Nick Burch from comment #3)
> In r1487657 I have added your unit test, and stubbed out the unit tests
> we'll need
> 
> The next step is to review the ooxml spec, then write the unit tests for
> valid parameters based on the stubbed out bits
> 
> Finally, we can then try to enable the parameter logic
> 
> If you have some time to help on part #2, that'd be great!
Comment 5 Sebastien Schneider 2013-05-30 09:39:02 UTC
I just reviewed the OOXML spec (document ISO_IEC_29500-2_2012.pdf). The ContentType format is specified in section 9.1.2 by reference to RFC 2616, section 3.7. The media-type format used by ContentType is as follows:
media-type = type "/" subtype *( ";" parameter )
where parameter is expressed as
attribute "=" value
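
As an illustration of the grammar above, here is a minimal standalone sketch in plain Java. It is not the POI implementation: the class name MediaTypeSketch and the method parameters are hypothetical, and it assumes parameter values are plain RFC 2616 tokens (quoted-string values are not handled).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of POI: checks a string against
//   media-type = type "/" subtype *( ";" parameter )
//   parameter  = attribute "=" value
// Assumption: values are tokens only; quoted-string values are rejected.
final class MediaTypeSketch {
    // RFC 2616 token: visible US-ASCII characters except the separators.
    private static final String TOKEN =
            "[\\x21-\\x7E&&[^()<>@,;:\\\\\"/\\[\\]?={}]]+";
    private static final Pattern TYPE =
            Pattern.compile("(" + TOKEN + ")/(" + TOKEN + ")");
    private static final Pattern PARAM =
            Pattern.compile("(" + TOKEN + ")=(" + TOKEN + ")");

    // Returns the parameters in declaration order, or throws if malformed.
    static Map<String, String> parameters(String mediaType) {
        // limit -1 keeps trailing empty pieces, so "a/b;" is rejected below
        String[] parts = mediaType.split(";", -1);
        if (!TYPE.matcher(parts[0]).matches()) {
            throw new IllegalArgumentException("malformed type/subtype: " + parts[0]);
        }
        Map<String, String> params = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i++) {
            Matcher m = PARAM.matcher(parts[i]);
            if (!m.matches()) {
                throw new IllegalArgumentException("malformed parameter: " + parts[i]);
            }
            params.put(m.group(1), m.group(2));
        }
        return params;
    }
}
```

Under these assumptions, the content type from the attached file, "application/x-resqml+xml;version=2.0;type=obj_global2dCrs", yields the two parameters version and type, while a string with a trailing ";" is rejected.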

What remains is to complete the unit tests and enable the corresponding parsing code in ContentType.java.

Sebastien.


(In reply to Nick Burch from comment #3)
> In r1487657 I have added your unit test, and stubbed out the unit tests
> we'll need
> 
> The next step is to review the ooxml spec, then write the unit tests for
> valid parameters based on the stubbed out bits
> 
> Finally, we can then try to enable the parameter logic
> 
> If you have some time to help on part #2, that'd be great!
Comment 6 Sebastien Schneider 2013-08-29 16:01:10 UTC
Created attachment 30782
Patch for both files ContentType.java and TestContentType.java

Here is a proposed fix for this bug. I completed the unit test with hard-coded parameterized content types, but I did not implement the file-based unit test.

I had an issue with the Java pattern matcher, which does not keep every match of a repeated group in a single pass, so I had to add a second matcher dedicated to processing the parameters.
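
The matcher behaviour described above can be reproduced with a short standalone sketch. This is not the code from the attached patch: the class name RepeatedGroupSketch, its method names, and the simplified patterns are hypothetical and only illustrate the two-matcher approach.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not the patch itself. The patterns are
// deliberately simplified and do not follow the full RFC 2616 token rules.
final class RepeatedGroupSketch {
    // Group 1 is repeated by '*': Java keeps only its last iteration.
    private static final Pattern SINGLE_PASS =
            Pattern.compile("[^/;]+/[^/;]+(;[^=;]+=[^=;]+)*");
    // First matcher: validates the whole string, group 1 = full parameter tail.
    private static final Pattern WHOLE =
            Pattern.compile("[^/;]+/[^/;]+((?:;[^=;]+=[^=;]+)*)");
    // Second matcher: walks the parameter tail one pair at a time.
    private static final Pattern PARAM =
            Pattern.compile(";([^=;]+)=([^=;]+)");

    // Shows the limitation: returns only the last ";key=value" that matched.
    static String lastCaptureOnly(String contentType) {
        Matcher m = SINGLE_PASS.matcher(contentType);
        return m.matches() ? m.group(1) : null;
    }

    // Two-matcher approach: collects every "key=value" pair in order.
    static List<String> allParameters(String contentType) {
        Matcher whole = WHOLE.matcher(contentType);
        if (!whole.matches()) {
            throw new IllegalArgumentException("malformed content type: " + contentType);
        }
        List<String> pairs = new ArrayList<>();
        Matcher param = PARAM.matcher(whole.group(1));
        while (param.find()) {
            pairs.add(param.group(1) + "=" + param.group(2));
        }
        return pairs;
    }
}
```

For "application/xml;key1=value1;key2=value2", the single-pass group holds only ";key2=value2", whereas the second matcher, looping with find(), recovers both pairs.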

It works well on my files.

Thank you in advance for integrating it, and feel free to rework it as you see fit.
Comment 7 Nick Burch 2014-02-19 23:36:07 UTC
Thanks for this patch, and sorry it got forgotten

I've done some work on this myself, and then incorporated much of your logic and tests too. As of r1569976 we're now able to process these content types without error, and we have a lot more testing around it all.

Thanks for your help!