Issue 126230 - Current type detection is too restrictive
Summary: Current type detection is too restrictive
Status: UNCONFIRMED
Alias: None
Product: Math
Classification: Application
Component: code
Version: 4.2.0-dev
Hardware: All All
Importance: P5 (lowest) Normal
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Depends on:
Blocks: 53509
Reported: 2015-04-08 18:44 UTC by Regina Henschel
Modified: 2015-04-22 19:57 UTC (History)
CC List: 1 user

See Also:
Issue Type: ENHANCEMENT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
change detection from <?xml to <math (8.50 KB, patch)
2015-04-14 09:05 UTC, Regina Henschel
rb.henschel: review-
A collection with examples. (8.91 KB, application/x-zip-compressed)
2015-04-18 17:28 UTC, Regina Henschel
no flags
improving detection of MathML (10.47 KB, patch)
2015-04-18 18:02 UTC, Regina Henschel
rb.henschel: review?

Description Regina Henschel 2015-04-08 18:44:22 UTC
The current type detection rejects a MathML file that has no <?xml declaration, although such a declaration is not required for valid MathML.

The SAX parser itself can parse MathML files without such a declaration. The parser needs only a math root element with the MathML namespace URI.
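The point can be illustrated outside of AOO. The following is a minimal sketch that uses Python's standard XML parser (an assumption for illustration, not the SAX parser used by AOO) to show that a fragment without an <?xml declaration parses and that its root is a math element in the MathML namespace. The sample fragment is made up.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# A made-up MathML fragment with no <?xml ...?> declaration in front of it.
fragment = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<mi>x</mi><mo>+</mo><mn>1</mn>"
    "</math>"
)

# The parser does not require the declaration; it only needs well-formed XML.
root = ET.fromstring(fragment)

# ElementTree reports namespaced tags as "{namespace-uri}localname".
is_mathml_root = root.tag == "{%s}math" % MATHML_NS
print(is_mathml_root)
```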
Comment 1 Regina Henschel 2015-04-14 09:05:36 UTC
Created attachment 84655 [details]
change detection from <?xml to <math

The part for detecting MathML has been completely rewritten, so please have a look. Please test it on Linux and MacOS, because I have only Windows.

The detection now accepts MathML files, both for "Open" and "Tools > Import Formula...", if they have a math element with the MathML namespace attribute. It accepts the default UTF-8 encoding as well as UTF-16. I have not seen other encodings, so others are currently not considered.

The MathML files need no <?xml prolog to be accepted. Because the detection ignores the <?xml prolog and looks only for the math element, issue #124636 remains fixed. If the MathML part is embedded in XHTML or other XML, the file still opens in the Math module, provided the MathML part lies within the first 4096 bytes.

The UTF-16 encoding is used by the Microsoft "Math Input Control", which puts the MathML into the clipboard. [My dream is that AOO can insert such a formula from the clipboard.]
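The detection rule described in this comment can be sketched roughly as follows. This is an illustration in Python, not the code of the attached patch; the names decode_head and looks_like_mathml are hypothetical, and it assumes that UTF-16 input starts with a byte order mark. It inspects only the first 4096 bytes and ignores any <?xml prolog.

```python
import codecs

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
HEAD_SIZE = 4096  # only the start of the file is inspected


def decode_head(data: bytes) -> str:
    """Decode the first HEAD_SIZE bytes as UTF-16 (if a BOM is present) or UTF-8."""
    head = data[:HEAD_SIZE]
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        if len(head) % 2:
            head = head[:-1]  # drop a code unit that was cut in half
        return head.decode("utf-16", errors="ignore")
    # "utf-8-sig" also strips an optional UTF-8 BOM.
    return head.decode("utf-8-sig", errors="ignore")


def looks_like_mathml(data: bytes) -> bool:
    """True if a math start tag and the MathML namespace appear in the head."""
    text = decode_head(data)
    return "<math" in text and MATHML_NS in text
```

Because the check does not look at the <?xml prolog at all, a file with or without the prolog is treated the same way. A plain substring test like this one would also match unrelated element names that merely begin with "math", so a real implementation needs a stricter check.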
Comment 2 Kay 2015-04-16 21:27:18 UTC
(In reply to Regina Henschel from comment #1)

Thanks for your changes.

Builds fine for me on Linux-32. But I don't really have test documents.
Can you provide some as attachment?
Comment 3 Regina Henschel 2015-04-16 22:33:11 UTC
Thanks for testing. I have further improved the patch. I will attach a new version of the patch and test documents tomorrow (it's already after midnight here).
Comment 4 Regina Henschel 2015-04-18 17:26:50 UTC
Comment on attachment 84655 [details]
change detection from <?xml to <math

A better patch will follow.
Comment 5 Regina Henschel 2015-04-18 17:28:36 UTC
Created attachment 84672 [details]
A collection with examples.
Comment 6 Regina Henschel 2015-04-18 18:02:52 UTC
Created attachment 84673 [details]
improving detection of MathML

This is my final version of the patch. It has these goals:

Make it possible to open and import MathML fragments that consist only of the math element. That makes it easier for users to reuse formulas from external libraries.

Make it possible to open and import MathML files with any prefix on the math element. OOo1.15 used the prefix "math", MS Office uses the prefix "mml", and other prefixes are possible.

Detect MathML files that are UTF-16 encoded. The MS Math Input Panel puts such files into the clipboard.

The patch tries to open in Math all files that the parser can parse in a Math context. If you find a file that is detected as "MathML" but that the parser cannot open, please inform me and attach the file. Such a case would result in a "General input/output error".
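The goal of accepting any prefix on the math element can be sketched as below. This is a simplified Python illustration, not the logic of the attached patch; find_math_prefix is a hypothetical helper, and it assumes that the namespace is declared on the math start tag itself, so a declaration on an ancestor element is not found.

```python
import re

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# A start tag whose local name is "math", with an optional namespace prefix.
MATH_START = re.compile(r"<(?:([A-Za-z_][\w.-]*):)?math(?=[\s>/])([^>]*)>")


def find_math_prefix(text):
    """Return the prefix ("" for none) of the first math start tag that binds
    its own prefix to the MathML namespace, or None if there is no such tag."""
    for match in MATH_START.finditer(text):
        prefix = match.group(1) or ""
        attrs = match.group(2)
        # The matching declaration is xmlns="..." or xmlns:prefix="...".
        attr_name = "xmlns:" + prefix if prefix else "xmlns"
        declaration = re.search(
            r"(?<![\w:.-])" + re.escape(attr_name)
            + r"\s*=\s*[\"']" + re.escape(MATHML_NS) + r"[\"']",
            attrs,
        )
        if declaration:
            return prefix
    return None
```

With this check, a root element written as math, mml:math or math:math is treated alike, as long as the prefix in use is bound to the MathML namespace.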
Comment 7 Kay 2015-04-20 22:51:59 UTC
Still on Linux-32.

I applied the improved patch today (I pulled out the other one) and checked some of the test cases, especially UTF-16 vs. UTF-8. So far, so good! :) These test cases were very helpful, and I think it would be a good idea to set up a new area under "test" -- http://svn.apache.org/viewvc/openoffice/trunk/test/ -- for them.

Good job! :)
Comment 8 Regina Henschel 2015-04-22 19:28:35 UTC
Thank you, Kay, for looking again. I'm pleased that you like it.

Regarding the "test" directory, I'm not familiar with that kind of testing and don't know what needs to be done. The test cases are permanently available here in Bugzilla.
Comment 9 SVN Robot 2015-04-22 19:50:24 UTC
"regina" committed SVN revision 1675478 into trunk:
#i126230 current Math type detection is too restrict