Bug 47709 - Issue with parsing the 'font' shorthand
Summary: Issue with parsing the 'font' shorthand
Status: NEW
Alias: None
Product: Fop - Now in Jira
Classification: Unclassified
Component: fo tree (show other bugs)
Version: trunk
Hardware: All All
: P3 normal
Target Milestone: ---
Assignee: fop-dev
URL:
Keywords:
: 47859 (view as bug list)
Depends on:
Blocks:
 
Reported: 2009-08-19 11:23 UTC by Andreas L. Delmelle
Modified: 2012-04-18 07:12 UTC (History)
2 users (show)



Attachments
amendment to font-shorthand-test which makes the build fail (1.05 KB, application/octet-stream)
2009-08-19 11:23 UTC, Andreas L. Delmelle
Details
Allow case when no font-family is quoted and no comma-separated list of font families (1.31 KB, patch)
2009-09-16 10:52 UTC, Jonathan Levinson
Details | Diff
Get font short hand completely working (1.67 KB, patch)
2009-09-21 13:29 UTC, Jonathan Levinson
Details | Diff
Fix parsing of font-size and line height (6.30 KB, patch)
2009-10-05 11:16 UTC, Jonathan Levinson
Details | Diff
Add more tests to font-shorthand-test (5.54 KB, patch)
2009-10-05 11:18 UTC, Jonathan Levinson
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Andreas L. Delmelle 2009-08-19 11:23:08 UTC
Created attachment 24148 [details]
amendment to font-shorthand-test which makes the build fail

Parsing of a very simple value for the font shorthand property currently fails
in FOP Trunk.

In attach, a patch/amendment to the related FO Tree testcase which, if applied,
shows that it does not pass...
Comment 1 Jonathan Levinson 2009-09-16 10:52:07 UTC
Created attachment 24277 [details]
Allow case when no font-family is quoted and no comma-separated list of font families

Allow case when no font-family is quoted and no comma-separated list of font families.

The code in FontShorthandProperty is looking for the beginning of the font-family.  If the font-family contains commas then the code works as before.  If the font-family begins with a quote the code works as before.  There was one case missing - when the code is looking for the beginning of the font family but there are no commas and no quotes.

The font-family can be quite complex containing commas, single quoted strings, double quote strings, and strings without quotes but the parsing of this complex expression is taken care of by FontFamilyProperty.  All FontShorthandProperty does is look for where the font-family begins.  It almost did this correctly (before this patch) when the font-family contained commas or consisted of a quoted string.  Now there is the case when the font-family contains no quotes and no commas.

However, there is a fly in the ointment.  In testing this change (and in testing the previous unchanged code), I added the following test case, which fails in both cases.

     <fo:block font="xx-large/1.4 Arial, 'Times New Roman', sans-serif">
        <test:assert property="font-family" expected="[Arial, Times New Roman, sans-serif]"/>
        <test:assert property="font-size" expected="20736mpt"/>
        <test:assert property="font-weight" expected="400" />
        <test:assert property="font-style" expected="NORMAL" />
        <test:assert property="line-height.optimum" expected="29030mpt" />
        <test:assert property="font-variant" expected="NORMAL" />
        Test font shorthand
      </fo:block> 

There seems to be a problem when the quote follows a comma.  I will add a patch to font-shorthand-test.fo which demonstrates this problem, and will continue to diagnose the code to see what further problems in the code are causing this problem.
Comment 2 Vincent Hennebert 2009-09-21 04:13:27 UTC
*** Bug 47859 has been marked as a duplicate of this bug. ***
Comment 3 Vincent Hennebert 2009-09-21 04:15:56 UTC
(In reply to comment #2)
> *** Bug 47859 has been marked as a duplicate of this bug. ***

See augmented patch to font-shorthand-test in bug report above.
Comment 4 Vincent Hennebert 2009-09-21 04:38:54 UTC
Hi Jonathan,

Thank you for your contribution to fixing this issue.

I think the whole parsing of this property needs to be refactored, because it's unnecessarily complicated and doesn't work. I don't really get the purpose of the "while (!fontFamilyParsed)" loop but it seems to me that it's doing a job that should be left over to the FontFamilyProperty parser. Also, it doesn't seem to handle the case where there are more than one comma (i.e., at least 3 families specified). If more than one font-style/-variant/-weight is specified, it will try to parse the second value (say, font-variant) as a font family. Also, it tries to apply a special treatment when system fonts (caption, icon, menu...) are specified, whereas it is explicitly stated in the Recommendation that they don't apply to XSL-FO.

I think the best way to handle that is to write a parser that follows the grammar given by the Recommendation. Hopefully a simple LL parser will do [1], but given that all of the sub-properties accept the 'inherit' keyword it will probably not be LL(1). If an LL parser can't be used, then we will have to use an LR one [2], which is slightly more problematic since it cannot be easily coded by hand, and we will have to rely on an external parsing library.

I suggest you to look if an LL parser can be written to properly parse this property, and report back here if it's not the case, and if we must consider other solutions.

[1] http://en.wikipedia.org/wiki/LL_parser
[2] http://en.wikipedia.org/wiki/LR_parser

Thanks,
Vincent


(In reply to comment #1)
> Created an attachment (id=24277) [details]
> Allow case when no font-family is quoted and no comma-separated list of font
> families
> 
> Allow case when no font-family is quoted and no comma-separated list of font
> families.
> 
> The code in FontShorthandProperty is looking for the beginning of the
> font-family.  If the font-family contains commas then the code works as before.
>  If the font-family begins with a quote the code works as before.  There was
> one case missing - when the code is looking for the beginning of the font
> family but there are no commas and no quotes.
> 
> The font-family can be quite complex containing commas, single quoted strings,
> double quote strings, and strings without quotes but the parsing of this
> complex expression is taken care of by FontFamilyProperty.  All
> FontShorthandProperty does is look for where the font-family begins.  It almost
> did this correctly (before this patch) when the font-family contained commas or
> consisted of a quoted string.  Now there is the case when the font-family
> contains no quotes and no commas.
> 
> However, there is a fly in the ointment.  In testing this change (and in
> testing the previous unchanged code), I added the following test case, which
> fails in both cases.
> 
>      <fo:block font="xx-large/1.4 Arial, 'Times New Roman', sans-serif">
>         <test:assert property="font-family" expected="[Arial, Times New Roman,
> sans-serif]"/>
>         <test:assert property="font-size" expected="20736mpt"/>
>         <test:assert property="font-weight" expected="400" />
>         <test:assert property="font-style" expected="NORMAL" />
>         <test:assert property="line-height.optimum" expected="29030mpt" />
>         <test:assert property="font-variant" expected="NORMAL" />
>         Test font shorthand
>       </fo:block> 
> 
> There seems to be a problem when the quote follows a comma.  I will add a patch
> to font-shorthand-test.fo which demonstrates this problem, and will continue to
> diagnose the code to see what further problems in the code are causing this
> problem.
Comment 5 Jonathan Levinson 2009-09-21 10:19:15 UTC
Hi Vincent,

It was my pleasure to help out.  Thank you for your review of the problems you think still remain in the font shorthand code.  

When you closed 47859, did you noticed I had attached a patch that fixes one of the issues you mention below:

"Also, it doesn't
seem to handle the case where there are more than one comma (i.e., at least 3
families specified)."

This patch, which (I believe) fixes the problem you mention, was attached to 47859, which you closed as a duplicate. 

I infer from reading and studying the code that the purpose of the complicated ""while (!fontFamilyParsed)" loop is to identify the beginning of the font family token.  The FontFamilyProperty parser assumes it has been given a string which represents a font family and the font shorthand routine is trying to identify the font family token (a string) which is sent to FontFamilyProperty.  The font shorthand code is not trying to do any parsing.  I say this with all due respect for your greater experience with Apache FOP.  

You've made other comments I'll have to investigate before replying further.  I need to add more test cases to verify that my fix (the one in 47859) corrects all issues with font shorthand property.

I can see your point about the desirability of a font shorthand parser.  Currently the font short hand code tries to identify the strings which should be sent to each of the font property parsers such as FontFamilyProperty.  So the font shorthand code is a tokenizer rather than a parser.  Its goal is to identify the tokens and then send the tokens to property parsers for further processing.  It could be a tokenizer is all this really needed, but I don't know at this point.

Best Regards,
Jonathan S. Levinson
Senior Software Developer
Object Group
InterSystems
617-621-0600
Comment 6 Jonathan Levinson 2009-09-21 13:29:20 UTC
Created attachment 24298 [details]
Get font short hand completely working

I tested a variety of cases such as:

          <fo:block font="italic small-caps normal 12pt/150% Arial, Helvetica, sans-serif">
        <test:assert property="font-family" expected="[Arial, Helvetica, sans-serif]" />
        <test:assert property="font-size" expected="12000mpt" />
        <test:assert property="font-weight" expected="400" />
        <test:assert property="font-style" expected="ITALIC" />
        <test:assert property="line-height.optimum" expected="18000mpt" />
        <test:assert property="font-variant" expected="SMALL_CAPS" />
        Test font shorthand
      </fo:block> 

     <fo:block font="xx-large/1.4 Arial, 'Times New Roman', sans-serif">
        <test:assert property="font-family" expected="[Arial, Times New Roman, sans-serif]"/>
        <test:assert property="font-size" expected="20736mpt"/>
        <test:assert property="font-weight" expected="400" />
        <test:assert property="font-style" expected="NORMAL" />
        <test:assert property="line-height.optimum" expected="29030mpt" />
        <test:assert property="font-variant" expected="NORMAL" />
        Test font shorthand
      </fo:block>

I think FontShortHandProperty.java now works and does not need to be an LL(k) or LR(k) parser.

The code looks complex but in each case of font property (such as font variant or font family) it is defining a token to be parsed so it is is tokenizing code.  It could probably be refactored to be easier to understand and read.  The code is not parsing but tokenizing.

There is an additional remark I made about this solution (posted before).  The solution seems to have been inadvertently ignored when the tests and patch I made were marked as a duplicate.
Comment 7 Jonathan Levinson 2009-10-05 11:16:00 UTC
Created attachment 24343 [details]
Fix parsing of font-size and line height

I use Java RegEx to parse font-size and line height.

I also (in next patch) add a patch to font shorthand regression test that tests various combinations, including line-height -> space.

I ran unit tests on it and no new failures.

I ran check-style on it and corrected all warnings and errors.
Comment 8 Jonathan Levinson 2009-10-05 11:18:35 UTC
Created attachment 24344 [details]
Add more tests to font-shorthand-test

I add more tests to font-shorthand-test, testing various combinations, including testing whether font-size/line-height can have spaces around the slash and whether line-height -> space will be parsed.
Comment 9 Glenn Adams 2012-04-07 01:45:04 UTC
resetting P2 open bugs to P3 pending further review