Summary: | [PATCH] Complex Script Support | ||
---|---|---|---|
Product: | Fop - Now in Jira | Reporter: | Glenn Adams <gadams> |
Component: | general | Assignee: | fop-dev |
Status: | CLOSED FIXED | ||
Severity: | major | CC: | levinson |
Priority: | P2 | ||
Version: | trunk | ||
Target Milestone: | --- | ||
Hardware: | All | ||
OS: | All | ||
Attachments: |
Patch to FOP trunk revision 981406.
Simple test Arabic input FO file. Simple test Arabic output PDF file. Patch to FOP trunk revision 981406. Simple test Arabic output PDF file. Patch to revision 987110 of branch Temp_ComplexScripts. Patch to revision 1002949 of branch Temp_ComplexScripts. Patch to revision 1076316 of branch Temp_ComplexScripts. Patch to revision 1136002 of branch Temp_ComplexScripts. Patch to revision 1183594 of branch Temp_ComplexScripts list of Gujarati words and sentences Patch to revision 1198396 of trunk. Patch to revision 1236835 of trunk. |
Description
Glenn Adams
2010-08-02 05:36:47 UTC
Created attachment 25826 [details]
Simple test Arabic input FO file.
Created attachment 25827 [details]
Simple test Arabic output PDF file.
Hi Glenn, Thanks for your patch. I've just created a temporary branch for processing it: https://svn.apache.org/repos/asf/xmlgraphics/fop/branches/Temp_ComplexScripts I'll apply your patch to the branch soon (or Simon if he beats me to it). Please create your diff files from this branch from now on. More later. Thanks, Vincent (In reply to comment #3) Hi, I tried to apply patch against the new Vincent's Temp_ComplexScripts branch, but I get the following error during build: compile-java: [mkdir] Created dir: D:\bin\java\fop\Temp_ComplexScripts\build\classes [javac] Compiling 1316 source files to D:\bin\java\fop\Temp_ComplexScripts\build\classes [javac] D:\bin\java\fop\Temp_ComplexScripts\src\java\org\apache\fop\fonts\type1\AFMFile.java:59: generics are not supported in -source 1.4 [javac] (use -source 5 or higher to enable generics) [javac] private List<AFMCharMetrics> charMetrics = new java.util.ArrayList<AFMCharMetrics>(); [javac] ^ [javac] 1 error Pascal You need to create a build-local.properties file at the top-level of FOP tree, containing, e.g., the following: javac.source = 1.5 javac.target = 1.5 javac.fork = on junit.fork = on Note that the 1.5 dependency below was not introduced by this patch. Glenn (In reply to comment #4) > (In reply to comment #3) > > Hi, > I tried to apply patch against the new Vincent's Temp_ComplexScripts branch, > but I get the following error during build: > > > compile-java: > [mkdir] Created dir: D:\bin\java\fop\Temp_ComplexScripts\build\classes > [javac] Compiling 1316 source files to > D:\bin\java\fop\Temp_ComplexScripts\build\classes > [javac] > D:\bin\java\fop\Temp_ComplexScripts\src\java\org\apache\fop\fonts\type1\AFMFile.java:59: > generics are not supported in -source 1.4 > [javac] (use -source 5 or higher to enable generics) > [javac] private List<AFMCharMetrics> charMetrics = new > java.util.ArrayList<AFMCharMetrics>(); > [javac] ^ > [javac] 1 error > > > Pascal Ah, I take that back. It looks like this got inadvertently included from an early merge from trunk of some new AFM code that used generics, but prior to that dependency being backed out. I'll fix this in my next patch to the new branch, that is, unless whoever applies this current patch to the branch takes care of it first. In the mean time, you can use my suggested work around. G. (In reply to comment #5) > > Note that the 1.5 dependency below was not introduced by this patch. Created attachment 25830 [details] Patch to FOP trunk revision 981406. This replacement patch removes the erroneously introduced JDK 1.5 (generics) dependency. I decided not to wait, so here is a replacement patch that fixes the 1.5 dependency. Let me know if there are further build problems on 1.4. G. (In reply to comment #7) > Created an attachment (id=25830) [details] > Patch to FOP trunk revision 981406. > > This replacement patch removes the erroneously introduced JDK 1.5 (generics) > dependency. (In reply to comment #8) Glenn, build is OK, now. I didn't made Junit tests, I have to check my config before. Thx. For further documentation and issue tracking for this patch, see the following: https://skynav.trac.cvsdude.com/fop Regards, Glenn Created attachment 25831 [details]
Simple test Arabic output PDF file.
(In reply to comment #8) Glenn, I do not get Character shaping when applying patch (against either trunk r981406 or Temp_ComplexScripts). The output is the same as FOP 1.0 without patch. I've tested it with Arial UNI (I didn't get Simplified Arabic). -- Pascal (In reply to comment #12) Works fine now (with Glenn help) Thx (In reply to comment #11) > Created an attachment (id=25831) [details] > Simple test Arabic output PDF file. Hi Glenn, Can you please let me know in detail, how this Arabic patch that you have created to be applied and run? Thanks, Dharmesh Rana It would be best to wait until the patch is committed to the Temp_ComplexScripts branch that has been created for it, then you will need to build it prior to running it. You will also need one or more of the (presently) supported fonts, about which see https://skynav.trac.cvsdude.com/fop/wiki/SupportedFonts. I have no control over the schedule for the committing of the patch, as it will have to be performed by an FOP committer. But you can express your interest in having them do that on the following mail list: <fop-user@xmlgraphics.apache.org>. If you are impatient, you can apply the patch yourself, but you will need the appropriate skills to do that. Regards, Glenn (In reply to comment #14) > (In reply to comment #11) > > Created an attachment (id=25831) [details] [details] > > Simple test Arabic output PDF file. > > Hi Glenn, > > Can you please let me know in detail, how this Arabic patch that you have > created to be applied and run? > > Thanks, > Dharmesh Rana Created attachment 25912 [details] Patch to revision 987110 of branch Temp_ComplexScripts. This patch file replaces the prior patch against trunk. It is intended to be applied to revision 987110 of branch Temp_ComplexScripts as follows: $ cd .../xmlgraphics/fop/branches/Temp_ComplexScripts $ svn update -r 987110 . $ gzcat .../fop-i18n.arabic-patch-1-update-1.diff.tgz | patch -p1 $ svn add `svn status . | grep '?' | awk '{print $2;}'` This patch has been verified to build and run junit tests without regression, with 4 compiler warnings (none new), 0 checkstyle warnings, and 971 findbugs bugs (none new). The intent of this patch update is to merge recent changes to the trunk, as well as fix three problems recently reported with the Arabic shape selection and bidi logic. Note well that this patch update does NOT address the following outstanding issues: * support for correct treatment and placement of non-spacing marks * support for certain advanced GSUB table types * support for GPOS table types * support for inline FOs in bidi conexts other than fo:inline, fo:bidi-override, and fo:character * correct treatment of corresponding properties in right-to-left contexts * incorrect justification of arabic text with text-align="justify" These latter issues will be addressed in subsequent patches to the Temp_ComplexScripts branch. Regards, Glenn Committed revision 987282 Hello, we want to test the fix for arabic characters. What is the svn checkout URL for this fop revision in trunk? We've tested with current trunk of 22.09.2010. How can I install the patch? what#s the svn checkout base? Thanks! Jens and Abde Summary of web links ==================== This work is available in a git repository, see http://github.com/skynavga/fop. The repository URLs are: http://github.com/skynavga/fop.git and git://github.com/skynavga/fop.git This work is available on the subversion branch Temp_ComplexScripts. The repository URL is: http://svn.apache.org/repos/asf/xmlgraphics/fop/branches/Temp_ComplexScripts Binary snapshots of this work are available at http://people.apache.org/~spepping/. Updated snapshots will be provided whenever there is a new patch. The git repository is most up to date; the subversion repository waits for the submission of patches. Created attachment 26104 [details] Patch to revision 1002949 of branch Temp_ComplexScripts. This patch is to be applied to revision 1002949 of branch Temp_ComplexScripts as follows: cd ${FOP}/branches/Temp_ComplexScripts svn update -r 1002949 --force gzcat ${DOWNLOADS}/fop-i18n.arabic-patch-2.diff.gz | patch -p0 svn rm src/java/org/apache/fop/fonts/GlyphUtils.java svn add src/java/org/apache/fop/fonts/GlyphSubstitutionState.java svn add src/java/org/apache/fop/fonts/GlyphClassMapping.java svn add src/java/org/apache/fop/fonts/GlyphPositioningState.java svn add src/java/org/apache/fop/fonts/GlyphCoverageMapping.java svn add src/java/org/apache/fop/fonts/GlyphProcessingState.java svn add src/java/org/apache/fop/fonts/GlyphClassTable.java svn add src/java/org/apache/fop/fonts/GlyphDefinitionTable.java svn add src/java/org/apache/fop/fonts/GlyphMappingTable.java svn add src/java/org/apache/fop/fonts/GlyphDefinitionSubtable.java svn add src/java/org/apache/fop/fonts/GlyphDefinition.java svn add src/java/org/apache/fop/fonts/ScriptContextTester.java svn add src/java/org/apache/fop/fonts/GlyphTester.java svn commit ... This patch completes the first round of functional support for OpenType GDEF/GSUB/GPOS advanced typographic tables as employed by selected Arabic fonts. Regards, Glenn Hi! It is work great for whole document. Could you implement similar for <fo:block-container writing-mode="rl-tb"> </fo:block-container> I my project a lot of text contain mixed text (arabic, cyrilic, latin), therefore I have a big problem with FOP. find problem: 1. Background color for inline element does not drawed <fo:inline background-color="#f3b53f" > 2. Possible not in right place if in fop.xconf add <auto-detect/> in section <fonts> SEVERE: SVG graphic could not be rendered. Reason: java.lang.UnsupportedOperationException: unsupported device table del ta format: 0 java.lang.UnsupportedOperationException: unsupported device table delta format: 0 at org.apache.fop.fonts.truetype.TTFFile.readPosDeviceTable(TTFFile.java:3239) at org.apache.fop.fonts.truetype.TTFFile.readPosAnchor(TTFFile.java:3588) at org.apache.fop.fonts.truetype.TTFFile.readMarkToBasePosTableFormat1(TTFFile.java:3711) at org.apache.fop.fonts.truetype.TTFFile.readMarkToBasePosTable(TTFFile.java:3766) at org.apache.fop.fonts.truetype.TTFFile.readGPOSSubtable(TTFFile.java:4649) at org.apache.fop.fonts.truetype.TTFFile.readLookupTable(TTFFile.java:4721) Dear all, I was using FOP for more than 1 year and it's a great tool for printing and creating PDF files and now I have problem printing ARABIC Text. I downloaded and tested the FO-PDF converted and it rendered the text correctly, but this does not fix my problem since I was using the PrintRenderer to directly print an FO file. My question is, when can I expect a new patch or release of FOP with Arabic print-rendering support. Best regards, Imad Daou I cannot determine from your email whether you are using this new patch or not. In any case, only the IF, AT, and PDF renderers are presently supported by this patch, and I do not have a schedule for when the remaining renderers will be supported. Regards, Glenn (In reply to comment #23) > Dear all, > > I was using FOP for more than 1 year and it's a great tool for printing and > creating PDF files and now I have problem printing ARABIC Text. > > I downloaded and tested the FO-PDF converted and it rendered the text > correctly, but this does not fix my problem since I was using the PrintRenderer > to directly print an FO file. > > My question is, when can I expect a new patch or release of FOP with Arabic > print-rendering support. > > Best regards, > Imad Daou I cannot determine from your email whether you are using this new patch or not. In any case, only the IF, AT, and PDF renderers are presently supported by this patch, and I do not have a schedule for when the remaining renderers will be supported. Regards, Glenn (In reply to comment #23) > Dear all, > > I was using FOP for more than 1 year and it's a great tool for printing and > creating PDF files and now I have problem printing ARABIC Text. > > I downloaded and tested the FO-PDF converted and it rendered the text > correctly, but this does not fix my problem since I was using the PrintRenderer > to directly print an FO file. > > My question is, when can I expect a new patch or release of FOP with Arabic > print-rendering support. > > Best regards, > Imad Daou Dear Glenn, thank you for this patch. I am testing it and have following problem: Arabic letter in the words looks not connected together (separated), can you please explain this behavior ? Best regards, Virginijus (In reply to comment #26) > Dear Glenn, > > thank you for this patch. > > I am testing it and have following problem: Arabic letter in the words looks > not connected together (separated), can you please explain this behavior ? > > Best regards, > > Virginijus please use the i18n.arabic branch at git://github.com/skynavga/fop.git for the latest code; also, please restrict your fonts to those specified described under "Supported Fonts" at https://skynav.trac.cvsdude.com/fop/wiki/ComplexScripts regards, glenn Created attachment 26721 [details] Patch to revision 1076316 of branch Temp_ComplexScripts. This patch is to be applied to revision 1076316 of branch Temp_ComplexScripts as follows: cd ${FOP}/branches/Temp_ComplexScripts svn update -r 1076316 --force svn revert -R . svn status # check and remove any unexpected changes prior to patching gzcat ${DOWNLOADS}/fop-i18n.arabic-patch-3.diff.gz | patch -p0 ant codegen-unicode-bidi # needed to generate test data to be committed svn add test/java/org/apache/fop/complexscripts svn add src/java/org/apache/fop/fonts/AdvancedTypographicTableFormatException.java svn add src/codegen/complexscripts svn add src/codegen/unicode/java/org/apache/fop/text/bidi/GenerateBidiTestData.java svn commit ... This patch completes the next update of support for OpenType GDEF/GSUB/GPOS advanced typographic tables as employed by selected Arabic fonts. This patch includes the following: * bug fixes * new codegen-unicode-bidi target (used to generate bidi test data above) * new junit-complexscripts target N.B. the use of "ant codegen-unicode-bidi" above, which is needed to generate test data to be committed. Regards, Glenn (In reply to comment #28) > Created an attachment (id=26721) [details] > Patch to revision 1076316 of branch Temp_ComplexScripts. Thanks. Committed. I note the following problem: In the build target junit-compile-java, the attribute value memoryMaximumSize="1024m" was added, but the ant build reports: [javac] Since fork is false, ignoring memoryMaximumSize setting. Simon Glenn, I had a brief look at your branch and have the following questions and comments: You implemented the BIDI algorithm in a separate BidiUtil class. Why didn't you integrate it into the layout engine? It seems that what is done there is a layout task, therefore should be put in the layout engine classes. FO tree objects are now manipulated by both the layout engine and that class, and I'm seriously concerned about the maintenance issues that that may create. Also, AFAIU a whole data structure is created there that will exist in parallel with the Knuth elements created by the layout engine. In addition to the memory issues, isn't there a risk of discrepancy between the two? That may cause hard-to-find bugs. How feasible is it to run the BIDI algorithm on a subset of the FO tree? If I'm right, ATM it is launched on a whole page-sequence. When we refactor the code so that layout can be started before a whole page-sequence has been parsed, in order to avoid the infamous memory issue that a lot of users are running into, will that code allow to do it? I have a concern with some metrics. That BidiUtil class is more than 1700 lines. There are a few other new classes that are more than 1000 lines long, and TTFFile now is more than 5500 lines (!). What's your plan to break them down into more manageable chunks? Also, there are now 72 classes in the o.a.f.fonts package. Also, many variables have a two- or three-letter name, which makes it difficult to understand what their purpose is, especially if a lot of them are involved at the same time. I realise there usually is a comment explaining what the variable does when it's defined, but it would be easier if the variable were carrying the comment with itself by having a longer name. Thanks, Vincent inline (In reply to comment #30) > Glenn, > > I had a brief look at your branch and have the following questions and > comments: > > You implemented the BIDI algorithm in a separate BidiUtil class. Why didn't you > integrate it into the layout engine? It seems that what is done there is a > layout task, therefore should be put in the layout engine classes. FO tree > objects are now manipulated by both the layout engine and that class, and I'm > seriously concerned about the maintenance issues that that may create. BidiUtil is defined in o.a.f.layoutmgr, therefore it is "integrated into the layout engine". I considered placing it in o.a.f.util or in o.a.f.text.bidi; however, since it is only referenced by o.a.f.layoutmgr code, I decided that was the best home for it at this time. However, having said that, it is not performing layout and knows nothing about areas or geometries whatsoever. It's functionality is invoked in two places: (1) in PageSequenceLayoutManager.activateLayout(), in order to resolve bidi levels of each delimited text range, and (2) in LineLayoutManager.add{Inline,Block}Area, after completing line area construction; See XSL-FO Section 5.8 for more information. I'm open to any concrete suggestions about a better point of integration, but I don't see any at present that is consistent with 5.8. > Also, AFAIU a whole data structure is created there that will exist in parallel > with the Knuth elements created by the layout engine. In addition to the memory > issues, isn't there a risk of discrepancy between the two? That may cause > hard-to-find bugs. There is no relationship between the bidi levels created by BidiUtil and Knuth elements used for line breaking. > > How feasible is it to run the BIDI algorithm on a subset of the FO tree? If I'm > right, ATM it is launched on a whole page-sequence. When we refactor the code > so that layout can be started before a whole page-sequence has been parsed, in > order to avoid the infamous memory issue that a lot of users are running into, > will that code allow to do it? I'm not sure what you mean by "ATM". The semantics of XSL-FO 5.8 as embodied by the present implementation will have to be taken into account by such a refactoring, as I'm sure will many other aspects of the current FOP implementation. > I have a concern with some metrics. That BidiUtil class is more than 1700 > lines. There are a few other new classes that are more than 1000 lines long, > and TTFFile now is more than 5500 lines (!). What's your plan to break them > down into more manageable chunks? Also, there are now 72 classes in the > o.a.f.fonts package. I have no such plan in the near to medium term. There is no technical reason to refactor TTFFile, though I agree it is on the long side, and could probably use a general refactoring in the long term. > Also, many variables have a two- or three-letter name, which makes it difficult > to understand what their purpose is, especially if a lot of them are involved > at the same time. I realise there usually is a comment explaining what the > variable does when it's defined, but it would be easier if the variable were > carrying the comment with itself by having a longer name. Symbol length is a trade off between readability and other factors, such as line length and typing time. I prefer short symbols. Perhaps it is my training in mathematics, electrical engineering, and having started programming with APL that leads me in that direction. The meaning of short symbols should be readily inferable given sufficient knowledge of the problem domain. Best, Glenn I just tried checking it out, and got two types of compile error when trying to build: [javac] ... org/apache/fop/layoutmgr/BidiUtil.java:42: cannot find symbol [javac] symbol : class Viewport [javac] location: package org.apache.fop.area.inline [javac] import org.apache.fop.area.inline.Viewport; [javac] ^ [javac] ... org/apache/fop/layoutmgr/BidiUtil.java:107: reference to collectRuns is ambiguous, both method collectRuns(java.util.List,java.util.List) in org.apache.fop.layoutmgr.BidiUtil and method collectRuns(Viewport,java.util.List) in org.apache.fop.layoutmgr.BidiUtil match [javac] List runs = collectRuns ( la.getInlineAreas(), new Vector() ); [javac] ^ ... [javac] 13 errors Replacing all references with InlineViewport fixes the problem (committed in rev 1134414) First look: hope the CS suppressions aren't there to stay... ;-P (more detailed review to follow) I believe this is a side-effect of Simon's incremental merges from trunk into the Temp_ComplexScripts branch. I'm doing my work in my dev repo at git://github.com/skynavga/fop.git, from which I irregularly post a new patch for merger into Temp_ComplexScripts. I've got a couple of bugs I'm working on, after which I expect to submit a new patch before the end of June. Regarding CS suppressions, I expect they will remain in the source I authored since I use somewhat different style conventions than were used by the pre-existing code. For example, I use: * longer lines * a few methods with greater than 7 parameters * slightly different spacing conventions When I've modified pre-existing files, then I kept to the existing checkstyle conventions; however, with some new files, I have varied a bit. I don't have any plans to change those new files at this time. This does not affect functionality. G. (In reply to comment #32) > I just tried checking it out, and got two types of compile error when trying to > build: > > [javac] ... org/apache/fop/layoutmgr/BidiUtil.java:42: cannot find symbol > [javac] symbol : class Viewport > [javac] location: package org.apache.fop.area.inline > [javac] import org.apache.fop.area.inline.Viewport; > [javac] ^ > > [javac] ... org/apache/fop/layoutmgr/BidiUtil.java:107: reference to > collectRuns is ambiguous, both method > collectRuns(java.util.List,java.util.List) in org.apache.fop.layoutmgr.BidiUtil > and method collectRuns(Viewport,java.util.List) in > org.apache.fop.layoutmgr.BidiUtil match > [javac] List runs = collectRuns ( la.getInlineAreas(), new Vector() > ); > [javac] ^ > ... > [javac] 13 errors > > Replacing all references with InlineViewport fixes the problem (committed in > rev 1134414) > > First look: hope the CS suppressions aren't there to stay... ;-P > > (more detailed review to follow) (In reply to comment #31) > (In reply to comment #30) > > I had a brief look at your branch and have the following questions and > > comments: > > > > You implemented the BIDI algorithm in a separate BidiUtil class. Why didn't you > > integrate it into the layout engine? It seems that what is done there is a > > layout task, therefore should be put in the layout engine classes. FO tree > > objects are now manipulated by both the layout engine and that class, and I'm > > seriously concerned about the maintenance issues that that may create. > > BidiUtil is defined in o.a.f.layoutmgr, therefore it is "integrated into the > layout engine". I considered placing it in o.a.f.util or in o.a.f.text.bidi; > however, since it is only referenced by o.a.f.layoutmgr code, I decided that > was the best home for it at this time. Just to weigh in here: On the one hand, I believe that the layout *managers* should actually remain quite BIDI-agnostic. Ultimately, to a layout manager, it should only matter how wide/heigh a given text-fragment will be and how its own child areas should be stacked. In what direction text should be rendered, is hardly its concern. The TextLM, maybe... but that is already quite large. That said, Vincent's point is a valid one. The observation certainly does not mean that more LMs shouldn't *use* the BIDI functionality. Almost on the contrary... Thinking out loud, maybe a singleton that is made available in the top level LM would be more appropriate. That would allow potential optimizations, in that BIDI is then not applied to a page-sequence as a whole, but gradually, as layout progresses. > However, having said that, it is not performing layout and knows nothing about > areas or geometries whatsoever. It's functionality is invoked in two places: > > (1) in PageSequenceLayoutManager.activateLayout(), in order to resolve bidi > levels of each delimited text range, and > (2) in LineLayoutManager.add{Inline,Block}Area, after completing line area > construction; > > See XSL-FO Section 5.8 for more information. I'm open to any concrete > suggestions about a better point of integration, but I don't see any at present > that is consistent with 5.8. I am even thinking that the first part (resolving embedding levels) can be handled entirely in the FO tree. Places like finalizeNode() come to mind. That way, RTF output (or more generally: formats not using the AreaTreeHandler and the layout engine) would also be able to benefit from it. Also, I notice 'blind' traversal of the ancestry of a node. 'Blind' means that I am wondering what happens with BIDI processing in retrieved fo:markers. The spec is not entirely clear about it, but given that writing-mode is involved, I would expect the behavior to be similar to property-resolution, where the base context comes from the parent of the fo:retrieve-marker, not the actual parent node in the source document. Integrating the step in the FO tree would make this relatively straightforward to solve. The first step would be skipped during initial parsing, and instead executed as part of the resolveRetrieveMarker() logic that is invoked during layout of static content. <snip /> > > How feasible is it to run the BIDI algorithm on a subset of the FO tree? If I'm > > right, ATM it is launched on a whole page-sequence. When we refactor the code > > so that layout can be started before a whole page-sequence has been parsed, in > > order to avoid the infamous memory issue that a lot of users are running into, > > will that code allow to do it? > > I'm not sure what you mean by "ATM". The semantics of XSL-FO 5.8 as embodied by > the present implementation will have to be taken into account by such a > refactoring, as I'm sure will many other aspects of the current FOP > implementation. XSL-FO doesn't go so far as specifying that this all has to be applied to the page-sequence as a whole. That would be an implementation detail. As long as the result is compliant, it shouldn't matter whether, architecturally, we process complete page-sequences or separate blocks. I still firmly believe there is a possibility to switch from endPageSequence() to endBlock(). Only blocks that are direct children of a fo:flow should be used as logical boundaries to trigger layout (and, if possible, rendering) of all content up to that point. The proposed approach should work too, preferably without too much refactoring, as it would be the idea of purging all finished blocks. (In reply to comment #34) > (In reply to comment #31) > > (In reply to comment #30) > > > I had a brief look at your branch and have the following questions and > > > comments: > > > > > > You implemented the BIDI algorithm in a separate BidiUtil class. Why didn't you > > > integrate it into the layout engine? It seems that what is done there is a > > > layout task, therefore should be put in the layout engine classes. FO tree > > > objects are now manipulated by both the layout engine and that class, and I'm > > > seriously concerned about the maintenance issues that that may create. > > > > BidiUtil is defined in o.a.f.layoutmgr, therefore it is "integrated into the > > layout engine". I considered placing it in o.a.f.util or in o.a.f.text.bidi; > > however, since it is only referenced by o.a.f.layoutmgr code, I decided that > > was the best home for it at this time. > > Just to weigh in here: > On the one hand, I believe that the layout *managers* should actually remain > quite BIDI-agnostic. Ultimately, to a layout manager, it should only matter how > wide/heigh a given text-fragment will be and how its own child areas should be > stacked. In what direction text should be rendered, is hardly its concern. The > TextLM, maybe... but that is already quite large. One point where LMs are exposed to bidi is that the rendering system assumes a left to right rendering direction of areas irrespective of bidi, while start/end edges vary according to bidi context. For example, the start/end indent traits on generated areas varies with bidi levels. Also, certain progression directions, such as table column progression, list label versus list body, region start versus region end, etc., all vary by bidi level. In some cases this variation must be accounted for in an LM. > That said, Vincent's point is a valid one. The observation certainly does not > mean that more LMs shouldn't *use* the BIDI functionality. Almost on the > contrary... > > Thinking out loud, maybe a singleton that is made available in the top level LM > would be more appropriate. That would allow potential optimizations, in that > BIDI is then not applied to a page-sequence as a whole, but gradually, as > layout progresses. > > > However, having said that, it is not performing layout and knows nothing about > > areas or geometries whatsoever. It's functionality is invoked in two places: > > > > (1) in PageSequenceLayoutManager.activateLayout(), in order to resolve bidi > > levels of each delimited text range, and > > (2) in LineLayoutManager.add{Inline,Block}Area, after completing line area > > construction; > > > > See XSL-FO Section 5.8 for more information. I'm open to any concrete > > suggestions about a better point of integration, but I don't see any at present > > that is consistent with 5.8. > > I am even thinking that the first part (resolving embedding levels) can be > handled entirely in the FO tree. Places like finalizeNode() come to mind. That > way, RTF output (or more generally: formats not using the AreaTreeHandler and > the layout engine) would also be able to benefit from it. > Also, I notice 'blind' traversal of the ancestry of a node. 'Blind' means that > I am wondering what happens with BIDI processing in retrieved fo:markers. The > spec is not entirely clear about it, but given that writing-mode is involved, I > would expect the behavior to be similar to property-resolution, where the base > context comes from the parent of the fo:retrieve-marker, not the actual parent > node in the source document. > Integrating the step in the FO tree would make this relatively straightforward > to solve. The first step would be skipped during initial parsing, and instead > executed as part of the resolveRetrieveMarker() logic that is invoked during > layout of static content. > > <snip /> > > > > How feasible is it to run the BIDI algorithm on a subset of the FO tree? If I'm > > > right, ATM it is launched on a whole page-sequence. When we refactor the code > > > so that layout can be started before a whole page-sequence has been parsed, in > > > order to avoid the infamous memory issue that a lot of users are running into, > > > will that code allow to do it? > > > > I'm not sure what you mean by "ATM". The semantics of XSL-FO 5.8 as embodied by > > the present implementation will have to be taken into account by such a > > refactoring, as I'm sure will many other aspects of the current FOP > > implementation. > > XSL-FO doesn't go so far as specifying that this all has to be applied to the > page-sequence as a whole. That would be an implementation detail. As long as > the result is compliant, it shouldn't matter whether, architecturally, we > process complete page-sequences or separate blocks. I still firmly believe > there is a possibility to switch from endPageSequence() to endBlock(). Only > blocks that are direct children of a fo:flow should be used as logical > boundaries to trigger layout (and, if possible, rendering) of all content up to > that point. The proposed approach should work too, preferably without too much > refactoring, as it would be the idea of purging all finished blocks. It may be readily feasible to change the bidi level resolution process from end of page sequence to end of block. I'll give that a try to see what it breaks, and if it doesn't break anything that can't be readily fixed, i'll move the point of resolution invocation to end block processing. G. Created attachment 27161 [details] Patch to revision 1136002 of branch Temp_ComplexScripts. This patch is to be applied to revision 1136002 of branch Temp_ComplexScripts as follows: cd ${FOP}/branches/Temp_ComplexScripts svn update -r 1136002 --force svn revert -R . svn status # check and remove any unexpected changes prior to patching gzcat ${DOWNLOADS}/fop-i18n.arabic-patch-4.diff.gz | patch -p0 svn add test/java/org/apache/fop/complexscripts/arabic svn add test/java/org/apache/fop/complexscripts/scripts svn add test/java/org/apache/fop/complexscripts/fonts svn add test/java/org/apache/fop/complexscripts/util svn add test/layoutengine/standard-testcases/page-number-citation_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/inline_background-color_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/table-column_column-progression-lr.xml svn add test/layoutengine/standard-testcases/basic-link_internal-desination-next-page.xml svn add test/layoutengine/standard-testcases/basic-link_height_multi-child_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/instream-foreign-object_display-align_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/basic-link_for_toc_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/basic-link_height_multi-line_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/basic-link_height_baseline-shift_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/basic-link_internal-desination-same-page-after_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/table-column_column-progression-rl.xml svn add test/layoutengine/standard-testcases/basic-link_height_inline-child_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/basic-link_height_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/instream-foreign-object_basic_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/character_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/basic-link_external-destination_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/instream-foreign-object_border_padding_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/list-block_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/basic-link_internal-desination-same-page-after.xml svn add test/layoutengine/standard-testcases/basic-link_internal-desination-next-page_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/basic-link_internal-desination-same-page-before.xml svn add test/layoutengine/standard-testcases/basic-link_external-destination_writing-mode_rl_2.xml svn add test/layoutengine/standard-testcases/block_text-align_4.xml svn add test/layoutengine/standard-testcases/basic-link_background-image_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/basic-link_internal-desination-same-page-before_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/block_text-align_5.xml svn add test/resources/complexscripts svn add src/java/org/apache/fop/fonts/IndicScriptProcessor.java svn add src/java/org/apache/fop/fonts/DevanagariScriptProcessor.java svn rm --force src/codegen/complexscripts svn commit ... This patch includes the following: * bug fixes * new layout engine tests for right-to-left writing mode * new generic indic script processing support * new devanagari script processing support See milestone "Patch 4" at http://skynav.trac.cvsdude.com/fop/report/6 for further details. Regards, Glenn Committed in revisions 1136144 and 1136146. In rev. 1136144 I forgot the additions and deletions, so that the code of that revision does not compile. The code contains a number of Unicode properties, e.g. in util.CharUtilities and DevanagariScriptProcessor. It would be better if all Unicode properties would be collected in one class. Perhaps it is even possible to generate that class from the Unicode Database files. You already do that with class BidiClassUtils. This would make it much easier to update FOP to newer versions of Unicode. Some of the info in util.CharUtilities is also provided by Character.UnicodeBlock.of(int codePoint). Why do you not use Java's Unicode support or ICU? (In reply to comment #38) > The code contains a number of Unicode properties, e.g. in util.CharUtilities > and DevanagariScriptProcessor. It would be better if all Unicode properties > would be collected in one class. Perhaps it is even possible to generate that > class from the Unicode Database files. You already do that with class > BidiClassUtils. This would make it much easier to update FOP to newer versions > of Unicode. The Unicode Database files do not have the specialized properties needed to perform script the script processing found in DevanagariScriptProcessor, etc. Furthermore, there are context sensitive properties that are not simple table lookup. > Some of the info in util.CharUtilities is also provided by > Character.UnicodeBlock.of(int codePoint). Why do you not use Java's Unicode > support or ICU? Neither Java's Unicode support or ICU are sufficient to provide the necessary logic, e.g., determination of script membership for the purpose of performing complex script rendering. To give an example, script membership is not the same as unicode block membership. G. Created attachment 27784 [details] Patch to revision 1183594 of branch Temp_ComplexScripts This patch is to be applied to revision 1183594 of branch Temp_ComplexScripts as follows: cd ${FOP}/branches/Temp_ComplexScripts svn update -r 1183594 --force svn revert -R . svn status # check and remove any unexpected changes prior to patching gzcat ${DOWNLOADS}/fop-i18n.arabic-patch-5.diff.gz | patch -p0 ant resgen-complexscripts clean svn add test/java/org/apache/fop/complexscripts/gsub svn add test/java/org/apache/fop/complexscripts/gdef svn add test/java/org/apache/fop/complexscripts/gpos svn add test/java/org/apache/fop/complexscripts/arabic/GenerateArabicTestData.java svn add test/java/org/apache/fop/complexscripts/arabic/ArabicTestCase.java svn add test/java/org/apache/fop/complexscripts/arabic/ArabicTestConstants.java svn add test/java/org/apache/fop/layoutengine/LayoutEngineTestUtils.java svn add test/java/org/apache/fop/util/NumberConverterTestCase.java svn add test/layoutengine/standard-testcases/region-body_column-count_1_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/page-sequence-force-page-count-odd.xml svn add test/layoutengine/standard-testcases/region-body_column-count_2_writing-mode_rl.xml svn add test/resources/complexscripts/arab/data svn propdel svn:mime-type test/resources/complexscripts/arab/data/arab-001.txt svn add src/java/org/apache/fop/util/NumberConverter.java svn rm --force test/java/org/apache/fop/complexscripts/arabic/ArabicScriptTestSuite.java svn commit ... This patch includes the following: * bug fixes * new layout engine tests for right-to-left writing mode * new test utility for using TTX files in testing advanced typographic tables * new test cases for GDEF/GSUB/GPOS advanced typographic tables * new test cases for ~85000 arabic word forms against four fonts * new implementation of number formatter for number to string conversion, which adds support for arabic, hebrew, thai, and kana scripts See milestone "Patch 5" at http://skynav.trac.cvsdude.com/fop/report/6 for further details. Regards, Glenn Applied. I had problems with test/resources/complexscripts/arab/data/. Please, check the result. Created attachment 27822 [details]
list of Gujarati words and sentences
As per my exchange with Glenn, I've attached a UTF-8 encoded file that contains Gujarati words and sentences. Actually, it's an output of a multiple choice quiz from my application. If the data is expected in some other form, let me know. Also, let me know if you need more of such data.
(In reply to comment #43) > Created attachment 27822 [details] > list of Gujarati words and sentences > > As per my exchange with Glenn, I've attached a UTF-8 encoded file that contains > Gujarati words and sentences. Actually, it's an output of a multiple choice > quiz from my application. If the data is expected in some other form, let me > know. Also, let me know if you need more of such data. Thanks. I'll let you know when I've got the Gujarati support working and have tried out these word forms. By the way, for Arabic script, i have approximately 85,000 word forms which represents a significant cross section of a number of corpuses. I would hope to have similar number of word forms for other scripts. I prefer word forms only for Indic scripts rather than phrases or sentences, since the latter do not typically use any whitespace between words. So I might ask you to manually segment your data into word forms only. G. Created attachment 27906 [details] Patch to revision 1198396 of trunk. Re: Created attachment 27906 [details] Patch to revision 1198396 of trunk. This patch is to be applied to revision 1198396 of trunk as follows: cd ${FOP}/trunk svn update -r 1198396 --force svn revert -R . svn status # check and remove any unexpected changes prior to patching gzcat ${DOWNLOADS}/fop-complexscripts-trunk-patch-1.diff.gz | patch -p1 ant clean ant codegen-unicode-bidi ant resgen-complexscripts ant clean svn add src/codegen/unicode/java/org/apache/fop/text/bidi svn add src/java/org/apache/fop/fonts/AdvancedTypographicTableFormatException.java svn add src/java/org/apache/fop/fonts/ArabicScriptProcessor.java svn add src/java/org/apache/fop/fonts/DefaultScriptProcessor.java svn add src/java/org/apache/fop/fonts/DevanagariScriptProcessor.java svn add src/java/org/apache/fop/fonts/DiscontinuousAssociationException.java svn add src/java/org/apache/fop/fonts/GlyphClassMapping.java svn add src/java/org/apache/fop/fonts/GlyphClassTable.java svn add src/java/org/apache/fop/fonts/GlyphContextTester.java svn add src/java/org/apache/fop/fonts/GlyphCoverageMapping.java svn add src/java/org/apache/fop/fonts/GlyphCoverageTable.java svn add src/java/org/apache/fop/fonts/GlyphDefinition.java svn add src/java/org/apache/fop/fonts/GlyphDefinitionSubtable.java svn add src/java/org/apache/fop/fonts/GlyphDefinitionTable.java svn add src/java/org/apache/fop/fonts/GlyphMappingTable.java svn add src/java/org/apache/fop/fonts/GlyphPositioning.java svn add src/java/org/apache/fop/fonts/GlyphPositioningState.java svn add src/java/org/apache/fop/fonts/GlyphPositioningSubtable.java svn add src/java/org/apache/fop/fonts/GlyphPositioningTable.java svn add src/java/org/apache/fop/fonts/GlyphProcessingState.java svn add src/java/org/apache/fop/fonts/GlyphSequence.java svn add src/java/org/apache/fop/fonts/GlyphSubstitution.java svn add src/java/org/apache/fop/fonts/GlyphSubstitutionState.java svn add src/java/org/apache/fop/fonts/GlyphSubstitutionSubtable.java svn add src/java/org/apache/fop/fonts/GlyphSubstitutionTable.java svn add src/java/org/apache/fop/fonts/GlyphSubtable.java svn add src/java/org/apache/fop/fonts/GlyphTable.java svn add src/java/org/apache/fop/fonts/GlyphTester.java svn add src/java/org/apache/fop/fonts/IncompatibleSubtableException.java svn add src/java/org/apache/fop/fonts/IndicScriptProcessor.java svn add src/java/org/apache/fop/fonts/Positionable.java svn add src/java/org/apache/fop/fonts/ScriptContextTester.java svn add src/java/org/apache/fop/fonts/ScriptProcessor.java svn add src/java/org/apache/fop/fonts/Substitutable.java svn add src/java/org/apache/fop/layoutmgr/BidiUtil.java svn add src/java/org/apache/fop/text/bidi svn add src/java/org/apache/fop/traits/Direction.java svn add src/java/org/apache/fop/traits/WritingMode.java svn add src/java/org/apache/fop/traits/WritingModeTraits.java svn add src/java/org/apache/fop/traits/WritingModeTraitsGetter.java svn add src/java/org/apache/fop/traits/WritingModeTraitsSetter.java svn add src/java/org/apache/fop/util/BidiConstants.java svn add src/java/org/apache/fop/util/NumberConverter.java svn add test/java/org/apache/fop/complexscripts svn add test/java/org/apache/fop/util/NumberConverterTestCase.java svn add test/layoutengine/standard-testcases/basic-link_background-image_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/basic-link_external-destination_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/basic-link_external-destination_writing-mode_rl_2.xml svn add test/layoutengine/standard-testcases/basic-link_for_toc_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/basic-link_height_baseline-shift_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/basic-link_height_inline-child_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/basic-link_height_multi-child_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/basic-link_height_multi-line_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/basic-link_height_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/basic-link_internal-desination-next-page.xml svn add test/layoutengine/standard-testcases/basic-link_internal-desination-next-page_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/basic-link_internal-desination-same-page-after.xml svn add test/layoutengine/standard-testcases/basic-link_internal-desination-same-page-after_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/basic-link_internal-desination-same-page-before.xml svn add test/layoutengine/standard-testcases/basic-link_internal-desination-same-page-before_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/block_text-align_4.xml svn add test/layoutengine/standard-testcases/block_text-align_5.xml svn add test/layoutengine/standard-testcases/character_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/inline_background-color_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/instream-foreign-object_basic_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/instream-foreign-object_border_padding_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/instream-foreign-object_display-align_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/list-block_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/page-number-citation_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/region-body_column-count_1_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/region-body_column-count_2_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/table-column_column-progression-lr.xml svn add test/layoutengine/standard-testcases/table-column_column-progression-rl.xml svn add test/resources/complexscripts svn commit ... This patch includes the following: * merge from Temp_ComplexScripts branch as updated to current trunk changes * bug fix for bidi content inside fo:static-content Regards, Glenn Comment on attachment 27906 [details] Patch to revision 1198396 of trunk. This patch should no longer be used. Most all complex script code has been extensively reorganized into a distinct package, org.apache.fop.complexscripts, now available at http://github.com/skynavga/fop (see the i18n branch) It is uncertain as to if or when this work will be merged into the FOP for future release by Apache. In the mean time, I expect to continue development of new features and bug fixes of complex scripts and other I18N related features in the repository listed above. I will also be regularly merging from the Apache FOP trunk into this repository in order to keep it closely aligned to the main Apache FOP development branch. Created attachment 28220 [details] Patch to revision 1236835 of trunk. This patch is to be applied to revision 1236835 of trunk as follows: cd ${FOP}/trunk svn status # check and remove any unexpected changes prior to patchingsvn update -r 1236835 --force svn revert -R . svn status # check and remove any unexpected changes prior to patching gzcat ${DOWNLOADS}/fop-complexscripts-patch-2.diff.gz | patch -p1 ant clean ant codegen-unicode-bidi ant resgen-complexscripts ant clean svn add src/codegen/unicode/java/org/apache/fop/complexscripts svn add src/java/org/apache/fop/complexscripts svn add src/java/org/apache/fop/traits/Direction.java svn add src/java/org/apache/fop/traits/WritingMode.java svn add src/java/org/apache/fop/traits/WritingModeTraits.java svn add src/java/org/apache/fop/traits/WritingModeTraitsGetter.java svn add src/java/org/apache/fop/traits/WritingModeTraitsSetter.java svn add test/java/org/apache/fop/complexscripts svn add test/layoutengine/standard-testcases/basic-link_background-image_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/basic-link_external-destination_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/basic-link_external-destination_writing-mode_rl_2.xml svn add test/layoutengine/standard-testcases/basic-link_for_toc_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/basic-link_height_baseline-shift_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/basic-link_height_inline-child_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/basic-link_height_multi-child_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/basic-link_height_multi-line_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/basic-link_height_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/basic-link_internal-desination-next-page.xml svn add test/layoutengine/standard-testcases/basic-link_internal-desination-next-page_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/basic-link_internal-desination-same-page-after.xml svn add test/layoutengine/standard-testcases/basic-link_internal-desination-same-page-after_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/basic-link_internal-desination-same-page-before.xml svn add test/layoutengine/standard-testcases/basic-link_internal-desination-same-page-before_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/block-container_start-indent_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/block_text-align_4.xml svn add test/layoutengine/standard-testcases/block_text-align_5.xml svn add test/layoutengine/standard-testcases/character_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/inline_background-color_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/instream-foreign-object_basic_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/instream-foreign-object_border_padding_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/instream-foreign-object_display-align_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/leader-alignment_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/leader_leader-pattern_dots_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/leader_leader-pattern_rule_writing-system_rl.xml svn add test/layoutengine/standard-testcases/leader_leader-pattern_space_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/leader_leader-pattern_use-content_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/list-block_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/page-number-citation_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/region-body_column-count_1_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/region-body_column-count_2_writing-mode_rl.xml svn add test/layoutengine/standard-testcases/table-column_column-progression-lr.xml svn add test/layoutengine/standard-testcases/table-column_column-progression-rl.xml svn add test/resources/complexscripts svn propdel svn:mime-type test/resources/complexscripts/arab/data/arab-001.txt svn commit ... This patch replaces (obsoletes) all previous patches on this bug and includes the following: Bug Fixes * fo:block-container start-indent in bidi context (http://skynav.trac.cvsdude.com/fop/ticket/64) * fo:static-content fails to process bidirectional levels (http://skynav.trac.cvsdude.com/fop/ticket/70) * ignores standard kerning when font has GSUB but not GPOS (http://skynav.trac.cvsdude.com/fop/ticket/71) * fo:leader not fully supported in bidi context (http://skynav.trac.cvsdude.com/fop/ticket/74) New Features * preliminary support for gujarati script (http://skynav.trac.cvsdude.com/fop/ticket/48) * preliminary support for gurmukhi script (http://skynav.trac.cvsdude.com/fop/ticket/49) Refactoring and Cleanup * move new complex script related code into separate package hierarchy at o.a.f.complexscripts - the only exception is the retention of the four classes o.a.f.traits.WritingMode*.java * refactor large complex script related classes into smaller classes, e.g., BidiUtil.java has been divided into 6 classes * migrate advanced typography table parsing from TTFFile to o.a.f.complexscripts.fonts.OTFAdvancedTypographicTableReader * migrate FO node specific functionality into FO class hierarchy, e.g., see FONode.collectDelimitedTextRanges * migrate area tree node specific functionality into area class hierarchy, e.g., see InlineArea.collectInlineRuns Regards, Glenn Added complex script support (bidi, shaping, etc) at revision 1293736. |