Apache OpenOffice (AOO) Bugzilla – Issue 93790
Assertion Error: ImpBreakLine: Start >= End?
Last modified: 2017-05-20 11:33:48 UTC
When loading an Excel document attached to issue 91146
http://www.openoffice.org/nonav/issues/showattachment.cgi/54791/UDT%20Test%20Trackingtoo.zip
in DEV300_m29 unxlngi6 non-pro, a gazillion assertions pop up:

Error: ImpBreakLine: Start >= End?
From File .../svx/source/editeng/impedit3.cxx at Line 1802

Thomas, could you please check whether the ImpBreakLine algorithm fails, or the input data is bad (in which case please reassign to Daniel 'dr'), or the breakiterator's getWordBoundary() delivers wrong results (in which case please reassign to Karl 'khong').

Thanks
Eike
TL->ER: Well, maybe there is a completely different reason for that... At least I installed DEV300 m29 for Windows and Calc cannot even display that file. The first progress bar was fine (so reading the file was probably OK), but the second progress bar, 'Adapting row heights', got stuck at about 80% and did not finish even after minutes. I'd prefer that someone have a look at this one first: there is not much use in switching to debugging under Linux if the document might already be broken anyway.
Adding myself to CC list.
@tl: ah, sorry, I omitted that issue 91146 is about Calc nearly hanging when loading that document. That is fixed in CWS odff05, which is currently based on DEV300_m29 and is where I found the assertions. A build for unxlngi6 now exists in the Hamburg lab. Either use that, or wait until the long-running CWS is integrated, which will not happen before November though.
http://eis.services.openoffice.org/EIS2/cws.ShowCWS?Path=DEV300%2Fodff05

Thanks
Eike
Actually there is a small problem with the break iterator. In this specific case the problem is with English formatted dates (e.g. 05/14/07), to be more precise when the break iterator is asked for the word boundary at the position of the '/'. Since the '/' is no dictionary word the break iterator looks for the word close to that position...

A macro to show the issue looks like this:

Sub Main
    BI = createUnoService("com.sun.star.i18n.BreakIterator")
    Dim en_US As New com.sun.star.lang.Locale
    en_US.Language = "en"
    en_US.Country = "US"
    BD = BI.getWordBoundary( "05/14/07", 2, en_US, 2, true )
    msgbox "" + BD.startPos + " " + BD.endPos
End Sub

The result returned by the break iterator is (0, 2). But since the last parameter in the call is true, the break iterator should have looked at the following text, not at the text before the position in question. Thus the result should have been (3, 5). In that case the assertion would not have been triggered.

However, the break iterator has behaved this way since at least OOo 2.1 (as can be checked with the macro). And at the specific code in question no harm is done if this behavior is left unchanged. Thus we could simply adapt the assertion to get rid of it.

If, on the other hand, the problem in the break iterator gets fixed, it will be unknown what other source code might be affected by that change. And if the break iterator changes, at the very least the code in impedit3.cxx should be changed to something like this:

    sal_uInt16 nWordStart = nBreakPos;
    if (nWordStart < aBoundary.startPos)
        nWordStart = aBoundary.startPos;
    sal_uInt16 nWordEnd = (USHORT) aBoundary.endPos;

Otherwise the word would start with '/', and it would be very likely that no hyphenation position can be found even if the word following '/' could be hyphenated.
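For illustration only, here is a minimal standalone C++ sketch of the clamping idea described above. It is not the code from impedit3.cxx: `Boundary` is a hypothetical stand-in for the UNO boundary struct, `WordRange`, `wordRangeForHyphenation` and `isValidRange` are made-up names, and it assumes that the break position is the queried position 2 from the macro.

```cpp
#include <cstdint>

// Hypothetical stand-in for the boundary returned by getWordBoundary().
struct Boundary {
    std::int32_t startPos;
    std::int32_t endPos;
};

// Hypothetical helper type: the word range handed on to hyphenation.
struct WordRange {
    std::uint16_t start;
    std::uint16_t end;
};

// Start at the break position, but never before the start of the word
// reported by the break iterator; the end is taken from the boundary.
WordRange wordRangeForHyphenation(std::uint16_t nBreakPos, const Boundary& aBoundary) {
    std::uint16_t nWordStart = nBreakPos;
    if (nWordStart < aBoundary.startPos)
        nWordStart = static_cast<std::uint16_t>(aBoundary.startPos);
    std::uint16_t nWordEnd = static_cast<std::uint16_t>(aBoundary.endPos);
    return WordRange{nWordStart, nWordEnd};
}

// Mirrors the condition the assertion complains about (Start >= End).
bool isValidRange(const WordRange& r) {
    return r.start < r.end;
}
```

Under these assumptions, the boundary (0, 2) with break position 2 gives the range (2, 2), which is the "Start >= End" situation, while the boundary (3, 5) gives (3, 5), so the word no longer starts at the '/'.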
TL->ER: Please take over again. Thanks!
It is a bug in the word breakiterator. For the string '05///14/07', setting nPos to 3 or 4 returns '5 7', but nPos=2 returns '0 2'. The dictionary word breakiterator tries to skip non-alphanumeric chars, but it considers the char right after the previous word to be part of the previous word. I can fix the problem, but I don't know if anyone relies on this bug. If you decide it should be fixed in the i18n framework, pass the issue to me.
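The reported results can be reproduced with a small toy model in C++. This is only a sketch of the behavior described in this issue, not the i18n framework's implementation: all function names are hypothetical, only ASCII alphanumerics count as word characters, and ranges are half-open (start, end) pairs like the startPos/endPos values quoted above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

using Range = std::pair<std::size_t, std::size_t>;  // [start, end)

static bool isWordChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0;
}

// Expand from a word character at p to the whole alphanumeric run.
static Range expandWord(const std::string& s, std::size_t p) {
    std::size_t start = p, end = p;
    while (start > 0 && isWordChar(s[start - 1])) --start;
    while (end < s.size() && isWordChar(s[end])) ++end;
    return {start, end};
}

// Forward-looking behavior: skip non-alphanumerics, then take the next word.
Range forwardWordBoundary(const std::string& s, std::size_t pos) {
    std::size_t p = pos;
    while (p < s.size() && !isWordChar(s[p])) ++p;
    if (p == s.size()) return {s.size(), s.size()};  // no word follows
    return expandWord(s, p);
}

// Toy model of the reported quirk: a non-alphanumeric char directly after
// a word is treated as belonging to that previous word.
Range reportedWordBoundary(const std::string& s, std::size_t pos) {
    if (pos > 0 && pos < s.size() && !isWordChar(s[pos]) && isWordChar(s[pos - 1]))
        return expandWord(s, pos - 1);
    return forwardWordBoundary(s, pos);
}
```

In this model, `reportedWordBoundary("05///14/07", 2)` yields (0, 2) while positions 3 and 4 yield (5, 7), matching the values above; `forwardWordBoundary("05/14/07", 2)` yields (3, 5), the result a forward-looking lookup would be expected to give.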
TL->KH: I just asked FME about this, since in sw he is probably the other one who makes most use of the break iterator. We both agreed to leave things as they are, since there is no known problem caused by this behavior right now. If it turns out later that there is a problem that must be fixed, we can do it then. However, if that case arises please drop us a note, since we probably should trigger some more-than-usual QA testing at that point.

TL->ER: Thus if we want to get rid of that assertion we will just modify it and add a comment referring to this issue. Shall I do this, or will you? If you want me to, just hand this one back to me.
Reset assignee to the default "issues@openoffice.apache.org".