Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing
|Summary:||. (dot) should be word separator|
|Component:||code||Assignee:||AOO issues mailing list <issues>|
|Status:||CONFIRMED ---||QA Contact:|
|Priority:||P4||CC:||issues, stefan.baltzer, thomas.lange|
|Version:||OOo 1.1 Beta2|
|Issue Type:||ENHANCEMENT||Latest Confirmation in:||---|
Description maccy 2003-06-11 18:32:01 UTC
I get lots of "words" like "z.B" ("z.B." is an abbreviation for "for example" in German) from other users, sent in with their user dictionaries. OO.o only treats ". " (a dot followed by a space) as a word separator, not a dot that isn't followed by a space. This should be changed, IMHO.
Comment 1 khendricks 2003-06-16 16:12:12 UTC
Hi, no, the dictionary author should change their dictionary to include abbreviations. The whole purpose of allowing embedded dots in a word is to let abbreviations be spellchecked properly. This works the same way in English (i.e. or e.g.). You can also ask for the abbreviation to be included in the dictionary so that it is properly spellchecked. Resolving this as works for me. Kevin
Comment 2 maccy 2003-06-16 16:29:02 UTC
You are joking, aren't you? This is the very first spellchecker/word processor that does it this way, and it's very annoying. If users typed typographically correctly, there would have to be a "half space" between the parts of the abbreviation. This is correct at least according to German typesetting rules: no space is wrong (but MANY people do it this way), a full space is correct for a typewriter, which has no half space, but a half space is the only correct glyph to put at that position. That is why I will never add abbreviations like "i.d.R." or "z.B." to the dictionary.
Comment 3 khendricks 2003-06-16 16:43:50 UTC
Hi, sorry, abbreviations are allowed in English and in many other languages, so embedded periods are possible (i.e. or e.g.). So either you can choose to add a space after each letter of an abbreviation to get it to parse properly, or you can choose to ignore them; either way, periods in the middle of a string of letters cannot be a universal word separator. Perhaps you can convince someone to make it a locale-dependent separator. In any case, this issue does not belong to lingucomponent but to sw.openoffice.org, since they control the breakiterator code that determines what gets sent to the spellchecker in the first place. So I am reassigning it to Writer so that you can argue with Hamburg, who are the breakiterator authors, for a locale-dependent use of the period as a full break in German. Good luck! Kevin
Comment 4 h.ilter 2003-06-17 09:36:43 UTC
Reassigned to SBA.
Comment 5 stefan.baltzer 2003-06-17 13:59:41 UTC
SBA: The word count of URLs is another thing to keep in mind when discussing what a dot should do. How many words is "Me.firstname.lastname@example.org" expected to be? To me, a URL is ONE word, regardless of the number of dots within it. I think this is a far more important issue than the counting of abbreviations: texts with abbreviations require the user to know them in order to be understood, while URLs cannot be avoided. Those most in need of exact word counts are those who get paid by the word: journalists, editors and the like. There must be some kind of rule for them about when to count +1 and when not; I am not aware of the rules by which journalists get paid. Generally speaking, we don't want to end up inflating the number of options. There are already far too many, so we have to be very strict about that. Reassigned to Michael.
Comment 6 maccy 2003-06-17 15:39:56 UTC
Stefan, you are only talking about word counts, which is quite secondary compared to the annoying spelling mistakes OO.o comes up with, and the spelling mistakes caused by the wrong word separation are what this bug is all about. Taking your example: OO.o should hand the string "Me.email@example.com" over to the spellchecker in these chunks: Me and my f r ... bla com. OO.o should not hand the whole string to the spellchecker at once. I don't care what OO.o does when it counts words, but for spellchecking, your example string should be cut into pieces before it is sent to MySpell.
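A minimal sketch of the pre-splitting requested above, written in Python purely for illustration. It is not OOo or breakiterator code: the function name `split_for_spellcheck` is hypothetical, and treating both "." and "@" as separators is an assumption made so that the e-mail example falls apart into plain words.

```python
import re


def split_for_spellcheck(token: str) -> list[str]:
    """Cut a token into pieces before handing them to the spellchecker.

    Hypothetical pre-processing step (not OOo code). Assumption: both '.'
    and '@' act as separators here; empty pieces (e.g. from a trailing
    dot) are dropped.
    """
    return [piece for piece in re.split(r"[.@]", token) if piece]


# Each piece would then be looked up on its own.
print(split_for_spellcheck("z.B."))                  # ['z', 'B']
print(split_for_spellcheck("Me.email@example.com"))  # ['Me', 'email', 'example', 'com']
```

Note that this splitting is exactly what makes abbreviations like "i.e." impossible to verify as a unit, which is the objection raised in the following comments.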
Comment 7 michael.ruess 2003-06-17 15:46:41 UTC
Unfortunately, this is the only way to handle, e.g., the word count and the auto-capitalization after a full stop in a suitable manner. Thus the behaviour will not and cannot be changed at all. I think implementing dictionary support so that the spellchecker also recognizes "z. B." from a user dictionary would be very hard and VERY time-consuming. MRU->TL: please give your opinion on this.
Comment 8 maccy 2003-06-17 16:00:43 UTC
Actually, it is impossible to add all abbreviations to the dictionaries because everyone uses their own abbreviations. OpenOffice is the only word processor that claims that x.y.z. is a spelling error. If the dot (not followed by a space) can't be configured as a word separator, then there has to be some kind of preprocessing of "words" before they are sent to the spellchecker.
Comment 9 thomas.lange 2003-06-18 09:34:50 UTC
No, the "." should definitely not be a word separator. If it were, it would no longer be possible to spellcheck abbreviations such as "i.e." or "Dr.". Those abbreviations need to be passed to the spellchecker as a single piece of text so that the spellchecker can verify them. On the other hand, this of course requires allowing a dot at the end of any word, since the word might be located at the end of a sentence. I think this is a minor drawback, since not checking abbreviations at all is far less acceptable. Kevin is absolutely right that the dictionary authors have to include the abbreviations in their dictionaries.
Though I see the point about the half space in "z. B.", there is still one main reason for using "z.B.": in SO we have two different third-party spellcheckers included, both obtained from vendors specialized in the field of spellchecking. Unfortunately, both of them use ISO-8859-1 or a similar encoding and thus do not know about the half space. And when presented with the choice of "z.B." and "z. B.", they accept only the first one. Given all this, we will stick to the current behaviour.
About:
> Actually it is impossible to add all abbreviations into
> the dictionaries because everyone uses his own abbreviations.
Correct. And thus someone using personalized abbreviations has to accept the consequence: the word is not known.
> OpenOffice is the only word processor which claims that
> x.y.z. is a spelling error.
This could of course be changed so that words containing a "." within are never checked at all. But this would also prevent the detection of spelling errors in known abbreviations. Since I also like to use personalized abbreviations, e.g. "WE" for weekend, which could be excluded from spellchecking by disabling the option "Check uppercase words" (which is the default), I propose introducing a new option "Check abbreviations", which defaults to true and specifies whether words containing a "." within should be checked or not. I'll pass this on to user experience for a decision about such an option or for finding other solutions.
TL->BH: Please take over. Since the decision about such an option can be made quite easily, I am changing the target to OOo 2.0.
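The trade-off described in Comment 9 can be sketched as a small candidate-selection function. This is a hypothetical Python illustration, not the SO/OOo implementation: `spellcheck_candidates` and its `check_abbreviations` flag only model the proposed "Check abbreviations" option, and the caller is assumed to accept the word if any returned form is found in the dictionary.

```python
def spellcheck_candidates(word: str, check_abbreviations: bool = True) -> list[str]:
    """Return the forms to look up in the dictionary, or [] to skip the word.

    Hypothetical model of Comment 9 (not SO/OOo code):
    - a word with a '.' before its last character ("z.B.", "i.e.") is an
      abbreviation and is skipped when the proposed option is off;
    - a trailing '.' may belong to an abbreviation ("Dr.") or be a full
      stop, so both the dotted and the undotted form are offered.
    Assumption: the caller accepts the word if ANY candidate is known.
    """
    has_interior_dot = "." in word[:-1]
    if has_interior_dot and not check_abbreviations:
        return []
    if word.endswith(".") and len(word) > 1:
        return [word, word[:-1]]
    return [word]


print(spellcheck_candidates("z.B."))          # ['z.B.', 'z.B']
print(spellcheck_candidates("z.B.", False))   # []
print(spellcheck_candidates("Dr."))           # ['Dr.', 'Dr']
```

With the option off, "Dr." is still checked because its only dot is the final one, which matches the proposal of skipping only words that contain a "." within.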
Comment 10 maccy 2003-08-31 19:52:30 UTC
Thomas, you say that you like abbreviations to be spellchecked as well, but there are a few major points I want to emphasize:
- Abbreviations written without a space in between are incorrect, so dictionary authors would have to add incorrect "words" to the dictionary.
- Even if some authors might consider adding incorrect abbreviations to their dictionaries, there are too many of them, and every specialized vocabulary (legal, military, etc.) has its own special abbreviations.
- The behaviour of OO.o is very uncommon and confuses the majority of people working with OO.o, especially those who are evaluating OO.o and thinking about switching from MS Office.
Comment 11 falko.tesch 2003-09-01 09:47:32 UTC
This issue is re-targeted to "Office later". Reason: As already stated, our German spellchecker is not capable of Unicode. Furthermore, those who need this feature will use AutoCorrection (since it is way too awkward to select an em-dash from "Special Characters" every time). And finally, IMHO Word doesn't do it any better than we do anyway.
Comment 12 bsb 2004-07-12 12:41:08 UTC
To me personally, the period should be a word separator. I have noticed that from time to time I fail to press the space bar hard enough to insert the space after a period, so I end up with words like "end.Beginning", which are treated as one word. The most annoying thing is that I can't use Ctrl+<Left/Right Arrow> to position the cursor right after the period and insert a space. The second, much less annoying consequence of the period not being a separator is that this "word" is entered in the word completion list with the '.' in the middle.
Comment 13 bettina.haberer 2010-05-21 14:46:05 UTC
To make these issues easier to find via "requirements", I have moved the issues currently assigned to me to the owner "requirements".