Apache OpenOffice (AOO) Bugzilla – Issue 100629
Inaccurate word count
Last modified: 2009-05-03 17:03:38 UTC
Open Office overcounts the number of words in a document, typically by 30%. I wrote a letter of about 200 words. MS Office 97 accurately reported 198 words; Open Office 3 reported 255 words. The same problem existed in Open Office 2.
I checked with "Ooo 3.0.1 (DE) Multilingual version GERMAN UI WIN XP: [OOO300m15 (Build 9379)]" and can NOT confirm the reported effect. My quick test: 1. Counted the words in a standard text from a letter document. 2. Replaced all blanks with 'CR'. 3. Copied the new text column into a spreadsheet. 4. Deleted the empty lines left by paragraph breaks. 5. Compared the word count with the number of rows in the spreadsheet: it was exactly the same (229 words). Same result with "Ooo Dev 3.2.0 multilingual version English UI WIN XP: [DEV300m44 (Build 9395)]". Might be related to Issue 89042? @mjerryfuerst: Please attach a sample document!
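For anyone who wants to repeat the quick test above without a spreadsheet, here is a minimal Python sketch of the same idea: replace blanks with line breaks, drop the empty rows, and count what is left. The function name `count_rows_after_split` is hypothetical and this is not OpenOffice code; it only imitates the manual steps.

```python
def count_rows_after_split(text: str) -> int:
    """Imitate the manual check: blanks become line breaks, empty rows are dropped."""
    # Step 2 of the manual test: replace every blank with a line break.
    rows = text.replace(" ", "\n").split("\n")
    # Step 4: discard the empty rows left by paragraph breaks or double blanks.
    return sum(1 for row in rows if row)


sample = "Dear Sir,\n\nThank you for  your letter."
print(count_rows_after_split(sample))  # prints 7
```

Note that a check of this kind counts every run of non-blank characters as one row, so a string such as ### would also be counted once. It can therefore only confirm that the counter agrees with blank-delimited counting, not that punctuation is handled the way a reporter expects.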
Reassigned to SBA
I too see inaccurate word counts in OO Write. I'm using American English v3.01 for Windows.
The left-leaning double quote and the left-leaning single quote each count as a separate word, even when snugged up against a word (no space). E.g. “Mia = 2 words and ‘fox = 2 words. Both of these examples should = 1 word.
Some other punctuation marks are treated as alpha-numeric characters for the purpose of word counts. E.g. ### = 1 word and # # # = 3 words. I didn't check all possible characters. Both of these examples should = 0 words.
Hyphenated words count as one word, even when using a hard hyphen. E.g. blue-green = 1 word. This example should = 2 words. Note: I haven't checked, but soft hyphens should join words to make a count of one. Hard hyphens should not.
----
In my opinion, only strings of alpha-numeric characters should contribute to word counts, never any punctuation marks. For the purpose of finding the boundaries of these strings, hard hyphens should be treated as a space; soft hyphens should not. Neither should apostrophes, or left- or right-leaning single quotes (which might be treated as apostrophes). E.g. can't, can’t, and even can‘t should = 1 word.
The hard hyphen rule will mess up phone numbers, but phone numbers should be less common in blocks of text than hyphenated words. One possible fix is to treat hard hyphens within strings of numbers differently from hard hyphens within strings of alphas. E.g. blue-green = 2 words, 555-1212 = 1 word. This will still mess up people who use hard hyphens where they should be using soft hyphens, but then people should learn to use the tools properly.
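The counting rules proposed in the comment above could be sketched roughly as follows. This is only an illustration of the suggestion, not how OpenOffice counts words; the function name `count_words` and the constants are hypothetical, and the handling of digits around a hard hyphen is one possible reading of the "phone number" exception.

```python
import re

SOFT_HYPHEN = "\u00ad"            # invisible hyphenation hint
APOSTROPHE_LIKE = "'\u2018\u2019"  # ', left single quote, right single quote


def count_words(text: str) -> int:
    """Count runs of alphanumeric characters under the proposed rules."""
    # Soft hyphens join the two halves of a word, so simply drop them.
    text = text.replace(SOFT_HYPHEN, "")
    # Assumed exception: a hard hyphen directly between two digits joins them
    # (555-1212 is one word). All other hard hyphens stay and act as separators.
    text = re.sub(r"(?<=\d)-(?=\d)", "", text)
    # An apostrophe-like mark directly between two alphanumerics joins them
    # (can't is one word). Leading or trailing quotes are left as punctuation.
    text = re.sub(rf"(?<=[^\W_])[{APOSTROPHE_LIKE}](?=[^\W_])", "", text)
    # Every remaining maximal run of letters and digits is one word;
    # punctuation on its own never contributes to the count.
    return len(re.findall(r"[^\W_]+", text))


for sample in ["\u201cMia", "\u2018fox", "###", "# # #",
               "blue-green", "555-1212", "can't", "can\u2019t", "can\u2018t"]:
    print(repr(sample), count_words(sample))
```

With these rules the examples from the comment come out as requested: the quoted words count as 1, the # strings as 0, blue-green as 2, and 555-1212 and the three spellings of can't as 1 each.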
This looks like a duplicate of issue 89042. OO Write didn't use to do this; something changed in the past couple of years.
No further information, so DUP *** This issue has been marked as a duplicate of 89042 ***