Issue 100629 - Inaccurate word count
Summary: Inaccurate word count
Status: CLOSED DUPLICATE of issue 89042
Alias: None
Product: Writer
Classification: Application
Component: ui
Version: OOO300m9
Hardware: Unknown All
Importance: P3 Trivial
Target Milestone: ---
Assignee: stefan.baltzer
QA Contact: issues@sw
Keywords: oooqa
Depends on:
Reported: 2009-03-28 17:11 UTC by mjerryfuerst
Modified: 2009-05-03 17:03 UTC

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Description mjerryfuerst 2009-03-28 17:11:52 UTC
Open Office overcounts the number of words in a document, typically by 30%.
I wrote a letter of about 200 words. MS Office 97 accurately reported 198
words; Open Office 3 reported 255 words. The same problem existed in Open Office 2.
Comment 1 Rainer Bielefeld 2009-03-28 18:17:23 UTC
I checked with "Ooo 3.0.1 (DE) Multilingual version GERMAN UI WIN XP: [OOO300m15
(Build 9379)]" and can NOT confirm the reported effect.

My quick test:

1. Counted the words in a standard text from a letter document.
2. Replaced all blanks with 'CR'.
3. Copied the new text column into a spreadsheet.
4. Deleted the empty lines left by paragraph breaks.
5. Compared the word count with the number of rows in the spreadsheet: it was
   exactly the same (229 words).
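
The check above can be approximated outside the office suite. The following is only a rough sketch of the same idea, assuming 'CR' means a line break and that each non-empty line becomes one spreadsheet row; the function name `count_rows_after_split` is made up for illustration.

```python
def count_rows_after_split(text: str) -> int:
    """Rough stand-in for the manual spreadsheet test (an approximation)."""
    # Step 2: replace every blank with a line break.
    column = text.replace(" ", "\n")
    # Steps 3-4: one row per line; rows that are empty (from paragraph
    # breaks or doubled blanks) are dropped.
    rows = [line for line in column.split("\n") if line.strip()]
    # Step 5: the number of remaining rows is the reference word count.
    return len(rows)


if __name__ == "__main__":
    sample = "Dear Sir,\n\nThank you for  your letter."
    print(count_rows_after_split(sample))
```

Note that this reference count is purely blank-delimited, so a lone punctuation string such as ### still counts as one row.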

Same result with "Ooo Dev 3.2.0 multilingual version English UI WIN XP:
[DEV300m44 (Build 9395)]".

Might be related to issue 89042?

Please attach a sample document!
Comment 2 eric.savary 2009-04-01 15:26:27 UTC
Reassigned to SBA
Comment 3 scottydm 2009-04-13 08:28:01 UTC
I too see inaccurate word counts in OO Write. I'm using American English v3.01
for Windows.

The left-leaning double quote and left-leaning single quote each count as a
separate word, even when snugged up against a word (no space). E.g. “Mia = 2 words
and ‘fox = 2 words. Both of these examples should = 1 word.

Some other punctuation marks are treated as alphanumeric characters for the
purpose of word counts. E.g. ### = 1 word and # # # = 3 words. I didn't check all
possible characters. Both these examples should = 0 words.

Hyphenated words count as one word, even when using a hard hyphen. E.g.
blue-green = 1 word. This example should = 2 words.

Note, I haven't checked, but soft hyphens should join words to make a count of
one. Hard hyphens should not.


In my opinion only strings of alphanumeric characters should contribute to word
counts, never any punctuation marks. For purposes of finding the boundaries of
these strings, hard hyphens should be treated as a space; soft hyphens should
not. Neither should apostrophes, or left- or right-leaning single quotes (which
might be treated as apostrophes). E.g. can't, can’t, and even can‘t should = 1 word.

The hard hyphen rule will mess up phone numbers, but phone numbers should be
less common in blocks of text than hyphenated words. One possible fix is to
treat hard hyphens within strings of numbers differently than hard hyphens
within strings of alphas. E.g. blue-green = 2 words, 555-1212 = 1 word.
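
The proposed rules can be sketched in a few lines. This is only an illustration of the suggestion in this comment, not how Writer counts words; the function name `count_words` and the exact character sets are assumptions made for the example.

```python
import re

SOFT_HYPHEN = "\u00ad"
# Apostrophe plus left- and right-leaning single quotes (assumed set).
JOINERS = "'\u2018\u2019"


def count_words(text: str) -> int:
    """Count runs of alphanumeric characters under the proposed rules."""
    # Soft hyphens and apostrophe-like marks never split a word: drop them.
    for ch in SOFT_HYPHEN + JOINERS:
        text = text.replace(ch, "")
    # A hard hyphen between two digits joins them (555-1212 = 1 word).
    text = re.sub(r"(?<=\d)-(?=\d)", "", text)
    # Everything else that is not a letter or digit acts as a boundary,
    # including hard hyphens between letters and all other punctuation.
    return len(re.findall(r"[^\W_]+", text))


if __name__ == "__main__":
    for sample in ["\u201cMia", "\u2018fox", "###", "# # #",
                   "blue-green", "555-1212", "can't", "can\u2019t"]:
        print(repr(sample), count_words(sample))
```

With these rules, “Mia and ‘fox each count as 1 word, ### and # # # count as 0, blue-green counts as 2, and 555-1212 counts as 1.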

This will still mess up people who use hard hyphens where they should be using
soft hyphens, but then people should learn to use the tools properly.
Comment 4 scottydm 2009-04-13 08:34:18 UTC
This looks like a duplicate of issue 89042.

OO Write used to not do this; something changed in the past couple of years.
Comment 5 Rainer Bielefeld 2009-05-03 17:03:13 UTC
No further information, so DUP

*** This issue has been marked as a duplicate of 89042 ***
Comment 6 Rainer Bielefeld 2009-05-03 17:03:38 UTC