Issue 89042

Summary: Word count is incorrect with certain special characters in the text (e.g. custom quotes, dashes)
Product: Writer Reporter: foobard <lists>
Component: code Assignee: stefan.baltzer
Status: CLOSED FIXED QA Contact: issues@sw <issues>
Severity: Trivial    
Priority: P3 CC: eric.savary, frank.meies, f_dawgy_dogg, issues, Mathias_Bauer, obstruction, stefan.baltzer
Version: OOo 2.4.0   
Target Milestone: 3.4.0   
Hardware: All   
OS: All   
Issue Type: DEFECT Latest Confirmation in: ---
Developer Difficulty: ---
Attachments:
Description Flags
Bugdoc with different quotation marks that irritate word count none

Description foobard 2008-05-05 21:24:59 UTC
Here's the test case:

(a ‘salvage function’)

counts as three words in MS Word, but four words in the latest OOo beta.

However, if you remove the word 'a', it is counted by both programs as only two
words.
Comment 1 michael.ruess 2008-05-06 07:44:44 UTC
Reassigned to SBA.
Comment 2 stefan.baltzer 2008-07-15 19:03:49 UTC
Confirming issue with DEV300_m24.
But it is not the "nested special characters". When I remove the brackets,
nothing changes. I will attach a bugdoc with some examples. It looks like the
custom quotation marks are the key to OOo's miscount. Adjusting summary.
Reassigned to FME. Put khong on cc.
Comment 3 stefan.baltzer 2008-07-15 19:07:50 UTC
Created attachment 55130 [details]
Bugdoc with different quotation marks that irritate word count
Comment 4 bartmoss 2008-11-19 07:01:19 UTC
Bug still exists in 3.0 (OOO300m9 build 9358). Fairly annoying bug for NaNoWriMo
participants. It is definitely the custom quotes; replacing them in a document
corrects the word count.
Comment 5 dridgway 2008-12-30 04:48:08 UTC
*** Issue 97116 has been marked as a duplicate of this issue. ***
Comment 6 Rainer Bielefeld 2009-05-03 17:03:12 UTC
*** Issue 100629 has been marked as a duplicate of this issue. ***
Comment 7 eric.savary 2009-05-26 22:13:49 UTC
*** Issue 102270 has been marked as a duplicate of this issue. ***
Comment 8 eric.savary 2009-11-26 23:21:22 UTC
*** Issue 107241 has been marked as a duplicate of this issue. ***
Comment 9 leapetra 2009-11-27 06:56:43 UTC
This is a really annoying problem.  Replacing all the quotes in a file is not
really a solution, especially in a 300+ page document.
Comment 10 stefan.baltzer 2009-11-27 11:58:52 UTC
Reassigned to TL.
Comment 11 stefan.baltzer 2009-11-27 15:51:52 UTC
*** Issue 99131 has been marked as a duplicate of this issue. ***
Comment 12 stefan.baltzer 2009-11-27 16:02:12 UTC
Counting non-word characters as words must be solved "all at once".
Keeping a separate issue for each miscounted symbol does not make much sense.

Mentioning dashes in summary from duplicate of duplicate issue. 

Example:
Sed lacinia arcu non diam sodales porttitor.   [word count: 7]
- Sed lacinia arcu non diam sodales porttitor.  [word count: 8]
- - - Sed lacinia arcu non diam sodales porttitor. [word count: 10]
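The three counts above match what a purely whitespace-based tokenizer would report, where every lone dash becomes a "word". The following is a minimal Python sketch, not OOo's actual implementation (the function names are hypothetical), contrasting that naive count with one that skips tokens containing no letter or digit:

```python
import unicodedata


def count_tokens_naive(text: str) -> int:
    """Count every whitespace-delimited token, including lone symbols."""
    return len(text.split())


def count_words(text: str) -> int:
    """Count only tokens that contain at least one letter or digit."""
    def has_word_char(token: str) -> bool:
        # General categories starting with "L" are letters, "N" are numbers.
        return any(unicodedata.category(ch)[0] in "LN" for ch in token)

    return sum(1 for token in text.split() if has_word_char(token))


sentence = "Sed lacinia arcu non diam sodales porttitor."
for prefix in ("", "- ", "- - - "):
    line = prefix + sentence
    print(count_tokens_naive(line), count_words(line))
```

With the three example lines, the naive count grows with each added dash while the filtered count stays the same. Whether a filter like this matches MS Word in all cases is not something this thread establishes.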
Comment 13 eric.savary 2010-01-05 14:28:26 UTC
*** Issue 108072 has been marked as a duplicate of this issue. ***
Comment 14 theandybarnes 2010-01-12 23:08:06 UTC
Similar problem found in v3.1.1, with the start quote on words in inverted commas.
These are the AutoCorrect -> Custom Quotes -> Single quotes -> Start quote ->
U+2018 quotes. No problem with the non-AutoCorrected quotes.

Example:

crimes   counts as 1 word;

crimes'   counts as 1 word;

'crimes'   counts as 2 words.

It doesn't seem to matter how many words are within the quotes, the first always
counts as an extra word.
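The thread does not establish why only the start quote triggers the extra word. As a diagnostic aid, this small Python sketch (the helper name `describe` is hypothetical) prints the Unicode general category of the ASCII apostrophe and of the typographic quotes, which shows that the start and end quotes fall into different categories:

```python
import unicodedata


def describe(ch: str) -> tuple:
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


# ASCII apostrophe, then the single and double typographic quote pairs.
for ch in ("'", "\u2018", "\u2019", "\u201c", "\u201d"):
    print(*describe(ch))
```

The start quotes U+2018 and U+201C are "Pi" (initial punctuation), the end quotes U+2019 and U+201D are "Pf" (final punctuation), and the ASCII apostrophe is "Po". A word-boundary routine that treats these categories differently could produce the asymmetry reported here, but that is only a guess.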
Comment 15 lordhedgie 2010-01-24 20:15:44 UTC
I have 2.4.1 installed on Ubuntu 8.03.3 and it seems to work correctly.  I also
have 3.1.1 installed on Ubuntu 9.10 and opening quotes are counted as extra
words.  For example:

"Hello," said Bob.  "How are you?

is counted as eight words instead of six.

I'm not a very good programmer, but willing to help with test cases and figuring
out behavior.
Comment 16 eric.savary 2010-06-10 09:04:05 UTC
*** Issue 112259 has been marked as a duplicate of this issue. ***
Comment 17 Joost Andrae 2010-06-11 14:14:53 UTC
Punctuation characters as well as custom quote characters in combination with a
non-breaking space should be handled differently. Please take this behavior into
consideration together with the feature that was implemented in DEV300m81 for
several French document locales.

See
http://wiki.services.openoffice.org/wiki/Non_Breaking_Spaces_Before_Punctuation_In_French_%28espaces_ins%C3%A9cables%29
Comment 18 michael.ruess 2010-07-23 07:14:12 UTC
*** Issue 113375 has been marked as a duplicate of this issue. ***
Comment 19 parsim 2010-07-28 04:44:37 UTC
A similar case, not yet mentioned: when "Replace Dashes" is turned on
(AutoCorrect -> Options), OOo undercounts.

For example, the sentence:

Bob--as usual--disagreed.

... is correctly counted as 4 words with "Replace Dashes" turned off, but as
only 2 words with "Replace Dashes" on, when each pair of hyphens is replaced
with a single em dash.
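The undercount of 2 is what results when the em dash is not treated as a word separator, so that only the single space splits the sentence. A minimal Python sketch of a dash-aware count follows; it is an illustration under that assumption, not OOo's code, and `DASH_RUN` and `count_words_dash_aware` are hypothetical names:

```python
import re

# Two or more hyphens, an en dash (U+2013), or an em dash (U+2014).
DASH_RUN = re.compile(r"-{2,}|[\u2013\u2014]")


def count_words_dash_aware(text: str) -> int:
    """Count whitespace-delimited tokens after turning dash runs into spaces."""
    return len(DASH_RUN.sub(" ", text).split())


typed = "Bob--as usual--disagreed."
replaced = "Bob\u2014as usual\u2014disagreed."

print(len(replaced.split()))             # whitespace only
print(count_words_dash_aware(typed))     # dashes act as separators
print(count_words_dash_aware(replaced))
```

A single hyphen, as in "well-known", is deliberately left alone so that hyphenated compounds still count as one word.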
Comment 20 wolfbaginski 2010-11-25 14:10:11 UTC
Still causing major errors in the Word Count. I've been able to check against 
other programs, and the discrepancies are at a rate of 1 in 30, compared to 
differences between other programs of around 1 in 1000 words. Custom quotes and 
dashes have very obvious effects, and I'm seeing it on Windows and Linux.
Comment 21 thomas.lange 2010-12-08 13:36:36 UTC
.
Comment 22 thomas.lange 2010-12-08 13:47:05 UTC
Fixed in CWS tl84.

Fixed means: the word count for text with typographical quotes (single and
double quotes), as listed in the attached bugdoc, now behaves similarly to MS
Word 2007 again. In particular, this means that
  French « savoir calculer »
is still counted as 5, since I was told the main 'feature' of the word count
implementation is to give the same result as MS Word.
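A count of 5 for the French example is what you get if each guillemet, being set off by a space, is counted as its own token. The Python sketch below only illustrates that arithmetic; how MS Word counts internally is an assumption here, and `whitespace_tokens` is a hypothetical helper:

```python
def whitespace_tokens(text: str) -> list:
    """Split on any Unicode whitespace, which includes U+00A0 (no-break space)."""
    return text.split()


# The same phrase with ordinary spaces and with no-break spaces inside the quotes.
plain = "French \u00ab savoir calculer \u00bb"
nbsp = "French \u00ab\u00a0savoir calculer\u00a0\u00bb"

print(len(whitespace_tokens(plain)), whitespace_tokens(plain))
print(len(whitespace_tokens(nbsp)), whitespace_tokens(nbsp))
```

Both variants yield five tokens, because Python's `str.split()` treats the no-break space as whitespace. A counter that does not split on U+00A0 would give a different result for the second string.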
Comment 23 thomas.lange 2010-12-10 09:36:23 UTC
TL->SBA: Please verify.
Comment 24 michael.ruess 2010-12-15 09:53:46 UTC
Correcting target (from 3.x to 3.4).
Comment 25 scottydm 2010-12-18 04:23:11 UTC
->mru:

So does your post mean that this bug fix will be rolled into v3.4?

I see v3.3 is going through release candidates and I assume the OO developers
don't want to complicate the release process with more bug fixes, but how about
v3.3.1 then?

Thanks!
Comment 26 eric.savary 2010-12-18 16:58:50 UTC
@scottydm: so far there is no 3.3.1 target, and in any case this bug is not
severe enough to warrant a fix in a micro release.
3.4 is OK.
Comment 27 michael.ruess 2010-12-20 14:40:52 UTC
Verified in CWS tl84.
Comment 28 eric.savary 2011-03-01 20:35:21 UTC
*** Issue 117160 has been marked as a duplicate of this issue. ***