Issue 102135 - Document the rules used to count words in a document
Summary: Document the rules used to count words in a document
Status: CONFIRMED
Alias: None
Product: Writer
Classification: Application
Component: editing
Version: OOo 3.0.1
Hardware: All All
Importance: P3 Trivial
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-05-22 12:36 UTC by seahunter
Modified: 2013-08-07 14:38 UTC
CC List: 5 users

See Also:
Issue Type: ENHANCEMENT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
OOo Template that Overrides Word Counting Standards (7.68 KB, text/plain)
2009-05-23 13:15 UTC, seahunter

Description seahunter 2009-05-22 12:36:04 UTC
I know this has been discussed before by user foobard, but on discovering this
problem this morning for the first time, as a professional writer, I'm very
upset that OpenOffice.org 3.0.1 counts words differently than MS Word, and the
radical difference doesn't seem to have been fixed at least as of OOo 3.0.1.
Perhaps this problem has been fixed with the current release, but I haven't
installed the current release yet because, near the end of a project, I'm wary
of doing any tinkering with my computer.

I wrote a rather long, several hundred page project, with a set word limit.
Luckily, OOo word count is lower than Word, or I'd be in a real panic this
morning. But as it is, OOo's word count is about 10,000 words less than Word's
word count. Until this morning, I never even thought that OOo would count words
differently than Word. As a non-programmer, I just figured a word was a word was
a word.

As a professional writer, having an accurate word count is important. If OOo
counts words differently than MS Word, that's fine with me. But I'd like to know
the exact criteria that OOo uses to count words, so that I can at least tell my
editor the criteria that has been used to count words. I cannot find any
documentation anywhere on the OOo website that describes the criteria used to
count words in writer.

For me and other professional writers, having an accurate word count that
matches as closely as possible the industry standard (which unfortunately is MS
Word) is very important and is a P1 issue. At the very least, we need to know
what standard is used specifically, in non-technical terms, so that we can
explain it to an editor. Otherwise, this issue is an app killer for us and I
won't be able to use OOo again even though I love it so much more than Word.

I would attach files as examples, but I am not allowed. As an example, I have
discovered that Word 2007 counts things like 975-976 as one word, while OOo
3.0.1 counts it as two words. Furthermore, I am using endnotes, page numbering,
and the Chicago Style for editing and formatting stuff.
Comment 1 eric.savary 2009-05-22 12:48:57 UTC
As you said, the difference in word counts between Word and OOo exists and has
already been reported.
Thus, no need to report it once more.

But your report brings the idea of documenting how words are counted.
There are multiple criteria for deciding whether something counts as a word,
especially given the particularities each language may show (with ', -, /...
1 or 2 words...?).

So, yes there's a real need in documenting OOo's counting rules. At least as
long as we have differences with MS-Word.
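
To illustrate why the rules need writing down, here is a minimal Python sketch
of two possible counting rules. It is not the algorithm used by OOo or by
MS Word; both function names and the choice of separators are assumptions made
up for this example.

```python
import re


def count_whitespace(text):
    """Rule A: every run of non-whitespace characters is one word."""
    return len(text.split())


def count_split_on(text, separators="-/"):
    """Rule B: whitespace and the given separator characters all break words."""
    pattern = "[\\s" + re.escape(separators) + "]+"
    # Drop empty strings produced by leading or trailing separators.
    return len([token for token in re.split(pattern, text) if token])


sample = "pages 975-976 and/or don't"
print(count_whitespace(sample))  # 4
print(count_split_on(sample))    # 6
```

The same sentence yields 4 words under one rule and 6 under the other, which
is the kind of gap that adds up over a long document.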
Comment 2 seahunter 2009-05-22 14:04:08 UTC
@es. Thanks for the quick come back. As a professional freelance writer, I need
easily accessible documentation for the counting rules used. Moreover, I think
it would be a good idea to clearly note somewhere as a disclaimer that there is
a difference between OOo Writer and MS Word in counting. For professional
freelance writers, it's a big deal.
Comment 3 foobard 2009-05-22 17:06:44 UTC
seahunter,

Are you sure you're using 3.0.1?

There was a major improvement to word count in OOo (mostly because of my
nagging) at the 3.0 marker. I have 50,000+ word documents that count
*identically* in MS Word and in OpenOffice.

Make sure you have the most recent version installed, then report back.

foobard
Comment 4 seahunter 2009-05-22 18:16:08 UTC
@foobard,

Yup, I am 100% sure that I am using 3.0.1. Clicking "Help" and "About" gives me:

OpenOffice.org 3.0.1
OOO300m15 (Build: 9379)

I will correct one error in my original report below and I sincerely apologize
for the error and hope everyone will understand it is because I was thrown into
panic. I'm on a tight deadline and this project has consumed several years of my
life.

I wrote that the difference was 10,000 words. In my utter panic this morning, I
hit one too many zeros, so I'll give a more accurate count of the difference:

MS Word: 93,547
OO Writer: 95,097

Difference: 1,550

Yes, it is much different from the 10,000 word difference I noted below, and I
apologize for that error. But when one has a contractual word limit, it can be
a problem. It can mean the difference between meeting a contractual obligation
or not.

Again, sorry for, ironically, the typo and I hope it won't diminish the
initiative to continue looking into this problem with OOo's word counting. I
realize how important filing an accurate bug report is.
Comment 5 foobard 2009-05-22 18:42:00 UTC
seahunter,

when you have a moment (perhaps after you've met your deadline), could you give
us specific examples of words that count differently?

For example, in your original bug report, you give the example of "975-976",
which you say counts as two words in OpenOffice 3.0.1.

It used to count as two words in pre-3.0 versions of OOo, but in my copy of OOo
3.0.1, it counts as two words. (I checked it just now.)

Giving us specific examples like this that we can reproduce will narrow down the
problem. I can't fix them for you (I'm not a developer), but I can at least
reproduce and confirm the problem so that others can.

foobard
Comment 6 foobard 2009-05-22 18:43:32 UTC
oops.

To clarify, in OOo 3.0.1 the test case "975-976" counts as only *one* word.

apologies for the confusion.
Comment 7 seahunter 2009-05-22 19:12:16 UTC
This is interesting as when I type in 975-976 OOo 3.0.1 counts it as *two* words
for me.

This is a clean install of OOo 3.0.1 running on my machine as I had to replace
my hard drive a few weeks ago.

I had the Language Tool extension installed also, but I just uninstalled it,
restarted, and it makes no difference. The Word count for the test case 975-976
still comes up as *two* words for me in OOo 3.0.1.

When I get my project finished up, I'll see what I can come up with in terms of
other examples. Then I'll also completely uninstall OOo 3.0.1 and install 3.1
and see if that makes a difference. I'll also see if there's a difference on my
Ubuntu machine, which I don't use for the project editing at all.

If I had to guess what is causing the differences, I'd say it is some stuff, in
the Chicago Style, in my endnotes. That's where I'll start my investigation.
Comment 8 foobard 2009-05-23 03:34:34 UTC
In both OOo 3.1.0 in Windows and OOo 3.0.1 on Ubuntu, the test case "975-976" is
correctly counted as just one word. I cannot speculate why you are getting
different results.

There are known issues (see bugs 14410, 86537) with word count in OOo footnotes
and endnotes. If you can come up with a sample document or a reproducible bug
that is different from the two listed above, you may like to open a new bug that
directly specifies that problem.
Comment 9 seahunter 2009-05-23 13:15:48 UTC
Created attachment 62463
OOo Template that Overrides Word Counting Standards
Comment 10 seahunter 2009-05-23 13:18:53 UTC
Okay, the attachment is an OOo template that I made and use myself and have for
years. It just sets the page number, some other minor formatting items, the way
I like it.

I discovered something. I brought up OOo 3.1 in Ubuntu and typed 945-946 and
again the word count was *two* words. I reset the template to OOo's original
with nothing on the page, etc., typed 945-946 again and the word count read
*one* word.

I made this template several years ago and whenever I install a new OOo update,
I just import the template and get on with life. There must be something in the
template coding that interferes with how OOo counts words.

Comment 11 seahunter 2009-05-23 13:45:33 UTC
I can now confirm that OOo 3.1 is counting 945-946 as one word, even in my
template, so the template is not the problem.

This however goes back again to not knowing what standards are being used in
OOo. I realize now that OOo is counting the inserted page number as a word. But
without having a list of criteria to go by, I had no way of knowing this even
if it might seem logical to include it in retrospect. 

But while OOo is counting 945-946 as one word, it still doesn't explain the
difference in word count between MS Word and OOo that I'm still getting.

I would like to continue to investigate, but unless we can get a full
explanation of what is being counted as a word in OOo, it could just lead to
continued confusion as was the case with the 945-946 scenario. Not knowing the
standard used, I became confused -- again -- and didn't realize that I should
also know that OOo was counting the inserted page number as a word along with
945-946.
Comment 12 seahunter 2009-05-23 14:33:45 UTC
One last test. My OOo counts a page number as a word, but Word 2007 does not
count a page number as a word. Over a 300 page document, that's 300 words.
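
As a rough sketch of that arithmetic in Python: this assumes exactly one
page-number field per page, and `count_words` is a hypothetical helper written
for this example, not how either application is implemented.

```python
def count_words(pages, include_page_numbers):
    """Count whitespace-separated words across pages.

    When include_page_numbers is True, each page's page-number field
    is counted as one extra word.
    """
    total = 0
    for body in pages:
        total += len(body.split())
        if include_page_numbers:
            total += 1  # one page-number field per page (assumption)
    return total


# A made-up 300 page document with identical body text on every page.
pages = ["some body text here"] * 300
with_fields = count_words(pages, include_page_numbers=True)
without_fields = count_words(pages, include_page_numbers=False)
print(with_fields - without_fields)  # 300
```

Whatever the body text is, the two counts differ by exactly the number of
pages.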
Comment 13 foobard 2009-05-23 16:33:04 UTC
seahunter,

the goal (as far as I'm concerned) is for OpenOffice to count words exactly the
same as MS Word.

The way to make this happen is to specify each way in which OpenOffice word
count varies from MS Word, then open a bug that narrows in on that specific bug.

Pushing the initiative to the OpenOffice "cloud" by requesting documentation is a
sure way of making sure these fixes never happen.

I suggest creating a new bug re: the page number thing, for starters.
Comment 14 foobard 2009-05-23 18:08:22 UTC
see bug 102169.
Comment 15 eric.savary 2009-05-24 09:29:07 UTC
@UFI: Can you find a way to document this? If not please reassign to requirements.
Comment 16 Uwe Fischer 2009-05-25 12:28:37 UTC
You might know issue 17964 and issue 27302 and the spec doc
http://specs.openoffice.org/writer/wordcount/Enhanced_Wordcount.sxw
The definition of a "word" cannot be found anywhere in them. So nothing can be
documented at this time.
May I suggest creating a new FAQ page at
http://wiki.services.openoffice.org/wiki/Documentation/FAQ/Writer where all
users can post their findings?
Comment 17 eric.savary 2009-05-25 14:26:26 UTC
Setting SBA as CC.