Apache OpenOffice (AOO) Bugzilla – Issue 102135
Document the rules used to count words in a document
Last modified: 2013-08-07 14:38:26 UTC
I know this has been discussed before by user foobard, but on discovering this problem this morning for the first time, as a professional writer, I'm very upset that OpenOffice.org 3.0.1 counts words differently than MS Word, and the radical difference doesn't seem to have been fixed as of OOo 3.0.1. Perhaps this problem has been fixed with the current release, but I haven't installed the current release yet because, near the end of a project, I'm wary of doing any tinkering with my computer. I wrote a rather long, several-hundred-page project, with a set word limit. Luckily, OOo word count is lower than Word, or I'd be in a real panic this morning. But as it is, OOo's word count is about 10,000 words less than Word's word count. Until this morning, I never even thought that OOo would count words differently than Word. As a non-programmer, I just figured a word was a word was a word. As a professional writer, having an accurate word count is important. If OOo counts words differently than MS Word, that's fine with me. But I'd like to know the exact criteria that OOo uses to count words, so that I can at least tell my editor the criteria that have been used to count words. I cannot find any documentation anywhere on the OOo website that describes the criteria used to count words in Writer. For me and other professional writers, having an accurate word count that matches as closely as possible the industry standard (which unfortunately is MS Word) is very important and is a P1 issue. At the very least, we need to know what standard is used specifically, in non-technical terms, so that we can explain it to an editor. Otherwise, this issue is an app killer for us and I won't be able to use OOo again even though I love it so much more than Word. I would attach files as examples, but I am not allowed. As an example, I have discovered that Word 2007 counts things like 975-976 as one word, while OOo 3.0.1 counts it as two words.
Furthermore, I am using endnotes, page numbering, and the Chicago Style for editing and formatting stuff.
As you said, the difference in word counts between Word and OOo exists and has already been reported. Thus, no need to report it once more. But your report brings up the idea of documenting how words are counted. There are multiple criteria for deciding what counts as a word, especially with all the particularities languages may show (with ', -, /... 1 or 2 words...?). So, yes, there's a real need to document OOo's counting rules. At least as long as we have differences with MS Word.
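To make the "1 or 2 words" question concrete, here is a minimal Python sketch of two made-up counting rules. Neither is OOo's nor MS Word's actual algorithm (which is exactly what this issue asks to have documented); the function names and the separator set are assumptions chosen only for illustration.

```python
import re


def count_whitespace_tokens(text: str) -> int:
    """Hypothetical rule A: every run of non-whitespace characters is one word."""
    return len(text.split())


def count_split_on_punctuation(text: str, separators: str = "-/") -> int:
    """Hypothetical rule B: the given separator characters also end a word."""
    # Build a character class such as [\-/\s]+ from the separators plus whitespace.
    pattern = "[" + re.escape(separators) + r"\s]+"
    return len([token for token in re.split(pattern, text) if token])


if __name__ == "__main__":
    for sample in ["975-976", "don't", "and/or", "well-known fact"]:
        rule_a = count_whitespace_tokens(sample)
        rule_b = count_split_on_punctuation(sample)
        print(f"{sample!r}: rule A = {rule_a}, rule B = {rule_b}")
```

Under rule A "975-976" is one word, and under rule B it is two. Over a long document full of page ranges, dates, and hyphenated compounds, a difference like this between two applications could plausibly add up.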
@es. Thanks for the quick come back. As a professional freelance writer, I need easily accessible documentation for the counting rules used. Moreover, I think it would be a good idea to clearly note somewhere as a disclaimer that there is a difference between OOo Writer and MS Word in counting. For professional freelance writers, it's a big deal.
seahunter, Are you sure you're using 3.0.1? There was a major improvement to word count in OOo (mostly because of my nagging) at the 3.0 marker. I have 50,000+ word documents that count *identically* in MS Word and in OpenOffice. Make sure you have the most recent version installed, then report back. foobard
@foobard, Yup, I am 100% sure that I am using 3.0.1. Clicking "Help" and "About" gives me: OpenOffice.org 3.0.1 OOO300m15 (Build: 9379). I will correct one error in my original report below and I sincerely apologize for the error and hope everyone will understand it is because I was thrown into panic. I'm on a tight deadline and this project has consumed several years of my life. I wrote that the difference was 10,000 words. In my utter panic this morning, I hit one too many zeros, so I'll give a more accurate count of the difference: MS Word: 93,547; OO Writer: 95,097; Difference: 1,550. Yes, it is much different from the 10,000 word difference I noted below, and I apologize for that error. But when one has a contractual word limit, it can be a problem. It can mean the difference between meeting a contractual obligation or not. Again, sorry for, ironically, the typo and I hope it won't diminish the initiative to continue looking into this problem with OOo's word counting. I realize how important filing an accurate bug report is.
seahunter, when you have a moment (perhaps after you've met your deadline), could you give us specific examples of words that count differently? For example, in your original bug report, you give the example of "975-976", which you say counts as two words in OpenOffice 3.0.1. It used to count as two words in pre-3.0 versions of OOo, but in my copy of OOo 3.0.1, it counts as two words. (I checked it just now.) Giving us specific examples like this that we can reproduce will narrow down the problem. I can't fix them for you (I'm not a developer), but I can at least reproduce and confirm the problem so that others can. foobard
oops. To clarify, in OOo 3.0.1 the test case "975-976" counts as only *one* word. apologies for the confusion.
This is interesting as when I type in 975-976 OOo 3.0.1 counts it as *two* words for me. This is a clean install of OOo 3.0.1 running on my machine as I had to replace my hard drive a few weeks ago. I had the Language Tool extension installed also, but I just uninstalled it, restarted, and it makes no difference. The Word count for the test case 975-976 still comes up as *two* words for me in OOo 3.0.1. When I get my project finished up, I'll see what I can come up with in terms of other examples. Then I'll also completely uninstall OOo 3.0.1 and install 3.1 and see if that makes a difference. I'll also see if there's a difference on my Ubuntu machine, which I don't use for the project editing at all. If I had to guess what is causing the differences, I'd say it is some stuff, in the Chicago Style, in my endnotes. That's where I'll start my investigation.
In both OOo 3.1.0 in Windows and OOo 3.0.1 on Ubuntu, the test case "975-976" is correctly counted as just one word. I cannot speculate why you are getting different results. There are known issues (see bugs 14410, 86537) with word count in OOo footnotes and endnotes. If you can come up with a sample document or a reproducible bug that is different from the two listed above, you may like to open a new bug that directly specifies that problem.
Created attachment 62463: OOo Template that Overrides Word Counting Standards
Okay the attachment is a OOo template that I made and use myself and have for years. It just sets the page number, some other minor formatting items, the way I like it. I discovered something. I brought up OOo 3.1 in Ubuntu and typed 945-946 and again the word count was *two* words. I reset the template to OOo's original with nothing on the page, etc., typed 945-946 again and the word count read *one* word. I made this template several years ago and whenever I install a new OOo update, I just import the template and get on with life. There must be something in the template coding that interferes with how OOo counts words.
I can now confirm that OOo 3.1 is counting 945-946 as one word, even in my template, so the template is not the problem. This however goes back again to not knowing what standards are being used in OOo. I realize now that OOo is counting the inserted page number as a word. But without having a list of criteria to go by, I had no way of knowing this even if it might seem logical to include it in retrospect. But while OOo is counting 945-946 as one word, it still doesn't explain the difference in word count between MS Word and OOo that I'm still getting. I would like to continue to investigate, but unless we can get a full explanation of what is being counted as a word in OOo, it could just lead to continued confusion as was the case with the 945-946 scenario. Not knowing the standard used, I became confused -- again -- and didn't realize that I should also know that OOo was counting the inserted page number as a word along with 945-946.
One last test. My OOo counts a page number as a word, but Word 2007 does not count a page number as a word. Over a 300 page document, that's 300 words.
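As a rough back-of-the-envelope check, the Python sketch below estimates how much of the reported difference page numbers could account for. It assumes exactly one counted page-number field per page and takes the 300-page figure and the earlier counts (MS Word 93,547, OO Writer 95,097) at face value; the helper name is hypothetical, and the earlier counts came from a different OOo version than this test, so this is only an estimate.

```python
def estimated_page_number_words(pages: int, fields_per_page: int = 1) -> int:
    """Hypothetical helper: words contributed if each page-number field counts as one word."""
    return pages * fields_per_page


# Figures reported earlier in this thread.
ms_word_count = 93_547
oo_writer_count = 95_097

observed_difference = oo_writer_count - ms_word_count
page_number_words = estimated_page_number_words(pages=300)
unexplained = observed_difference - page_number_words

print(f"Observed difference:      {observed_difference}")
print(f"Page numbers (estimated): {page_number_words}")
print(f"Still unexplained:        {unexplained}")
```

Under these assumptions, page numbers would explain only part of the gap, which fits the suspicion that footnotes and endnotes are also counted differently.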
seahunter, the goal (as far as I'm concerned) is for OpenOffice to count words exactly the same as MS Word. The way to make this happen is to specify each way in which OpenOffice word count varies from MS Word, then open a bug that narrows in on that specific difference. Pushing the initiative to the OpenOffice "cloud" by requesting documentation is a sure way of ensuring these fixes never happen. I suggest creating a new bug re: the page number thing, for starters.
see bug 102169.
@UFI: Can you find a way to document this? If not please reassign to requirements.
You might know issue 17964 and issue 27302 and the spec doc http://specs.openoffice.org/writer/wordcount/Enhanced_Wordcount.sxw The definition of a "word" cannot be found anywhere in them. So nothing can be documented at this time. May I suggest creating a new FAQ page at http://wiki.services.openoffice.org/wiki/Documentation/FAQ/Writer where all users can post their findings?
Setting SBA as CC.