Apache OpenOffice (AOO) Bugzilla – Issue 17171
Paragraph cannot be longer than 65534 characters
Last modified: 2017-05-20 11:01:04 UTC
Hi! This file is not fully converted; a lot of pages are missing. I hope you will fix it. Thanks in advance. Best Regards, DIEGO URRA
Created attachment 7888 [details] file converted incorrectly and pages missing
The file to be converted contains a long paragraph that extends for over 100 pages. The import filter appears to be truncating the paragraph precisely at the 65535th (i.e. max unsigned short) character. Inserting a paragraph break a little before the truncation point prevents the problem from occurring for another 64K characters. A workaround is to use smaller paragraphs -- a new paragraph about every 12 pages should nicely prevent this problem from occurring under normal circumstances. In 1.1 RC, the data is no longer truncated, but a new paragraph is started for you after every 65534 characters, even if it splits a word. I will report this as another issue.
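A cut-off at exactly 65535 characters is what one would expect if the text length is kept in an unsigned 16 bit integer. The following is only a minimal sketch of that effect, not the actual import filter code; `storeParagraph`, `narrowLength` and `kMaxUShort` are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical illustration only; not the actual OOo import filter.
// Largest value an unsigned 16 bit length field can hold: 65535.
constexpr std::size_t kMaxUShort = std::numeric_limits<std::uint16_t>::max();

// Keeps only as much text as a 16 bit length can describe and silently
// drops the rest, mimicking the truncation reported for OOo 1.0.3.
std::u16string storeParagraph(const std::u16string& text) {
    return text.substr(0, kMaxUShort);
}

// Converting a larger length to a 16 bit type wraps modulo 65536.
std::uint16_t narrowLength(std::size_t length) {
    return static_cast<std::uint16_t>(length);
}
```

With a 100000-character input, `storeParagraph` returns the first 65535 characters and the remainder is gone, which matches the missing pages described above.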
OOo 1.1 RC handling of this case has been reported as Issue 17329.
Please update the summary to reflect the actual issue before confirming an issue. Changed summary, added ms_interoperability, set target milestone. Original summary: "Bad conversion, pages missing and wrong behaviour with an easy file."
*** Issue 17329 has been marked as a duplicate of this issue. ***
From issue 17329: Importing an MS Word document with a paragraph longer than 65535 characters causes a paragraph break to be inserted after every 65534 characters. Discovered while trying unsuccessfully to reproduce the OOo 1.0.3 Issue 17171 on OOo 1.1 RC.
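The 1.1 RC behaviour can be pictured as cutting the imported text into fixed-size pieces. This is a sketch of the observed result only, assuming a hypothetical `splitIntoParagraphs` helper; it is not taken from the OOo sources.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical illustration only; not the actual OOo 1.1 import code.
constexpr std::size_t kMaxParagraphLength = 65534;

// Splits text into consecutive pieces of at most kMaxParagraphLength
// characters, without any regard for word boundaries.
std::vector<std::u16string> splitIntoParagraphs(const std::u16string& text) {
    std::vector<std::u16string> paragraphs;
    for (std::size_t pos = 0; pos < text.size(); pos += kMaxParagraphLength) {
        paragraphs.push_back(text.substr(pos, kMaxParagraphLength));
    }
    return paragraphs;
}
```

A 150000-character paragraph comes out as three paragraphs of 65534, 65534 and 18932 characters, and a word that straddles a cut point ends up split in two.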
Created attachment 8051 [details] simpler doc-file to reproduce the problem.
Reassigned to MIB
*** Issue 18417 has been marked as a duplicate of this issue. ***
*** Issue 19969 has been marked as a duplicate of this issue. ***
*** Issue 38603 has been marked as a duplicate of this issue. ***
*** Issue 39770 has been marked as a duplicate of this issue. ***
*** Issue 41049 has been marked as a duplicate of this issue. ***
*** Issue 42283 has been marked as a duplicate of this issue. ***
Created attachment 24725 [details] An example document where it is not possible to write anything more
*** Issue 50639 has been marked as a duplicate of this issue. ***
*** Issue 50764 has been marked as a duplicate of this issue. ***
*** Issue 53093 has been marked as a duplicate of this issue. ***
*** Issue 53473 has been marked as a duplicate of this issue. ***
*** Issue 5276 has been marked as a duplicate of this issue. ***
Is anyone working on this? It seems this bug has been known since 2003; is it likely to be fixed anytime soon?
No, it's not likely to be fixed soon. As you can see, it's targeted to OOo Later and a Prio 4, which is absolutely appropriate. Besides that, it's not easily fixable.
*** Issue 54720 has been marked as a duplicate of this issue. ***
*** Issue 55464 has been marked as a duplicate of this issue. ***
This is still a serious issue, as it causes loss of data which cannot be recovered (by 'undo' or other means, except entirely reloading the document, assuming you are fortunate enough to have noticed before you saved, overwriting the original). Please read Issue 55464 (http://www.openoffice.org/issues/show_bug.cgi?id=55464). At the very least, Writer should not allow any actions that would cause this to happen, and should present an error and explanation to the user. Simply 'losing' the text is unacceptable.
*** Issue 54890 has been marked as a duplicate of this issue. ***
This bug is 3 YEARS old!?! Some friends have problems with this bug, because some documents need to be written in only 1 paragraph and this is impossible with "OOo Writer". The problem is even greater when documents written in "MS Office Word" are converted to "OOo Writer", because when a paragraph exceeds 65534 characters, the text is eliminated! Please, don't forget this bug!
1st: If you think changing this is so easy, then provide a patch or pay someone to provide a patch. 2nd: Tell me what percentage of users need such long paragraphs. 3rd: When importing, the text is not eliminated, but split into several paragraphs.
1st: This kind of attitude is extremely unhelpful for anything. Obviously, the majority of OO's end users are not capable of writing patches - that's what the developers do. Nobody suggested it was easy, but writing an office suite isn't easy either. Looking at some of the features which *are* being given priority, it makes more sense to me that something as fundamental as being able to handle text properly should be among them. Paying someone to provide a patch would defeat one of the major advantages of OO - it's free. If I have to pay money for a word processor not to lose my work, I'd rather spend it on purchasing a commercial package of higher quality. Also, the longer this problem is ignored, the more difficult it will become to change later. 2nd: I think this is rather irrelevant. What percentage of users need Obscure Feature X? A word processor should handle text, as much text as a user wants and in the way the user wants to format it - everything else is second place. 3rd: However, any attempt to rejoin the split paragraphs results in immediate loss of data, without warning, and without any undo information being saved.
*** Issue 68340 has been marked as a duplicate of this issue. ***
Cloph, you have an e-mail @openoffice.org, so I assume that you work on OpenOffice.org. Unfortunately, OpenOffice.org has "employees" as ignorant as you! One more data point: ALL law offices need to write atas (I don't know how to translate this word into English, but "ata" comes from the Latin "ACTA" and means "a written record of what was done in a meeting"). Understand? This document ("acta") needs to be written in only one paragraph (it's a rule!), and in most cases this paragraph needs more than 65k characters, so it is impossible to use OpenOffice.org. So, I think this information answers your question 2. Best regards, Renato Yamane
*** Issue 69580 has been marked as a duplicate of this issue. ***
On OOo 2.0.4rc2 Linux, the statistics are wrong for the LongPara document. The statistics show 0 words and 0 characters. Also, it is easy to hang OpenOffice.org by using a combination of copy and paste, normal typing, and backspace to create a really long paragraph (starting from a blank document). If this should be a separate issue, let me know.
*** Issue 72234 has been marked as a duplicate of this issue. ***
*** Issue 74323 has been marked as a duplicate of this issue. ***
Dear developers, judging by the number of duplicates, the problem seems to be affecting a number of our customers. This issue is especially important in academic environments, where authors must follow the rules and simply can't insert paragraph breaks at the whim of their word processor. The problem is aggravated by the fact that Writer simply loses data (without Undo) if the user attempts to remove an extra paragraph break. (Needless to say, our competitors do not have this problem.) Upping the priority because our current behavior can cause data loss. Please consider targeting for 2.3. Thanks a lot for your attention.
*** Issue 76313 has been marked as a duplicate of this issue. ***
US and EU governments decided to use "Open Format" for all official documents, including deeds (legal documents, "acta") and patents? -> OO will have to cope with these documents, or US and EU governments will prescribe some other tools... Regards -Hans-
Another scenario where this can happen is doing certain global edits to improve formatting of a long document, which often require the temporary replacement of end of paragraph marks to create a single paragraph document, which is highly likely to exceed the limit. At present, I have to go back to Word in this situation. Plenty of concern about this issue, which as it involves data loss should arguably be P2 rather than P3. Target of OOo Later effectively is 'never'. Please consider changing target at worst to 3.0, preferably to 2.4.
*** Issue 82809 has been marked as a duplicate of this issue. ***
I have a question: is any OOo developer reading this issue? This bug is very old (4 years)! I follow another old bug (http://www.openoffice.org/issues/show_bug.cgi?id=24969), more than 3 YEARS old, and when an OOo developer found it, he fixed the problem in only 3 DAYS! Regards, Renato
Can someone change the Target Milestone? This bug is very important because it can LOSE data, and I think that the "OOo Later" target is very obscure (1 more month? 1 year? 100 years?). Summary: All law offices need to write actas, and these documents are written in only 1 paragraph (it is a rule), so when we write more than 65534 characters, we LOSE data! Regards, Renato
I understand that this issue is a killer for a few users. But as the defect is in some central code (the old tools String class), fixing it would require changing 90% of OOo's code base (wild guess). Even if we agreed on fixing the problem for Writer (and I'm still not doing this), all other developers would be affected as well. It would take several months in nearly all projects to do the change and fix all warnings and bugs that slip in while doing the change. In the meantime nothing else can be done on the code, as the risk of running into merge problems is huge. This would be an effective standstill of development for several months. I still don't see that this is justified by the undeniable benefit for a small percentage of the user base. And especially we can't do it now, in the middle of a lot of work that has already started. So the target is still valid, as this means that 3.0 is impossible.
I fully understand this problem cannot be solved soon. However, I could imagine some intermediate options like: a solution to prevent crashes; a solution to prevent data loss; a solution to warn users. Maybe these intermediate options are helpful or sufficient for 90% of the users. Regards -Hans-
Sure, I'm happy with doing that. Unfortunately this issue basically is not about fixing the potential data loss, so we have to work on that outside of it. If someone volunteered to dig out the crashes and data losses and moved them to one or more separate issues, we could try to fix them.
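The "prevent data loss" and "warn users" options could, in principle, be a length check performed before two paragraphs are joined. The following is only a sketch under that assumption; `canJoin`, `joinParagraphs` and `kMaxParagraphLength` are hypothetical names, not Writer's actual API.

```cpp
#include <cstddef>
#include <string>

// Hypothetical illustration only; names are not taken from the OOo code base.
constexpr std::size_t kMaxParagraphLength = 65534;

// True if the two paragraphs can be merged without exceeding the limit.
bool canJoin(const std::u16string& first, const std::u16string& second) {
    return first.size() <= kMaxParagraphLength &&
           second.size() <= kMaxParagraphLength - first.size();
}

// Appends second to first only when nothing would be lost. Otherwise both
// paragraphs are left untouched and false is returned, so that the caller
// can show a warning instead of silently dropping text.
bool joinParagraphs(std::u16string& first, std::u16string& second) {
    if (!canJoin(first, second)) {
        return false;
    }
    first += second;
    second.clear();
    return true;
}
```

The point of the design is that the refusal happens before any text is modified, so there is nothing to undo when the join is rejected.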
Was waiting to see if someone more experienced would volunteer... I'm happy to have a go at creating issues for the interim options over the next two weeks. Comments on them would be appreciated in due course.
Excellent! So in case we have reproducible scenarios where either OOo crashes or document content gets lost unnoticed, we should be able to prevent the disaster, and I will make sure that this will be fixed as soon as possible. I'm sorry to disappoint users wanting to work with huge paragraphs, but I can't help.
It is very easy to prove that content gets lost unnoticed. In the attached document (LongParagraphIllness.odt), go to page 15 (or the last page) and remove the paragraph ending, joining it with the following paragraph. The second paragraph (or part of it) is lost with no further notice.
Created attachment 49478 [details] very long paragraph sample
I've filed Issue 83427 (http://www.openoffice.org/issues/show_bug.cgi?id=83427), which addresses the data loss issue.
I just want to suggest that when this is fixed, the limit be changed from 2^16 to 2^64 (not 2^32). Planning ahead is always good...
Why only 64Bits and not 128? ;-) OK, I hope you see what I mean: larger is not necessarily always better, you must stop somewhere. And where to stop can be judged only by reasoning. As our API to work on paragraphs makes it necessary that the whole paragraph text fits into one String variable, it follows that the maximum length of a String must match the maximum length of a paragraph. Even if I consider changing the length of our UNO API String from 32Bit to 64Bit I don't see a clear benefit of doing so (and the effort and pain of doing this change would be huge!). I have my doubts that using 64Bit integers for string length and indices is a good idea as it will influence performance. Handling 64Bit variables on 32Bit computers will result in a considerable slowdown. And what for? 32Bit will give us paragraphs with 4.2 billion characters. Saving this uncompressed will result in a file size of more than 8GB (2 Bytes per character) - only for one paragraph! I don't want to sacrifice performance for the ability to have paragraphs larger than 4 billion characters, which very probably would never be needed.
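The figures in this comment can be checked with a few lines of arithmetic. This is a small sketch assuming an unsigned length and 2 bytes per character as stated above; `lengthValues` and `utf16Bytes` are names made up for the example.

```cpp
#include <cstdint>

// Number of distinct values of an unsigned length with the given bit
// width (only valid for bits < 64).
constexpr std::uint64_t lengthValues(unsigned bits) {
    return std::uint64_t{1} << bits;
}

// Bytes needed to store that many characters at 2 bytes per character.
constexpr std::uint64_t utf16Bytes(std::uint64_t characters) {
    return characters * 2;
}

// 16 bit: 65,536 values, so the largest length is 65,535.
static_assert(lengthValues(16) == 65536, "16 bit length");
// 32 bit: 4,294,967,296 values, i.e. roughly 4.29 billion characters.
static_assert(lengthValues(32) == 4294967296ULL, "32 bit length");
// Stored uncompressed at 2 bytes each: 8,589,934,592 bytes, about 8.6 GB.
static_assert(utf16Bytes(lengthValues(32)) == 8589934592ULL, "size in bytes");
```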
An upper limit for paragraph lengths must always be considered as a second best choice. Issue 83427 reports Abiword does not have this 64k-limit-problem in any form when working with ODT files. Is it possible for OOo-tools to adopt the same technology?
I don't think 2^64 is necessary. But I think it will be reasonable by the time this bug is fixed, since already almost all new processors are 64-bit.
jwr: the 16Bit limit IMHO is just a bug; in OOo's API, Strings have 32Bit length, just Writer's internal implementation (that in its beginning dates back to 1992 or so) has this limitation. Removing it is desirable but quite some work to do. This is different for the 64Bit case. Even if we assumed that in a few years most computers will be 64Bit machines (I doubt that - at least for everything outside of the "western" world), the other drawbacks like loss of UNO API compatibility still remain.
Just to get an idea what 32-bit or 64-bit size paragraphs can give you: "32-bit" size strings/paragraphs amount to about 4.3 billion characters (2^32). The Encyclopaedia Britannica has 32 volumes and 44 million words, amounting to about 1.4 million words per volume. If we assume that each word is about 10 letters (actually it is less), then a volume of Britannica (about 14 million letters) fits well into a 32-bit size string, with plenty of characters to spare. Therefore, a 32-bit size string can hold a volume of Encyclopaedia Britannica. Considering that the OOo API strings are 32-bit, it would make sense to go for 32-bit.
That sounds reasonable, but what if a clever person opens all volumes of the EB into one document and deletes all paragraph markers? With computers getting more and more powerful, that's just the kind of thing you can expect to happen sooner or later.
floris_v, that's exactly my point. With 32-bit, it's possible to give examples of content that wouldn't fit. With 64-bit, it becomes absurd. Not only can 64-bit fit the entire EB, it can fit over a billion Wikipedias (all languages combined).
Guys, we have to put this in perspective. My point was that 32-bit is just enough for a new limit for the size of a single paragraph. I feel that 32-bit size for a single paragraph is fine. If we try to push for 64-bit, my very good guess is that this issue will get stalled and not resolved in the near future. Since the OOo API for strings uses 32-bit numbers for the size, it appears easier to get 32-bit paragraph sizes. In typesetting terms, it looks like a fringe case to demand from a desktop text editor to handle a single paragraph that consists of more than a volume of the Encyclopaedia Britannica. It looks like a better strategy to focus on getting 32-bit paragraph sizes, because the next option (still 16-bit size but just complain when going over the limit) may not be ideal for the examples shown above. If the string size is 32-bit and you use strings to represent paragraphs, then in order to go to 64-bit paragraph size you need to make too many changes in the program logic, check for regressions, etc, which does not appear to happen. It would make sense to go for 64-bit sizes when the basic data types of OOo are of 64-bit. This might happen in the next years, which would be a more appropriate time to ask for 64-bit paragraphs as well.
Frankly, this whole issue confuses me. Why do you want a fixed maximum size for a paragraph in the first place? Why do you want to store a single paragraph in a string? IMHO that whole concept is flawed. I have in the past removed all paragraph marks in Word documents, so that I got a document with one paragraph, of over 64 Kb, and no problem at all. I'm not sure anymore why I needed to do that, only that I did, and that it was very useful. Apparently the programmers of Word didn't need a max length for paragraphs, and if they didn't, then why do you?
Folks, can we move that discussion elsewhere? The poor developers that later on will work on that issue will have problems finding relevant information in this "chat". It's clear that 64KB for a paragraph is not enough, and the OOo UNO API already uses 32Bit wherever text is retrieved from a text object (e.g. paragraphs, portions, selections etc.). This is not related to how a paragraph stores its text internally. How the text is stored in the implementation is not the problem of this issue. We have an *internal C++ API* that is limited to 16Bit and *this* must be fixed. And the problem is that this API is so widespread in OOo. The *external API* (UNO API) uses 32Bit and this will not be changed without a valid reason. The ability to store the whole Encyclopedia Britannica, the Wikipedia and the whole Google cache in a single paragraph is *not* a valid reason. So please try to add only comments that describe new, currently unknown situations where the 64KB limit bites users (situations that are not described in this issue or its duplicates). Thank you.
Here is another data-loss issue - 59185.
Thanks for the hint; as issue 59185 looks a bit more complicated to fix I keep the "3.x" target for now but we will reinvestigate the effort to see if we can switch the target to 3.0.
*** Issue 84902 has been marked as a duplicate of this issue. ***
*** Issue 85007 has been marked as a duplicate of this issue. ***
*** Issue 96176 has been marked as a duplicate of this issue. ***
I think that the correct "Priority level" should be P2 and not P3. P2 is for an issue with "data loss". And I think that a P2 priority needs a target ASAP, not "OOo Later". "OOo Later" means "When someone takes a look here" or "Never". Best regards, Renato
That depends. If the requirement is about extending the size of paragraphs, it's at best a P3. If the requirement is to avoid data loss in case a paragraph is changed to get over this boundary - well, yes, that would be a P2. If everybody is fine with separating this, we could create a second issue and make that a P2.
I filed a separate bug for the data loss issue (http://www.openoffice.org/issues/show_bug.cgi?id=83427) and linked it here over a year ago. Not sure what more you're asking, mba.
I just mentioned that if there was something that deserves a "P2" it would be the data loss part of the problem, not the inability to work with larger paragraphs. I just had forgotten that we already have been there and that the data loss has been fixed meanwhile. So as long as nobody proves the opposite: there is no data loss anymore and so P3 is enough. In case someone finds a situation where data loss caused by our paragraph limit still occurs: please do it like "superm401" and create a separate issue that can be fixed with an earlier target. Fixing the data loss should always be possible without extending the paragraph limit.
I don't agree with splitting this bug in TWO. IMHO: 1) Change priority of this bug to P2; 2) Fix the data loss problem if developers can't fix the "longer paragraph" problem; 3) Change priority of this bug to P3; 4) Fix the problem filed in 2003 (longer paragraph). This is not 2 bugs. This is only one bug: - If you write a longer paragraph, data can be lost. Why the hell do we need to open a new issue for: - Data can be lost if you write a longer paragraph? They are the *SAME*!! What is the root cause? Longer paragraph! If developers can't fix the "longer paragraph" problem, then do a workaround to avoid data loss. Best regards, Renato
Our QA won't accept a partial fix for this issue. So again: if somebody knows a case where the paragraph problem will create data loss, please create a separate issue so that we can fix it. Or stay with only one issue and wait, probably for a longer time, until we don't have more important problems than the inability to work with paragraphs that are longer than most of the documents our users create on average.
Still present in OOO310m11 (9399). Come on, it's a 6-year-old bug. Even Word 2 had no such restriction, and it was a 16-bit application from 1992!
Hi! I started this bug back on Monday, Jul 21 2003. I still have hope someone will take REAL care of this. This is a real-world bug and it has a serious impact on users and on word processing in general. I really want to stress this. It's not a theoretical thing. Best Regards DIEGO URRA "All good things come to those who wait.", Ronin
Diego, IMHO the only way to fix this bug is to buy a Microsoft Office suite. It is the same in other areas where it is necessary to use proprietary software, like Autodesk Inventor, Solidworks, etc. Law offices still need Microsoft Office software[1]. OOo can't be used there. This is a critical bug for law offices, and a 6-year-old bug means "Live with this bug". [1] See comments: * Fri Aug 11 23:50:57 +0000 2006 * Fri Apr 20 08:40:04 +0000 2007
*** Issue 107382 has been marked as a duplicate of this issue. ***
I presume lawyers want to create documents that look like a single paragraph when printed and are not bothered about OOo's internal representation of the paragraph. For performance reasons it is better to keep paragraphs small, certainly much smaller than 60,000 characters. What about the following solution: Provide an additional paragraph style feature "merge with next paragraph". If a paragraph has this style, OOo is allowed to move the paragraph mark when repaginating / refreshing the document so that it always seems to merge into the next paragraph. When documents with large paragraphs are opened, they are automatically split into smaller paragraphs with this paragraph style feature. When saving in Microsoft Word format, the paragraphs can be merged back into a huge paragraph. Alternatively, without a new format: 1. On detecting a very long paragraph approaching the limit, Writer will automatically insert a paragraph break and warn the user that it has done so. 2. To provide an extension or plugin for 'virtual paragraphs'. This program scans through the text in the current section (or selection) and moves all paragraph marks to the end of the nearest line, to give the illusion (when printed) that the document consists of one paragraph. It will check that there is zero extra linefeed before or after the moved paragraph marks. The program will not alter double paragraph marks, or marks between paragraphs with different text styles.
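For the first alternative (inserting a break automatically when a paragraph approaches the limit), the break position could be chosen at a word boundary rather than mid-word. This is a sketch of that idea only; `findBreakPosition` is a hypothetical helper and the default limit of 65534 is taken from the reports in this issue.

```cpp
#include <cstddef>
#include <string>

// Hypothetical illustration only; not part of Writer.
constexpr std::size_t kMaxParagraphLength = 65534;

// Returns the index at which to break text so that the first piece has at
// most limit characters (limit must be greater than 0). The break is placed
// just after the last space that still fits; if there is no such space, a
// hard break at limit is used. Returns text.size() if no break is needed.
std::size_t findBreakPosition(const std::u16string& text,
                              std::size_t limit = kMaxParagraphLength) {
    if (text.size() <= limit) {
        return text.size();
    }
    const std::size_t space = text.rfind(u' ', limit - 1);
    if (space == std::u16string::npos) {
        return limit;
    }
    return space + 1;
}
```

Calling it repeatedly on the remaining text would yield paragraphs that never split a word, unless a single word is itself longer than the limit.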
@anoopshah Those are very imaginative approaches, but do they resolve the problem? The difficulty is that the suggestions, particularly the first, add unnecessary complexity and it's not obvious that the extra coding required would be less than doing the job properly. Writer is perfectly capable of handling a document with more than 65534 characters. Why can't a paragraph be paged in and out in an analogous way to a document? That might improve rather than detract from performance. The bottom line is that there are users who need very large paragraphs, whether because of the nature of the document or as an interim step in editing large documents. A 100 page book isn't very long in this context. Hard limits were common in early application software. Later, it was generally accepted that limits should be imposed only by what the hardware can handle. I understand that this may be a very difficult issue to fix within Writer, but it seems inevitable that the limit must be removed one day. A professional product would have this capability.
Hello everyone. (First, sorry for my very bad English.) Every week I need to write at least one "ata" - as renatoyamane said, this is the Portuguese word from the Latin "acta", which is the name of a document where people record the events of an official governmental meeting. Some of the developers and some of the people who follow this thread may think: "Why would someone need a paragraph with more than 65535 characters?"... My answer is: this is a necessity!!! And, contrary to what most of you may imagine, there are many people who have this need!!! "Actas" (I didn't find the English word for it either, renatoyamane) need to be written in ONE PARAGRAPH, and as someone already said before, this is a RULE. And there are many REASONS for this rule, and one of them is that, as an official and very important document (usually written by governmental or law docs), it is necessary to avoid the possibility of later changes to the formatted text (possibly made with obscure and/or illegal/fraudulent intentions). I am not saying that we need 64 bit long characters... But 32 bit would be perfect!!! Why not use 32 bits????? And for those who think that this reason isn't worth the developers' effort, my answer is: other word processors have been able to handle big paragraphs since the dinosaur era, but unfortunately I need to go to M$ Word to handle my big "actas", and I am VERY SAD about having to do this!!! I AM STILL A FAN of OpenOffice, but you developers are forcing me to go elsewhere, and this makes me MAD!! Come on, guys!!! A 7-year-old issue that is centered on a simple variable (a question of 16 bit to 32 bit migration in a specific feature) doesn't deserve your attention?? This is absurd, unacceptable!! (I know that there is an issue of "internal limitation", but why does this limitation apply only to OO, and to no other word processor since the "stone era"?) Again, sorry for my bad English. I hope everybody understands my disappointment. And I hope we can write "actas" in Open Office, the sooner the better.
Your post is a perfect example why users and developers often talk past each other. If this would be only a simple variable change we would have done that years ago. Unfortunately this is a huge effort as there is a lot of code that assumes that a paragraph or a text range length fits into 16 bit integer variables. You have to find and change all of them - that's quite a lot of work to do. So please refrain from such statements without actually having studied the code before. Of course huge effort alone never is a reason not to do something. But still there are other things we consider to be more important we can do in the same time frame. This can't be changed by telling us that this is "only a central variable to change" though we know that this just isn't true. Of course everybody is tempted to think that his/her requirements are the most important ones. But even if you know hundreds or thousands of other users with the same requirement, there may be hundreds or thousands of other users with other requirements that compete with yours for the available development resources.
piduca: We all want to have this issue fixed. It is very important to be civil and not appear condescending. I believe this was due to linguistic/cultural differences. mba: This issue looks like one that should be examined by the release team in order to figure out when to tackle (for which OOo release). Can you please give a pointer to the mailing list where we can ask to have this issue considered for a future release? In addition, is it possible to give some high level instructions for an ambitious developer who would try to give it a go and modify the code themselves? This developer should be able to compile OOo; is there a rough guide on which files need changing?
Come on folks, piduca took the trouble to download and install this, probably even spent time to register, only to find that it's no use for him. I can imagine he's frustrated. I can also imagine the pain of having to hunt down all references of paragraphs as 16 bit entities, as I recently migrated from Delphi to Lazarus, only to find that my old 16 bit integers were suddenly interpreted as 32 bit longint. That gave rise to an incredibly long list of compiler errors and warnings, and some real bugs with 16 bit data stored on disk interpreted as 32 bits, so I had some overflow there. Meaning to say: good luck with it! Maybe somebody should communicate this problem to the marketing department. It'd be nice if people would be warned about this before they download the package.
The number of files to change is huge. To give you (or the ambitious developer reading here) an impression: we use an object of class "String" to store the text of a paragraph. This class only supports 16 Bit length and indices. [We have another string class that supports 32 Bit, but it is a read-only string class (part of the UNO C++ runtime library).] We could change the String class to support 32 Bit length and indices. But this class is one of the most used classes in OOo, so we would have to change code in nearly every library of OOo that does not use the mentioned 32 bit string class exclusively. An alternative: use a separate String class just for paragraphs in Writer and see how far it goes. But what should happen if someone selects such a paragraph and pastes it into Calc? And what about the code that Writer and Calc or Draw share and that also uses Strings (like e.g. the HTML filter)? Sooner or later we will end up with extending string lengths and indices to 32 Bit all over the place. So yes, we have to change the String class, there is no alternative. Thus we have to investigate all code that touches, passes or reads a string and look for usage of string length and indexing. Can you imagine how much code that is in an office suite? Technically, it would be necessary to change the String class, recompile the code and look for integer cast warnings (as now 32 Bit integers are handed over to calls or assignments that expect 16 Bit integers) and fix them by moving even more code to support 32 Bit integer parameters. It's like throwing stones into the water and following the waves they create. And here we can only *hope* that nobody has used integer casts or even C style casts to convert arbitrary integer variables to unsigned 16 Bit integers, the type that is used for String length and indices, as then no warning would appear that can point us to a possible problem. So most probably we would also have to scan the code for integer casts and C style casts to USHORT etc. and investigate this code. Can you imagine how much work this would be in several million lines of code? At least several months - if we don't do anything else. This is comparable to the switch from 8 Bit characters to 16 Bit characters that we made before we open sourced the code - so we have some data for comparison. And we had much less code 10 years ago... Is this issue worth stopping all other OOo development for several months? Don't you think that most users will think that we are nuts? But that's only engineering. As we can't be sure that we have found and converted all places, we would have to do intensive testing with many documents containing huge paragraphs everywhere and apply all possible functions to them. IMHO we can invest our time in other areas with more benefit for the project. It's not impossible, but very, very much work. That's the reason why this issue has got the target "later". It's not the last word about it, but it's the status quo.
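The worry about casts can be made concrete with a small sketch. It is not OOo code: `USHORT` here is a local stand-in typedef, and `lengthImplicit` and `lengthCast` are hypothetical functions. Both truncate in the same way, but only the first form gives the compiler a chance to warn (for example GCC or Clang with -Wconversion).

```cpp
#include <cstdint>
#include <string>

// Stand-in for the unsigned 16 bit type mentioned above (assumption).
typedef std::uint16_t USHORT;

// Implicit narrowing: a compiler can flag this assignment when conversion
// warnings are enabled, which is how such places could be found.
USHORT lengthImplicit(const std::u16string& text) {
    USHORT len = text.size();
    return len;
}

// C style cast: the value is truncated in exactly the same way, but the
// cast tells the compiler the narrowing is intended, so no warning appears.
USHORT lengthCast(const std::u16string& text) {
    return (USHORT)text.size();
}
```

For a 70000-character string both functions return 4464 (70000 modulo 65536), which is why the silent variant would have to be hunted down by searching the code rather than by reading compiler output.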
The tone of the discussion here has changed a little from 'this isn't something that is needed' to 'this is very difficult'. Thanks for that. Here are some observations: - OOO later can generally be interpreted as 'never' - on the other hand, as the previous change from 8 to 16 bits shows, this is something that will be needed eventually for a professional product - it seems intuitive that in high quality code, size should be set by a parameter, as even 32 bits is unlikely to be the final buffer size To what extent would it be practical to start preparing for the change, so that it was much less significant when eventually attempted? For example, would it be practical to slowly convert the code so that the current 16 bit implementation is based on a parameter? If it was, the amended class and methods could be introduced slowly by all developers as they enhance code for other reasons. It would also allow anyone who was particularly interested in making this change to start preparation for it within the standard code and it may well be that some automation of the changes might be possible. It would also allow experimental 32 bit buffer builds and testing. After some time, regrettably probably several years, the remaining conversion would be much less of a problem, though as mba clearly explained, it would not be insignificant. If nothing is done, the problem can only become harder and harder.
A simple parameter won't work here; what would help a lot, however, to prevent this kind of trouble in the future is a strict ban on explicit size typecasts. Maybe you should introduce a special StringSize type for any typecasts - it's a pity that you can't ban typecasts entirely, they're a big pain. Then when the size needs to be doubled again, all you have to do is redefine StringSize and recompile. And then pray that the same thing doesn't pop up in extensions, which are entirely beyond the control of the developers.
From my POV this was never a "nobody needs that". Please don't mix the step from 8 bit characters to 16 bit characters with 16 bit string length vs. 32 bit string length. These are completely different topics and the necessity to do the first conversion is not related to the other one. Of course the String class uses a special type for length and indices. But there is other code that was written in the last 15 years that works with strings and in many places this explicit type is not used. I already thought about a possible way to reduce the effort over time. It's interesting that I had the same idea as you: if everyone who comes across code dealing with Strings has a short look and replaces all usage of unsigned 16 bit variables by the special type "xub_StrLen", we could reduce the effort over time. Sounds like something I could try to advertise amongst the developers.
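The gradual clean-up described here amounts to writing code against the length type's name instead of a raw 16 bit integer. A minimal sketch, assuming `xub_StrLen` is currently an alias for an unsigned 16 bit integer; `countChar` is a hypothetical helper used only to show the pattern.

```cpp
#include <cstdint>
#include <string>

// Assumption for this sketch: the special type is a 16 bit alias today.
// Widening it later would mean changing this one line, for example to
// std::uint32_t, and recompiling.
typedef std::uint16_t xub_StrLen;

// Hypothetical helper written against the alias rather than against a raw
// unsigned 16 bit type, so it follows the alias when the alias is widened.
xub_StrLen countChar(const std::u16string& text, char16_t ch) {
    xub_StrLen count = 0;
    for (char16_t c : text) {
        if (c == ch) {
            ++count;
        }
    }
    return count;
}
```

Code that spells out the 16 bit type directly would keep truncating after the alias is widened, which is why replacing those spellings ahead of time reduces the later effort.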
Extensions already use 32 bit string length as our UNO API uses the mentioned other string class. So we will be safe with them until anybody wants to transport more than 4 GB of data with a single string variable.
> replaces all usage of unsigned 16 bit variables by the special type "xub_StrLen" Yes, this is the important first step. C++ could help with finding all these places if xub_StrLen was a class that no longer provided an implicit conversion to an integral type once a module has been fixed (similar to the gradual changes for warning code, where unchanged modules were marked with EXTERNAL_WARNING_NOT_ERRORS). The concept of strlen is so ambiguous that its use cases would benefit from some clarification. Using the helper class suggested above, the different use cases could be explicitly differentiated by providing methods such as getStrBufferSize(), getUTF16Count(), getUTF32Count(). Finally, the concept of strlen should be replaced by an iterator-based approach...
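A length class without implicit conversions could look roughly like the sketch below. It is an illustration of the idea only: `StrLen` and its members are hypothetical, and the real xub_StrLen is a plain integer type as far as this discussion indicates.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical sketch of a length type that cannot silently become a
// 16 bit integer.
class StrLen {
public:
    explicit StrLen(std::uint32_t value) : value_(value) {}

    // No conversion operator: callers must ask for the raw value by name,
    // which makes every such place easy to find with a text search.
    std::uint32_t get() const { return value_; }

    StrLen operator+(StrLen other) const { return StrLen(value_ + other.value_); }
    bool operator<(StrLen other) const { return value_ < other.value_; }
    bool operator==(StrLen other) const { return value_ == other.value_; }

private:
    std::uint32_t value_;
};

// An assignment such as `std::uint16_t n = StrLen(70000);` is rejected by
// the compiler, because no implicit conversion exists.
static_assert(!std::is_convertible<StrLen, std::uint16_t>::value,
              "StrLen must not convert implicitly to a 16 bit integer");
```

With such a type, a module that still assigns lengths to 16 bit variables fails to compile instead of truncating at run time, so the remaining work shows up as a list of compiler errors.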
*** Issue 111982 has been marked as a duplicate of this issue. ***
*** Issue 112997 has been marked as a duplicate of this issue. ***
*** Issue 113000 has been marked as a duplicate of this issue. ***
Hi! The bug is soon going to be a TEN year old bug... I hope I would have enough time to fix it myself. Cheers DIEGO URRA BAUMGARTNER
Um, Happy Very Late Birthday, Bug 17171...
https://wiki.documentfoundation.org/ReleaseNotes/4.3#Raise_limit_of_characters_in_paragraphs Just FYI.
re emir sari: I found out about that today, kudos to LO for that one.
Reset assignee to the default "issues@openoffice.apache.org".