Bug 39234 - POIFS saved file cab cause VBA "File not found" error and corrupts XLS
Summary: POIFS saved file cab cause VBA "File not found" error and corrupts XLS
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: POIFS (show other bugs)
Version: 2.5-FINAL
Hardware: Other Windows XP
: P2 critical (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
: 18155 19427 (view as bug list)
Depends on:
Blocks:
 
Reported: 2006-04-07 10:36 UTC by Bill Seddon
Modified: 2008-12-01 22:59 UTC (History)
2 users (show)



Attachments
the patch (8.21 KB, patch)
2006-12-12 04:46 UTC, Yegor Kozlov
Details | Diff
zip archive with the modified and added files (12.99 KB, application/x-zip-compressed)
2006-12-12 04:46 UTC, Yegor Kozlov
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Bill Seddon 2006-04-07 10:36:31 UTC
There have been several reports of this problem and although several indicate 
that the issue is resolved, it is not completely resolved.  These comments 
refer to 2.5.1.  However since the same code appears in 3.0 (devel) the same 
issue applies.  Earlier reports include: 36515, 17935, 15232.

The steps to reproduce the problem are simple:

1)	Create a new workbook
2)	Open the VBA editor and add a module called “modABC” 
        (note the case) and you don’t have to enter anything into the module
3)	Save the workbook and close Excel
4)	Read it in using POIFS and write it out again
5)	Open the new/updated workbook

When you open the workbook, the chances are that you will be greeted by a 
message saying “File not found”.  If this doesn’t occur on opening the 
workbook try editing the new module – or saving the workbook - then you will 
see the message.

The problem is that this module is 6 characters long and starts with a 
lowercase letter.  The error will also occur for modules with a wide range of 
names and arises because there is an error in the way POIFS saves the property 
table.

The rules for saving the table are documented on the POI web site and appear 
to be correct but there is an additional wrinkle and the implementation is not 
quite right.  The error is caused by the sorting of the property names in the 
comparator implementation in DirectoryProperty.java.

When a VBA macro is added to a document, Excel adds a new module with a 
default name starting with “Module1”.  Excel adds a new storage to the .xls 
document with a defined structure:

_VBA_PROJECT_CUR
----PROJECT
----PROJECTwm
----VBA
--------dir
--------Module1
--------Sheet1
--------ThisWorkbook
--------_VBA_PROJECT

The rules are that entries with shorter names should appear in a directory 
before entries with longer name and that entries with names that are the same 
length should be sorted alphabetically.  The implementation in 
DirectoryProperty.java get this right.  Unfortunately, the unstated rule is 
that Excel expects that the names will be sorted without case sensitivity.  As 
the comparator implementation in DirectoryProperty.java uses String.compareTo
() this only works providing the module, form and class names start with an 
upper case character.  Applying the example used in the steps to reproduce the 
error, the current implementation generated the following name order that 
Excel regards as invalid:

_VBA_PROJECT_CUR
----PROJECT
----PROJECTwm
----VBA
--------dir
--------Sheet1
--------ThisWorkbook
--------modABC
--------_VBA_PROJECT

Part of the fix is to change the comparator implementation to use 
String.compareToIgnoreCase().  This ensures that the names are in the right 
order.  However, it makes the _VBA_PROJECT entry appear first in the list and 
it MUST appear last.

The code below is the change I have made and that seems to work for me.  Maybe 
there is a more generic implementation of this fix that meets a wider set of 
rules.  If so, please let me know. 

Bill Seddon

        public int compare(Object o1, Object o2)
        {
	    String VBA_PROJECT = "_VBA_PROJECT";
            String name1  = (( Property ) o1).getName();
            String name2  = (( Property ) o2).getName();
            int    result = name1.length() - name2.length();

            if (result == 0)
            {
		if (name1.compareTo(VBA_PROJECT) == 0)
			result = 1; 
		else if (name2.compareTo(VBA_PROJECT) == 0)
			result = -1;
		else
			// result = name1.compareTo(name2);
			result = name1.compareToIgnoreCase(name2);
            }
            return result;
        }
Comment 1 Bill Seddon 2006-04-12 08:39:55 UTC
As a follow up, I discovered that the list of VBA streams generated by Excel 
can include ones with names like __SRP_0 or __SRP_2.  Like the _VBA_PROJECT 
stream, these cannot be sorted alphabetically.  Instead the "underscore" 
streams seem to have to be sorted separately and made to appear *after* all 
other streams giving this kind of structure:

_VBA_PROJECT_CUR
----PROJECT
----PROJECTwm
----VBA
--------dir
--------Module1
--------Sheet1
--------ThisWorkbook
--------__SRP_0
--------__SRP_1
--------_VBA_PROJECT

What would happen to a file with three leading underscores, I don't know.

Anyway, here is my updated code for the comparator method:

public int compare(Object o1, Object o2)
{
  String VBA_PROJECT = "_VBA_PROJECT";
  String name1  = (( Property ) o1).getName();
  String name2  = (( Property ) o2).getName();
  int  result = name1.length() - name2.length();

  if (result == 0)
  {
    // _VBA_PROJECT, it seems, will always come last
    if (name1.compareTo(VBA_PROJECT) == 0)
      result = 1; 
    else if (name2.compareTo(VBA_PROJECT) == 0)
      result = -1;
    else
    {
      if (name1.startsWith("__") && name2.startsWith("__"))
      {
        // Betweeen __SRP_0 and __SRP_1 just sort as normal
        result = name1.compareToIgnoreCase(name2);
      }
      else if (name1.startsWith("__"))
      {
        // If only name1 is __XXX then this will be placed after name2
        result = 1; 
      }
      else if (name2.startsWith("__"))
      {
        // If only name2 is __XXX then this will be placed after name1
        result = -1;
      }
      else
        // result = name1.compareTo(name2);
        // The default case is to sort names ignoring case
        result = name1.compareToIgnoreCase(name2);
    }
  }
  return result;
}


  
Comment 2 Yegor Kozlov 2006-12-12 04:40:54 UTC
Do we have any progress with this issue? 

Recently I confirmed that the problem exists. 
We are going to use POI in production to modify xls files full of VBA but it
turns out that 
POIFS changes the order of entries in ROOT._VBA_PROJECT_CUR.VBA node.
As the result the document gets corrupted.

The problem disappeared after I applied the suggested patch. I did some more
tests and
it worked fine in all cases. Let's commit it.

The patch includes a test xls file, unit tests and the source code.

P.S. Bill, thank you for the excellent explanation of the problem. It helped a lot.

Regards, Yegor Kozlov
Comment 3 Yegor Kozlov 2006-12-12 04:46:13 UTC
Created attachment 19244 [details]
the patch
Comment 4 Yegor Kozlov 2006-12-12 04:46:45 UTC
Created attachment 19245 [details]
zip archive with the modified and added files
Comment 5 Nick Burch 2006-12-12 10:22:58 UTC
Thanks for figuring this one out, and working out a fix and test.

I've applied the fix.
Comment 6 Yegor Kozlov 2007-01-12 03:30:48 UTC
*** Bug 18155 has been marked as a duplicate of this bug. ***
Comment 7 Josh Micich 2008-12-01 22:59:05 UTC
*** Bug 19427 has been marked as a duplicate of this bug. ***