Bug 35897

Summary: Password protected files
Product: POI
Reporter: Alexander Kurtakov <akurtakov>
Component: POI Overall
Assignee: POI Developers List <dev>
Status: RESOLVED FIXED
Severity: enhancement
CC: almas.kristoffer, trejkaz
Priority: P2
Keywords: PatchAvailable
Version: 3.8-dev
Target Milestone: ---
Hardware: Other
OS: other
Attachments: Password protected doc file
Password protected ppt file
Password protected xls file
A little hack which detects encrypted XLS files
A password-protected XLS file submitted by a user of the Aperture Framework
XOR encryption example (password "abc")
Proposed patch for XOR encryption detection
file will open in excel with password, just not in POI. encrypt type is unrecognized.
file reader - java class to test reading of the test1.xls file
This is a POI file in org.apache.poi.hssf.record that attempts to do encrytionType=1 if we get a 4.
[PATCH] Support for "Office Binary Document RC4 Encryption" decryption
[PATCH] Support for "Office Binary Document RC4 Encryption"
[PATCH] Support for "Office Binary Document RC4 Encryption" (read/write) and CryptoAPI for HSLF (read-only)

Description Alexander Kurtakov 2005-07-27 16:35:13 UTC
Shouldn't a more specific exception (e.g. PasswordProtectedFileException) be
thrown? And not only for doc files, but also for ppt and xls. Locally I have
patched the code (for local use) based on my findings that record type 49
also indicates a password protected file for xls, and 51432 does the same
for ppt files. But this is only an assumption, because I was just checking
what kind of record fails when loading office documents.
Comment 1 Alexander Kurtakov 2005-07-27 16:36:18 UTC
Created attachment 15796 [details]
Password protected doc file
Comment 2 Alexander Kurtakov 2005-07-27 16:37:09 UTC
Created attachment 15797 [details]
Password protected ppt file
Comment 3 Alexander Kurtakov 2005-07-27 16:37:36 UTC
Created attachment 15798 [details]
Password protected xls file
Comment 4 Trejkaz (pen name) 2006-01-19 00:29:54 UTC
I'm getting OutOfMemoryError when I read password-protected documents.  Is this
what you're talking about?

Would you mind attaching the code that detects when a file is password
protected?  I would have a use for something like this even if POI doesn't
commit it. :-)
Comment 5 Nick Burch 2006-07-04 10:41:50 UTC
We now throw an exception in the case of encrypted powerpoint documents.

OpenOffice seem to have some docs on protected excel files (eg
http://specs.openoffice.org/appwide/interoperability/Import_Password_Protected_MS_Office_Files.sxw)
if anyone wants to contribute some detection code.
Comment 6 Antoni Mylka 2011-06-01 16:12:33 UTC
Created attachment 27101 [details]
A little hack which detects encrypted XLS files

This is based on an observation taken from the Excel Spec (page 119):

If you type a protection password (File menu, Save As command, Options dialog box), the FILEPASS record appears in the BIFF file. The wProtPass field contains the encrypted password. All records after FILEPASS are encrypted

The patch looks for the FilePass record. If it's found, all subsequent RecordFormatExceptions are interpreted as encryption failures.

This doesn't cover the per-sheet passwords, only the per-file passwords. It seems to work for me. Is there a better way?
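
For illustration, the detection idea above can be sketched without POI at all. This is a minimal sketch, not the attached patch: it assumes the caller has already extracted the raw "Workbook" stream from the OLE2 container, and it relies on the BIFF layout of a 2-byte little-endian record id followed by a 2-byte little-endian length, with FILEPASS having record id 0x002F. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper, not part of POI: looks for a FILEPASS record in a BIFF stream. */
final class FilePassScanner {
    /** Record id (sid) of the FILEPASS record. */
    static final int FILEPASS_SID = 0x002F;

    /**
     * Walks the record headers of an already-extracted Workbook stream and
     * reports whether a FILEPASS record is present.
     */
    static boolean containsFilePass(byte[] workbookStream) {
        ByteBuffer buf = ByteBuffer.wrap(workbookStream).order(ByteOrder.LITTLE_ENDIAN);
        while (buf.remaining() >= 4) {
            int sid = buf.getShort() & 0xFFFF;     // record id
            int length = buf.getShort() & 0xFFFF;  // size of the record body
            if (sid == FILEPASS_SID) {
                return true;
            }
            if (length > buf.remaining()) {
                break; // truncated stream, stop scanning
            }
            buf.position(buf.position() + length); // skip the record body
        }
        return false;
    }
}
```

Like the patch, this only covers the per-file password, not per-sheet protection.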
Comment 7 Antoni Mylka 2011-06-01 16:15:13 UTC
Created attachment 27102 [details]
A password-protected XLS file submitted by a user of the Aperture Framework

Attached the password-hello.xls file. It's the password-protected XLS file referred to in my unit test in the previous patch. It was submitted by "cbamfort", a user of the Aperture Framework, to the Aperture issue:

http://sourceforge.net/tracker/?func=detail&aid=3088113&group_id=150969&atid=779503
Comment 8 Trejkaz (pen name) 2014-01-10 04:41:35 UTC
Bumping this to ask: can HSSF at least throw EncryptedDocumentException for encrypted documents?

Currently, this is what I'm seeing:

org.apache.poi.hssf.record.RecordFormatException: HSSF does not currently support XOR obfuscation
    at org.apache.poi.hssf.record.FilePassRecord.<init>(FilePassRecord.java:52)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
    at org.apache.poi.hssf.record.RecordFactory$ReflectionConstructorRecordCreator.create(RecordFactory.java:57)

This is thrown from the FilePassRecord constructor.

The hack Antoni has described above seems to rely on FilePassRecord being successfully constructed, so I guess that won't help.

At the moment, our code is catching EncryptedDocumentException, but then these cases fall through the cracks, because the wrong exception has been thrown.
Comment 9 Trejkaz (pen name) 2014-01-10 06:04:40 UTC
Created attachment 31194 [details]
XOR encryption example (password "abc")

Attaching a sample I constructed.
Comment 10 Trejkaz (pen name) 2014-01-10 06:06:24 UTC
Created attachment 31195 [details]
Proposed patch for XOR encryption detection

Attaching the patch we're using here to work around it. The only thing I'm not sure about is the means we're using to propagate the exception. Perhaps the code there should just propagate all RuntimeExceptions.
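
To illustrate the "propagate all RuntimeExceptions" option: the stack trace in comment 8 shows the record being built through Constructor.newInstance, which wraps anything the constructor throws in an InvocationTargetException. A minimal sketch of unwrapping it follows. This is not the attached patch and not POI's RecordFactory; the class and method names are hypothetical.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;

/** Hypothetical helper, not POI code: reflective construction that keeps the original exception type. */
final class ReflectiveCreator {
    static <T> T create(Constructor<T> ctor, Object... args) {
        try {
            return ctor.newInstance(args);
        } catch (InvocationTargetException e) {
            Throwable cause = e.getCause();
            if (cause instanceof RuntimeException) {
                // Rethrow as-is so callers can catch the specific type they expect.
                throw (RuntimeException) cause;
            }
            throw new IllegalStateException(
                    "Unable to construct " + ctor.getDeclaringClass().getName(), cause);
        } catch (ReflectiveOperationException e) {
            // InstantiationException or IllegalAccessException
            throw new IllegalStateException(
                    "Unable to construct " + ctor.getDeclaringClass().getName(), e);
        }
    }
}
```

With this shape, an exception thrown from a record constructor reaches the caller with its own type instead of being hidden inside a wrapper.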
Comment 11 Trejkaz (pen name) 2014-01-10 06:10:18 UTC
Someone else trying to make test data for this issue also hit the error:
"Unknown encryption info: 4"

I wasn't sure if that was a problem or a legitimate value, so I left it alone.
Comment 12 Andreas Beeker 2014-01-10 23:37:49 UTC
applied with r1557281
+ Junit4 modifications to support expected exceptions

I leave it open for now, as I would prefer to have that XOR encryption and maybe CryptoAPI also to be implemented ...
Comment 13 ns 2014-03-14 21:32:39 UTC
Hi.  I am getting org.apache.poi.hssf.record.RecordFormatException: Unknown encryption info 4
when opening up the attached file test1.xls

The thing is, if I use Excel (the UI itself) and supply the password, it will open.  Or, if I resave it using a DIFFERENT password in Excel, it will open.

I am thinking this file (sent to me) was created by a non-Excel program, probably some other library.  But if Excel can open it, why can't POI?  I am guessing it is an encryption type (what is type 4?)
Comment 14 ns 2014-03-14 21:37:12 UTC
Created attachment 31389 [details]
file will open in excel with password, just not in POI. encrypt type is unrecognized.

the password is "freedom"
Comment 15 Nick Burch 2014-03-15 08:35:09 UTC
You can look up the details of the FilePass record in http://msdn.microsoft.com/en-us/library/dd952596%28v=office.12%29.aspx at section 2.4.117 - FilePass

That says valid values for the encryption info are 0-3 only, and not 4!

Any chance you could find out where the file came from, and what encryption they think they'd set on it?
Comment 16 Trejkaz (pen name) 2014-03-17 11:43:25 UTC
This could be a coincidence, but...

http://msdn.microsoft.com/en-us/library/dd952596%28v=office.12%29.aspx
Says the 0x0002 and 0x0003 values refer to
RC4 CryptoAPI encryption header structure[MS-OFFCRYPTO], 2.3.5.1

But if you look that one up:
http://msdn.microsoft.com/en-us/library/dd922755(v=office.12).aspx
EncryptionVersionInfo (4 bytes): A Version structure (section 2.1.4) that specifies the encryption version used to create the document and the encryption version required to open the document. Version.vMajor MUST be 0x0002, 0x0003, or 0x0004<22>

The note says:

<22> Section 2.3.5.1: Applications in the 2007 Office system earlier than Service Pack 2 set a Version.vMajor value of 0x0003. Versions with Service Pack 2 and Office 2010 set a Version.vMajor value of 0x004. Office 2003 applications set a Version.vMajor version of 0x0002.

So I guess it's just a newer format?
Comment 17 Nick Burch 2014-03-17 11:49:03 UTC
If we're able to create a file using Excel (any version) that generates the value of 4, then we can ask the Microsoft docs team to investigate and update the docs.

The OP has indicated that they think the latest problematic file (0x4) was generated by something other than Excel, in which case it isn't Microsoft's problem!

So, if someone is able to recreate a file with 0x4 using Excel please upload it + let us know!

(For now, someone could try changing the if block to accept 2, 3 or 4, and see what happens...)
Comment 18 ns 2014-03-17 13:58:29 UTC
(In reply to Trejkaz (pen name) from comment #16)
> This could be a coincidence, but...
> 
> http://msdn.microsoft.com/en-us/library/dd952596%28v=office.12%29.aspx
> Says the 0x0002 and 0x0003 values refer to
> RC4 CryptoAPI encryption header structure[MS-OFFCRYPTO], 2.3.5.1
> 
> But if you look that one up:
> http://msdn.microsoft.com/en-us/library/dd922755(v=office.12).aspx
> EncryptionVersionInfo (4 bytes): A Version structure (section 2.1.4) that
> specifies the encryption version used to create the document and the
> encryption version required to open the document. Version.vMajor MUST be
> 0x0002, 0x0003, or 0x0004<22>
> 
> The note says:
> 
> <22> Section 2.3.5.1: Applications in the 2007 Office system earlier than
> Service Pack 2 set a Version.vMajor value of 0x0003. Versions with Service
> Pack 2 and Office 2010 set a Version.vMajor value of 0x004. Office 2003
> applications set a Version.vMajor version of 0x0002.
> 
> So I guess it's just a newer format?

I am wondering, as Trejkaz states, whether this is just RC4 encryption with a different version number, which gives us 4 instead of 0-3.  I will download the POI source and see if I can just change that line of code so that type=4 just does RC4 decryption, and see if that works.  Worth a shot.  If it does, I'll post it up here.
Comment 19 ns 2014-03-17 14:04:39 UTC
also, the funny thing about this file is, if I use Aspose.Cells for Java, it opens just fine. So apparently the dudes at Aspose know what to do with this encryption type=4, since THEIR library works.  (pooey!) Aspose costs $1000 and there is no way I can afford that. 

It also opens in Excel just fine (the UI).  POI just doesn't open it. 

POI serves all my needs, once the file is opened (like I resaved it unencrypted in Excel) I can read the entire file with POI perfectly.  It's just opening it when it's encrypted is the problem.
Comment 20 ns 2014-03-17 18:05:11 UTC
I just found out they are using VBA's Workbook.SaveAs(...), where that method call takes in a password.  They are not setting any special encryption type.  So it's whatever VBA defaults its encryption type to.
Comment 21 Nick Burch 2014-03-17 18:54:18 UTC
Any chance you could get a short self-contained block of VBA which when run creates a file with the problem? I can then forward that to the Microsoft docs team, and ask them to update [MS-XLS] to cover this case
Comment 22 ns 2014-03-17 21:43:17 UTC
(In reply to Nick Burch from comment #21)
> Any chance you could get a short self-contained block of VBA which when run
> creates a file with the problem? I can then forward that to the Microsoft
> docs team, and ask them to update [MS-XLS] to cover this case

so sorry, I cannot.  I was just told:

"They encrypt on our end using standard Microsoft Excel 2013 password protection (on the workbook level, saved as an excel 2003 file).  And if it helps, all workbooks are created with VBA using the Workbook.SaveAs method.  The default password is added at that point.  Nothing else such as coding is being done."

It is an external company sending me this file and I don't even have permission to TALK to their developers - just a mid level manager dude.  They will not send me code I am sure of it.
Comment 23 ns 2014-03-17 22:08:56 UTC
Ok, I downloaded the POI source 3.10-FINAL and built it.  I made the change in FilePassRecord.java to do the _encryptionType=3 stuff if the type is actually 4.  Then I get _minorVersionNo=2 instead of 1, and 1 is the only value it supports.  So I changed the code to do the same as _minorVersionNo=1 if I get a 2... and it doesn't decrypt.  I get a "Supplied password is invalid for docId/saltData/saltHash"

This is probably because the _saltData and _saltHash are somewhere else for this file type?  I dunno.  I will attach the new FilePassRecord.java and the code that reads the excel file (ExcelFileReader.java), which I wrote.
Comment 24 ns 2014-03-17 22:12:16 UTC
Created attachment 31398 [details]
file reader - java class to test reading of the test1.xls file

this is just the ExcelFileReader - a class that tries to read the previously attached test1.xls file.  It tries to read it using the POI-3.10-FINAL version
Comment 25 ns 2014-03-17 22:14:16 UTC
Created attachment 31399 [details]
This is a POI file in org.apache.poi.hssf.record that attempts to do encrytionType=1 if we get a 4.

This is the modified file in the 3.10-FINAL source that:

1. does a RC4 (3) if we get a 4.
2. does a _minorVersionNo=1 when we get a _minorVersionNo=2 (treats them as the same)
Comment 26 Nick Burch 2014-03-18 15:06:17 UTC
I've dropped the Microsoft docs team an email to ask about it, I'll report back when I hear from them. Since this does look to be something that new copies of Excel can generate, with any luck they'll be able to get the documentation updated to cover this new case, then we can update POI based on that to support it!
Comment 27 ns 2014-03-18 20:38:45 UTC
Hey, I just found out that yes, they are using VBA to do this.  and the actual call is:

Workbook.SaveAs FileName:=strDir & strFileName, FileFormat:=xlExcel8, password:="freedom"

(password changed for opsec of course)
the strDir and strFileName are just the directory and filename that they write out my file.

At least now we know what the FileFormat is.  Hope that helps!

You dudes are awesome by the way - I never dreamed yall would be this responsive and quick to action!
Comment 28 Nick Burch 2014-04-17 14:02:07 UTC
I've heard back from the Microsoft docs team:

> Thank you for bringing this to our attention.  I have researched this with
> our Documentation product group, and they are planning to address this as a
> change to the documentation in a future release of the document.

No ETA yet on that fix, but hopefully it'll get clarified reasonably soon, and we can then make the appropriate changes
Comment 29 Nick Burch 2014-05-26 09:04:11 UTC
*** Bug 56564 has been marked as a duplicate of this bug. ***
Comment 30 Koxta 2014-11-01 03:07:57 UTC
If this helps, I managed to create an excel 97-03 file (.xls) with encryptionType = 4 by creating a new spreadsheet in Excel 2013, populating with some random data, password protecting it, and saving as "Excel 97-03 (.xls)". I can attach it if needed.

Also, I see that the documentation has been updated under this link:

http://msdn.microsoft.com/en-us/library/dd952596%28v=office.12%29.aspx

which reads that 0x2, 0x3 and 0x4 all indicate CryptoAPI headers.
Comment 31 Nick Burch 2014-11-04 23:24:07 UTC
As it's now in the Microsoft docs, in r1636776 I have updated FilePassRecord to treat 4 the same as 2 & 3, along with adding a unit test for it.

Since we can't currently handle the RC4 CryptoAPI encryption header structure, the test file with type 4 will trigger an EncryptedDocumentException, but at least it's cleaner
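
As a rough illustration of the dispatch described above (a sketch only, not the code committed in r1636776): the FilePass body starts with a 2-byte little-endian encryption type, where 0 is XOR obfuscation and 1 is RC4. For RC4, the next 2 bytes are the major version, where 1 is the plain RC4 header and 2, 3 or 4 mean the RC4 CryptoAPI header. The class, enum and method names below are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper, not POI code: classifies the body of a FILEPASS record. */
final class FilePassClassifier {
    enum Scheme { XOR_OBFUSCATION, RC4, RC4_CRYPTOAPI, UNKNOWN }

    static Scheme classify(byte[] filePassBody) {
        ByteBuffer buf = ByteBuffer.wrap(filePassBody).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.remaining() < 2) {
            return Scheme.UNKNOWN;
        }
        int encryptionType = buf.getShort() & 0xFFFF;
        if (encryptionType == 0x0000) {
            return Scheme.XOR_OBFUSCATION;
        }
        if (encryptionType != 0x0001 || buf.remaining() < 2) {
            return Scheme.UNKNOWN;
        }
        int majorVersion = buf.getShort() & 0xFFFF;
        switch (majorVersion) {
            case 1:
                return Scheme.RC4;
            case 2:
            case 3:
            case 4: // the value that used to be reported as "Unknown encryption info 4"
                return Scheme.RC4_CRYPTOAPI;
            default:
                return Scheme.UNKNOWN;
        }
    }
}
```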
Comment 32 bearbalu 2014-11-10 03:18:31 UTC
Based on the FilePassRecord code, it appears that POI does NOT handle 2, 3 or 4 (all of them use the RC4 CryptoAPI encryption header structure). Are there any workarounds?

As an interesting aside, when I google around, I don't see anyone having reported the EncryptedDocumentException: HSSF does not currently support CryptoAPI encryption. Does that mean there are no documents out there in the real world with 2 or 3?
Comment 33 bearbalu 2014-11-10 06:22:13 UTC
BTW, the Excel files I have are not password protected. There is worksheet level protection of certain cells. When I open the file in my Excel 2010 and resave it, the issue goes away. Presumably, it is some other configuration of Excel that creates these files?
Comment 34 Andreas Beeker 2014-11-16 22:20:17 UTC
Created attachment 32208 [details]
[PATCH] Support for "Office Binary Document RC4 Encryption" decryption

This is a patch for bearbalu's files.
Currently I don't have shareable input files.
So I'll apply this, when I have the encryption code ready.
Comment 35 Andreas Beeker 2014-11-22 02:00:53 UTC
Created attachment 32220 [details]
[PATCH] Support for "Office Binary Document RC4 Encryption"

This is a preview of the commit coming soon.
The generated files open successfully in Excel Viewer and Word Viewer.
I'll probably add another few lines of javadocs in the final ...

Andi.
Comment 36 Andreas Beeker 2014-12-04 00:57:12 UTC
Created attachment 32258 [details]
[PATCH] Support for "Office Binary Document RC4 Encryption" (read/write) and CryptoAPI for HSLF (read-only)

This is a combined patch for:
- Office Binary Document RC4 Encryption (read/write)
- HSLF CryptoAPI (read-only)

I'll apply it after 3.11-final is out - maybe with write support for HSLF CryptoAPI.
Because of the Bugzilla upload restriction, the test file needs to be downloaded from http://blogs.msdn.com/b/openspecification/archive/2009/05/08/dominic-salemno.aspx
Comment 37 Dominik Stadler 2016-02-15 21:08:07 UTC
As far as I can see, the latest patch from Andi has been applied, so I am closing this as FIXED. Please open new bugs for any remaining problems, as this one has quite a long history and many changes were made to the code in the meantime.