Bug 53475

Summary: Support for more DOCX encryption versions
Product: POI
Reporter: Jan Høydahl <jan.asf>
Component: POIFS
Assignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: major    
Priority: P2    
Version: 3.8-FINAL   
Target Milestone: ---   
Hardware: Macintosh   
OS: All   
Attachments: Encrypted word doc which crashes POI
patch for ignore missing cspname element
encrypted doc - AES-128 with 256 bit key
Patch for decrypting AES-192/256
JCE-check added to tests and AgileDecryptor
patch for encryption support - Part 1 - refactor crypt code

Description Jan Høydahl 2012-06-27 11:35:15 UTC
Created attachment 29002 [details]
Encrypted word doc which crashes POI

PROBLEM
=======

When parsing password-protected OOXML Word files, the EncryptionInfo class has explicit support for (versionMajor == 4 && versionMinor == 4 && encryptionFlags == 0x40), while all other versions are treated the same. For some encrypted DOCX documents this causes an exception:

java.lang.RuntimeException: Salt size != 16 !?
	at org.apache.poi.poifs.crypt.EncryptionVerifier.<init>(EncryptionVerifier.java:121)
	at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:66)
	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:211)
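
For illustration, the version check described above could be made explicit roughly as follows. This is only a sketch: the class EncryptionVersionSketch, the Mode enum and classify() are hypothetical names and not part of POI, and the STANDARD rule (minor version 2 with major version 2, 3 or 4) is an assumption based on my reading of MS-OFFCRYPTO, not something taken from the POI code.

```java
/** Hypothetical helper, not part of POI: classifies an EncryptionInfo header. */
final class EncryptionVersionSketch {

    enum Mode { AGILE, STANDARD, UNSUPPORTED }

    static Mode classify(int versionMajor, int versionMinor, int encryptionFlags) {
        // The one combination that EncryptionInfo handles explicitly today.
        if (versionMajor == 4 && versionMinor == 4 && encryptionFlags == 0x40) {
            return Mode.AGILE;
        }
        // Assumption: standard encryption uses minor version 2
        // together with major version 2, 3 or 4.
        if (versionMinor == 2 && versionMajor >= 2 && versionMajor <= 4) {
            return Mode.STANDARD;
        }
        // Everything else is reported instead of being parsed as if it were standard.
        return Mode.UNSUPPORTED;
    }

    public static void main(String[] args) {
        System.out.println(classify(4, 4, 0x40));
        System.out.println(classify(3, 2, 0x24));
        System.out.println(classify(4, 3, 0x24));
    }
}
```

Failing early with UNSUPPORTED would at least replace the confusing "Salt size != 16" error with a message that names the version.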

HOW TO REPRODUCE
================
Download Apache Tika 1.1 (http://www.apache.org/dyn/closer.cgi/tika/tika-app-1.1.jar) and start it using 
  java -jar tika-app-1.1.jar password-is-solrcell.docx
which triggers the exception. NOTE: Tika does not yet have an option to pass in a password, but it crashes before we get to decryption.

SOLUTION
========
We need to dig into the various versions that a doc can have and decide which encryption schemes to support. Here is a link to a page explaining the file formats and also providing a .NET program for decryption (I have not had the chance to test it on my example docx file though): http://www.lyquidity.com/devblog/?p=35
Comment 1 Andreas Beeker 2013-11-03 20:43:35 UTC
Created attachment 31004 [details]
patch for ignore missing cspname element

The patch works around a missing data element, namely the cspname element in the EncryptionHeader. Although MS-OFFCRYPTO doesn't mention it being optional, LibreOffice and Word Viewer can open the file.
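
A minimal sketch of the idea, not the attached patch itself: treat the cspname as a null-terminated UTF-16LE string and return an empty string when the header simply ends before it. CspNameSketch and readCspName() are hypothetical names, and the byte layout is my assumption about the header.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of POI: tolerant read of the cspname field. */
final class CspNameSketch {

    /**
     * Reads a null-terminated UTF-16LE string starting at offset.
     * Returns "" when no bytes remain, i.e. when the element is missing.
     */
    static String readCspName(byte[] header, int offset) {
        if (header == null || offset >= header.length) {
            return ""; // missing cspname: tolerate it instead of throwing
        }
        int end = offset;
        // Advance one UTF-16 code unit at a time until a 00 00 terminator
        // or until fewer than two bytes are left.
        while (end + 1 < header.length && !(header[end] == 0 && header[end + 1] == 0)) {
            end += 2;
        }
        return new String(header, offset, end - offset, StandardCharsets.UTF_16LE);
    }
}
```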

Further encrypted test files with different encryption settings would be nice.
Comment 2 Andreas Beeker 2013-11-06 23:45:45 UTC
Created attachment 31021 [details]
encrypted doc - AES-128 with 256 bit key

This attachment is a Word 2010 encrypted .docx with customized encryption settings. I have only changed the key size from 128 to 256 bits, but it might be necessary to change other encryption settings too. Currently POI can't open this file.

LibreOffice 4.0 can't display the file either, but the MS Word Viewer can.

To test the file simply use the test class of the cspname patch - password is "pass"

To verify the (registry) settings, have a look in the EncryptionVerifier: the XML descriptor says 256 bits.

Just for further reference:
- To change the word encryption settings, you'll need to use the "office administration templates" (just google for your version)
- use the policy editor "gpedit.msc" and add the word adm file under user policies
- change the registry settings over the templates
- see also http://www.dslreports.com/forum/r20210979-Office-2007-Enabling-256bit-AES-encryption
Comment 3 Andreas Beeker 2013-11-06 23:58:21 UTC
I forgot to mention that Word 2010 has two options: a password for read protection and one for edit/write protection. Although both passwords are the same for the test file, this might result in an additional decryption step ...
Comment 4 Nick Burch 2013-11-07 21:30:28 UTC
Thanks for this, applied (with minor test and comments tweaks) in r1539828.
Comment 5 Andreas Beeker 2013-11-10 11:40:03 UTC
Created attachment 31029 [details]
Patch for decrypting AES-192/256

AES always has a block size of 128 bits, so we need to take the key size of 128, 192 or 256 bits into account separately.
Part of the patch fixes a wrong usage of the bit sizes, i.e. the IV has to be calculated from the block size (128 bits) whereas the encrypted key needs to use the key size (e.g. 256 bits).
See also MS-OFFCRYPTO - 2.3.4.11ff [1]
I'll try to provide a few more test files with other encryption settings; I haven't tested document encryption at all ...

[1] http://msdn.microsoft.com/en-us/library/dd924776(v=office.12).aspx
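
To make the block size vs. key size distinction concrete, here is a small sketch, not the code from the patch. AesSizingSketch, fit(), ivFrom() and keyFrom() are hypothetical names, and the 0x36 padding byte is an assumption taken from my reading of the agile encryption section of MS-OFFCRYPTO.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of POI: sizes an IV and a key from a hash. */
final class AesSizingSketch {

    /** AES block size in bytes: always 128 bits, whatever the key size. */
    static final int AES_BLOCK_BYTES = 16;

    /** Truncates the hash to length, or pads it with 0x36 (assumed pad byte). */
    static byte[] fit(byte[] hash, int length) {
        byte[] out = Arrays.copyOf(hash, length); // truncates or zero-extends
        if (length > hash.length) {
            Arrays.fill(out, hash.length, length, (byte) 0x36);
        }
        return out;
    }

    /** The IV is sized by the block size, never by the key size. */
    static byte[] ivFrom(byte[] hash) {
        return fit(hash, AES_BLOCK_BYTES);
    }

    /** The key is sized by the configured key size (128, 192 or 256 bits). */
    static byte[] keyFrom(byte[] hash, int keyBits) {
        if (keyBits != 128 && keyBits != 192 && keyBits != 256) {
            throw new IllegalArgumentException("unsupported AES key size: " + keyBits);
        }
        return fit(hash, keyBits / 8);
    }
}
```

With a 20-byte SHA-1 hash and a 256-bit key, ivFrom() yields 16 bytes while keyFrom() yields 32 bytes, which is the case the old code got wrong by using one size for both.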
Comment 6 Nick Burch 2013-11-12 11:38:29 UTC
Thanks for this, applied (with the odd minor tweak) in r1541009.

One thing I did notice is that the code is a little short on JavaDocs in places, and can be a bit short on comments too. If you have a few minutes, while you can still remember the code flow and meaning, it'd be great if you could do a patch to solve that too!
Comment 7 Andreas Beeker 2013-11-21 00:00:13 UTC
Created attachment 31061 [details]
JCE-check added to tests and AgileDecryptor

This patch contains the Assume check for whether the JCE restrictions are in place, which Dominik recommended.

With the JUnit 3 code the Assume didn't work, so I needed to convert it to JUnit 4 annotated code.

Furthermore I've removed some obsolete lines, as the Biff8EncryptionKey is not needed with Agile-Decryption.

I'll add more javadocs when the encryption stuff is finished and also a comment about the JCE policies in the website docs.
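
For reference, a check of this kind can be sketched with the standard javax.crypto API alone. This is not the code from the patch; JceCheckSketch and aesKeyBitsAllowed() are hypothetical names.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;

/** Hypothetical helper, not part of POI: detects JCE key length restrictions. */
final class JceCheckSketch {

    /** Returns true if the installed JCE policy permits AES keys of keyBits. */
    static boolean aesKeyBitsAllowed(int keyBits) {
        try {
            // Integer.MAX_VALUE when the unlimited strength policy is active.
            return Cipher.getMaxAllowedKeyLength("AES") >= keyBits;
        } catch (NoSuchAlgorithmException e) {
            return false; // no AES provider at all
        }
    }

    public static void main(String[] args) {
        System.out.println("AES-256 allowed: " + aesKeyBitsAllowed(256));
    }
}
```

In a JUnit 4 test the result could be passed to Assume.assumeTrue(...), so that the AES-192/256 tests are skipped rather than failed on a restricted JRE.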
Comment 8 Nick Burch 2013-11-21 11:18:30 UTC
Thanks, latest patch (applied with a few minor tweaks to comments/error messages) in r1544121.
Comment 9 Andreas Beeker 2013-11-24 11:34:27 UTC
Created attachment 31073 [details]
patch for encryption support - Part 1 - refactor crypt code

This is part 1 (of presumably 4-5 parts).

As this is a bigger change, I'll post changes as soon as a certain feature compiles/tests stable.

I plan the following parts:
- Part 1: refactor decryption code, so I can use it for encryption
- Part 2: xmlbeans support for encryption descriptor (see details at Part 2)
- Part 3: encryption classes
- Part 4: move en-/decryption code out of main-poi???

As part 2 will break the release, i.e. you'll need xmlbeans for the main-poi, you might want to wait until all parts are out and of course I wouldn't mind a discussion "xmlbeans vs. static xml strings in code"

Currently the patches will be based on the trunk, so part X contains changes of part X-1,... I'll update the diffs, if predecessor parts have been applied
Comment 10 Nick Burch 2013-11-24 18:26:46 UTC
Anything that's to do with xmlbeans needs to live in the poi-ooxml jar, not the main poi one. Possibly that means we need to move some of the encrypt/decrypt code into the poi-ooxml jar

Also, it might make sense to open a new bug for this, rather than using this one, so it's easier to track the new features
Comment 11 Andreas Beeker 2013-11-26 23:46:49 UTC
(In reply to comment #10)
> Also, it might make sense to open a new bug for this, rather than using this
> one, so it's easier to track the new features

I've created the bug entry #55818 and will log my progress there ...