Bug 52690

Summary: Length of decrypted data stream is discarded - can't create valid Office Documents
Product: POI
Reporter: phil
Component: POIFS
Assignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: major    
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: Mac OS X 10.4   

Description phil 2012-02-16 23:01:37 UTC

    
Comment 1 phil 2012-02-16 23:15:36 UTC
It is extremely important that the size of the decrypted data stream exactly matches the size specified in the input stream. Otherwise the Windows System.IO.Packaging.Package.Open() method returns an error when trying to open the decrypted document. In practice this means that MS Office will report that the file is corrupt.

Just reading to the end of the input stream is not sufficient, because there are normally padding bytes that must be discarded.

Currently Decryptor and/or its subclasses read and *discard* the required length.

In order to be able to create valid documents from encrypted ones, this length must be made available so that the output stream can be truncated.
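
The truncation described above can be illustrated with a minimal sketch that assumes the caller already knows the declared length. ExactCopy and copyExactly are hypothetical names used only for illustration and are not part of POI; the sketch relies on standard JDK classes only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of POI.
public class ExactCopy {

    /**
     * Copies exactly 'length' bytes from 'in' to 'out' and leaves any
     * trailing bytes (for example cipher block padding) unread.
     */
    public static void copyExactly(InputStream in, OutputStream out, long length) throws IOException {
        byte[] buf = new byte[1024];
        long remaining = length;
        while (remaining > 0) {
            // Never ask for more than is still expected.
            int want = (int) Math.min(buf.length, remaining);
            int n = in.read(buf, 0, want);
            if (n == -1) {
                throw new EOFException("Stream ended with " + remaining + " bytes still expected");
            }
            out.write(buf, 0, n);
            remaining -= n;
        }
    }

    public static void main(String[] args) throws IOException {
        // Five payload bytes followed by three padding bytes.
        byte[] padded = {1, 2, 3, 4, 5, 0, 0, 0};
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copyExactly(new ByteArrayInputStream(padded), out, 5);
        System.out.println(out.size());
    }
}
```

Copying by a byte count instead of copying until end of stream is what keeps the padding out of the output file.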

I will submit my proposed patch when I have some time to create it in the required format.
Comment 2 Nick Burch 2012-02-20 11:09:03 UTC
I'm not quite clear on what you're trying to do, where the problem comes in, and why you're talking about .net APIs?

Could you maybe provide some more detail (or even better a unit test) that explains what goes wrong, where and why?
Comment 3 phil 2012-02-20 21:52:28 UTC
Sorry, Bugzilla and/or my browser lost my original description of the problem.

The issue is that the classic POI 'myDecrypt' method below in general produces output files that are considered corrupt by Microsoft Office.

If, say, you use MS Word to open a .docx file decrypted by this method, you will get an error dialog saying the file is corrupt and asking if you would like to attempt recovery (which incidentally will succeed).

The problem is that the output is too long. It must be truncated to the length specified in the input data stream, which is currently read and discarded, for example in "EcmaDecryptor>>getDataStream(DirectoryNode dir)".

 This is the offending line

 128:       long size = dis.readLong();

The solution is to save this length in an instance variable of the class Decryptor so that it is accessible to user-written code (Decryptor>>getLength()). The myDecrypt method can then be modified to work correctly.
=====
	private void myDecrypt(String filename, String password) throws FileNotFoundException, IOException {
		File inFile = new File(filename);
		File outFile = new File(new File(filename).getParentFile(), "Decrypted" + new File(filename).getName());

		System.err.println("Attempting to decrypt " + inFile.getAbsolutePath() + " to " + outFile.getAbsolutePath());

		POIFSFileSystem filesystem = new POIFSFileSystem(new FileInputStream(inFile));
		EncryptionInfo info = new EncryptionInfo(filesystem);
		Decryptor d = Decryptor.getInstance(info);

		try {
			if (!d.verifyPassword(password)) {
				throw new RuntimeException("Unable to process: wrong password");
			}

			InputStream dataStream = d.getDataStream(filesystem);

			OutputStream out = new FileOutputStream(outFile);
			byte buf[] = new byte[1024];
			int len;
			while ((len = dataStream.read(buf)) > 0)
				out.write(buf, 0, len);
			out.close();
			dataStream.close();

		} catch (GeneralSecurityException ex) {
			throw new RuntimeException("Unable to process encrypted document", ex);
		}
		System.err.println("Finished " + inFile.getAbsolutePath());
	}
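
To make the size mismatch concrete, here is a small sketch of a stream that starts with a length prefix and carries more bytes than the prefix declares. It assumes the prefix is an 8-byte little-endian value and that the payload is padded to 16-byte blocks; neither detail is stated in this report, which only shows the length being read with dis.readLong(). LengthPrefix and its methods are hypothetical names, and the payload bytes are fake (nothing is encrypted).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration, not POI code.
public class LengthPrefix {

    /** Reads the 8-byte little-endian length assumed to sit at the start of the stream. */
    public static long declaredLength(byte[] stream) {
        return ByteBuffer.wrap(stream, 0, 8).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    /** Number of bytes after the prefix that exceed the declared length. */
    public static long paddingBytes(byte[] stream) {
        return (stream.length - 8) - declaredLength(stream);
    }

    public static void main(String[] args) {
        // Prefix declares 21 bytes, but 32 bytes (two assumed 16-byte blocks) follow it.
        byte[] stream = ByteBuffer.allocate(8 + 32)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(21L)
                .array();
        System.out.println(declaredLength(stream));
        System.out.println(paddingBytes(stream));
    }
}
```

Copying to the end of such a stream writes the extra bytes as well, which matches the over-long output described in this comment.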
Comment 4 Yegor Kozlov 2012-02-23 12:06:04 UTC
Please attach a decrypted file so that we can test your code sample.

Yegor  

(In reply to comment #3)
Comment 5 Yegor Kozlov 2012-02-26 09:03:41 UTC
As of r1293784, POI provides Decryptor#getLength(), which returns the length of the decrypted data stream.

The getLength() method must be called after Decryptor.getDataStream(), which is where the length variable is initialized. Calling getLength() before getDataStream() results in an IllegalStateException.

Regards,
Yegor
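
The call-order rule described in comment 5 can be sketched with a small stand-alone class. This is not POI's implementation: LengthHolder, open() and the -1 sentinel are assumptions made purely to illustrate the behavior, with open() standing in for getDataStream().

```java
// Hypothetical stand-in for the length handling, not POI source code.
public class LengthHolder {

    // -1 marks "length not read yet" (an assumed sentinel).
    private long length = -1L;

    /** Stands in for getDataStream(): records the length read from the stream. */
    public void open(long declaredLength) {
        this.length = declaredLength;
    }

    /** Returns the recorded length, or fails if open() has not been called. */
    public long getLength() {
        if (length == -1L) {
            throw new IllegalStateException("getLength() called before the data stream was opened");
        }
        return length;
    }

    public static void main(String[] args) {
        LengthHolder holder = new LengthHolder();
        try {
            holder.getLength();
        } catch (IllegalStateException expected) {
            System.out.println("Rejected early call: " + expected.getMessage());
        }
        holder.open(21L);
        System.out.println(holder.getLength());
    }
}
```

In the myDecrypt method from comment 3, this would mean calling d.getDataStream(filesystem) first, then d.getLength(), and copying only that many bytes to the output file.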