Bug 36526

Summary: Out of memory error when signing or verifying big files
Product: Security - Now in JIRA Reporter: Agnes Juhasz <polysys>
Component: SignatureAssignee: XML Security Developers Mailing List <security-dev>
Status: RESOLVED FIXED    
Severity: enhancement    
Priority: P2    
Version: Java 1.2.1   
Target Milestone: ---   
Hardware: All   
OS: All   
Attachments: Modified CreateSignature.java for testing OutOfMemoryError
Solution idea

Description Agnes Juhasz 2005-09-06 20:38:12 UTC
When signing or verifying a bigger (about 0.5 Gbyte) file got out of memory 
error in org.apache.xml.security.signature.XMLSignatureInput, at line 1275:
bytes=JavaUtils.getBytesFromStream(_inputOctetStreamProxy);
Comment 1 Raul Benito 2005-09-07 17:46:03 UTC
There is only one case where the library can avoid reading the whole reference
file in memory. And it is when the only transformation that is going to apply to
the transformation is B64 decoding/encoding.  And this optimiozation is already
implemented.

What is your case?

Can you post your signature, with the references and trasformation?

Thanks.

p.s- I understand that all issues are critical to the people suffering them. But
this one seems important to me but not critical. Mind this the next time you
report a bug to other projects.
Comment 2 Agnes Juhasz 2005-09-12 10:50:12 UTC
Created attachment 16366 [details]
Modified CreateSignature.java for testing OutOfMemoryError

new lines at line 151:
  
  // add a bigger file (size: 45 635 523)
  File bigfile = new File("jdk-1_5_0-doc.zip");
  sig.addDocument(bigfile.toURI().toString());
Comment 3 Raul Benito 2005-09-13 00:35:34 UTC
If you don't put any transformation to a reference, the inclusive C14N is
implicit, so the file need to be readed in memory. 
The only transformation that is not going to convert is base64 decoding.
So if you want to sign rawdata base64 encoding it before, and handle like this:

{
        //create the transforms object for the Document/Reference
        Transforms transforms = new Transforms(doc);

        transforms.addTransform(Transforms.TRANSFORM_BASE64_DECODE);
        // add a bigger file (size: 62 163 072 > 45 635 523)
        // comming from base64 /usr/portage/distfiles/xorg-x11-6.8.99.15.tar.bz2
         File bigfile = new File("/tmp/tmp.b64");
         sig.addDocument(bigfile.toURI().toString(),transforms,
Constants.ALGO_ID_DIGEST_SHA1);
      }

In this way as the test pass without any out of memory error (I have test with
just plain java).

I will mark this bug as invalid.
Comment 4 Agnes Juhasz 2005-09-13 14:21:33 UTC
(In reply to comment #3)
> If you don't put any transformation to a reference, the inclusive C14N is
> implicit, so the file need to be readed in memory. 
> The only transformation that is not going to convert is base64 decoding.
> So if you want to sign rawdata base64 encoding it before, and handle like 
this:
> {
>         //create the transforms object for the Document/Reference
>         Transforms transforms = new Transforms(doc);
>         transforms.addTransform(Transforms.TRANSFORM_BASE64_DECODE);
>         // add a bigger file (size: 62 163 072 > 45 635 523)
>         // comming from base64 /usr/portage/distfiles/xorg-x11-
6.8.99.15.tar.bz2
>          File bigfile = new File("/tmp/tmp.b64");
>          sig.addDocument(bigfile.toURI().toString(),transforms,
> Constants.ALGO_ID_DIGEST_SHA1);
>       }
> In this way as the test pass without any out of memory error (I have test 
with
> just plain java).
> I will mark this bug as invalid.

Thank you for your answer. But base64 encoding is not solution for my case. We 
have a Protection Profile and Security Target which dispose of "What You See 
Is What You Sign". We have to sign the original file selected and seen by the 
user not the base64 encoding of the file. The developed application will be CC 
evaluated and certified so I must satisfy the WYSWYS security objective.
I realized that here a resettable FileInputStream could be the solution. Could 
you give me instruction how and where to put it ?
Thanks,
Agnes
Comment 5 Agnes Juhasz 2005-09-19 17:41:15 UTC
Created attachment 16449 [details]
Solution idea

I have an idea to solve signing and verifying big files without Base64
transformation and without OutOfMemoryError.



1) In XMLSignatureInput.java should be a new member:

	File InputFile = null;

2) In XMLSignatureInput.java should be a new constructor:

	public XMLSignatureInput(File inputfile)
	{
		InputFile = inputfile;
	}

3) In XMLSignatureInput.java, updateOutputStream method should be modified:

	/**
	 * @param diOs
	 * @throws IOException
	 * @throws CanonicalizationException
	 */
	public void updateOutputStream(OutputStream diOs) throws
CanonicalizationException, IOException {	
	if (diOs==outputStream) {
		return;
	}
	if (bytes!=null) {
	    diOs.write(bytes);
	    return;	 
	 }else if (this.isElement()) {			  
	     Canonicalizer20010315OmitComments c14nizer =
		    new Canonicalizer20010315OmitComments();	   
	     c14nizer.setWriter(diOs);
	    c14nizer.engineCanonicalizeSubTree(this._subNode,this.excludeNode);

	    return;
	  } else if (this.isNodeSet()) {	
	     /* If we have a node set but an octet stream is needed, we MUST
c14nize
	      * without any comments.
	      *
	      * We don't use the factory because direct instantiation should be
a
	      * little bit faster...
	      */
	     Canonicalizer20010315OmitComments c14nizer =
		new Canonicalizer20010315OmitComments();	 
	     c14nizer.setWriter(diOs);
	     if (this._inputNodeSet.size() == 0) {
		// empty nodeset
		return;
	     }				    
	     c14nizer.engineCanonicalizeXPathNodeSet(this._inputNodeSet);      
	 
	     return;		 
	  } else {

/* --> begin of new lines */
	    if ( InputFile != null )
	    {
	      byte[] buffer = new byte[32*1024];
	      BufferedInputStream bis = null;
	      try
	      {
		bis = new BufferedInputStream(new FileInputStream(InputFile));
		int bytesread = 0;
		while ( (bytesread = bis.read(buffer)) != -1 )
		{
		  diOs.write(buffer,0,bytesread);
		}
	      }
	      finally
	      {
		if ( bis != null )
		{
		  bis.close();
		  bis = null;
		}
	      }
	      return;
	    }
/* <-- end of new lines */

	    InputStream is = getResetableInputStream();
	    if (bytes!=null) {
		//already read write it, can be rea.
		diOs.write(bytes,0,bytes.length);
		return;
	    }		 
	    is.reset(); 	   
	    int num;
	    byte[] bytesT = new byte[1024];
	    while ((num=is.read(bytesT))>0) {
		diOs.write(bytesT,0,num);
	    }
		
	  }
		
	}


4) In ResolverLocalFileSystem.java, line 65 and 66 should be modified:

/* original, line 65-66:
	FileInputStream inputStream = new FileInputStream(fileName);
	XMLSignatureInput result = new XMLSignatureInput(inputStream);
*/
/* new, line 65: */
	XMLSignatureInput result = new XMLSignatureInput(new File(fileName));




It is only an idea, not a real patch. I tried it with signing and verifying 
two big files (jdk-1_5_0-doc.zip, size=45635523 and lnx_920_disk2.cpio.gz,
size=588798999) and it works fine in this case:

<ds:SignedInfo Id="BigFile_2_Signature-1__SignedInfo-1">
  <ds:CanonicalizationMethod
Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315" /> 
  <ds:SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1" />

  <ds:Reference Id="BigFile_2_Signature-1__Reference-1"
Type="http://uri.etsi.org/01903/v1.2.2#SignedProperties"
URI="#BigFile_2_Signature-1__SignedProperties-1">
  <ds:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1" /> 
  <ds:DigestValue>NmZTo8j7D61PP1opL1HTs2YphN0=</ds:DigestValue> 
  </ds:Reference>
  <ds:Reference Id="BigFile_2_Signature-1__Reference-2"
URI="file:jdk-1_5_0-doc.zip">
  <ds:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1" /> 
  <ds:DigestValue>HZqRglwK5G0XFQ36wdGzx00w+kQ=</ds:DigestValue> 
  </ds:Reference>
  <ds:Reference Id="BigFile_2_Signature-1__Reference-3"
URI="file:lnx_920_disk2.cpio.gz">
  <ds:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1" /> 
  <ds:DigestValue>gaHz/04SteF1VCpuLY/+fkWksT8=</ds:DigestValue> 
  </ds:Reference>
</ds:SignedInfo>


Maybe something like this idea should be useful for other people or should be
applied in a new version of the library.
Comment 6 Jason Marshall 2006-07-12 17:29:05 UTC
I have the same problem as Agnes.  

We're using a Manifest in the signature to refer to very large external files. 
We're expecting to encounter multi-gigabyte files in some situations., and we'd
also prefer to have them signed as-is, not as a B64 encoding of the original. 
Unless I'm misunderstanding the workaround, this suggestion does NOT constitute
a useful resolution to the problem.

Please provide a direct programatic mechanism to add References to binary files
without reading them into memory in their entirety.
Comment 7 Jason Marshall 2006-07-12 17:44:47 UTC
To be more precise, I am experiencing this error in a call to
Manifest.generateDigestValues(), not in Signature (Agnes appears not to have
specified how the error occured)
Comment 8 Raul Benito 2006-07-12 18:17:17 UTC
The idea of Agnes sounds good.
But please post a patch, If you want to discuss further. If not sadly it will be
end in the TODO list. 

Sorry
Comment 9 Jason Marshall 2006-07-13 16:24:28 UTC
I did find myself a nice workaround for the problem.

At issue, as I currently understand the code, is that the code in
XMLSignatureInput assumes that it is dealing with a relatively small piece of
input (true for internal references, not so for external), and further that the
transforms will/may scan the input several times (also true for internal
references, probably not so for external).

As Agnes found, the crux of the problem starts with updateOutputStream, but
extends into the retrieval of the resettable stream (which has far more side
effects than I'm comfortable with).  

For my situation, I already had a custom resolver in place.  It was simple
enough, once I had a better grasp on the problem, to create a custom
XMLSignatureInput implementation that overrides updateOutputStream to open a
FileInputStream, copy it to diOS, and close the input stream.  

I suspect a better solution than either mine or Agnes' would present itself if I
could understand exactly what's going on with some of those member variables in
XMLSignatureInput.  The code is just peculiar enough that I'm not confident that
I understand everything it's trying to accomplish.

I can observe however that XMLSignatureInput has changed somewhat since Agnes'
proposal, so some rework would be necessary to apply it.
Comment 10 Raul Benito 2006-08-06 17:04:07 UTC
This bug is not critical. Change to enhancement.
But before we apply the patch, I think that we need to refactor
XmlSignatureInput with the strategy pattern.
Comment 11 sean.mullan 2009-09-24 11:12:25 UTC
I took another look at this bug and was able to fix it. The fix I putback was very similar to Agnes' proposed patch, with a few small differences.

This should fix the OOM problems if you are simply trying to sign the bytes of a large file without any transformations that cause intermediate node-sets to be created. It doesn't address those use cases but it does address the use case of this bug and the one that I believe most users have complained about.