Bug 53603 - "XML External Entities" vulnerability
Status: NEW
Alias: None
Product: Batik - Now in Jira
Classification: Unclassified
Component: Web Site
Version: 1.8
Hardware: All All
Importance: P2 minor
Target Milestone: ---
Assignee: Batik Developer's Mailing list
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-07-25 16:32 UTC by Nicolas GREGOIRE
Modified: 2012-07-30 09:39 UTC



Attachments
Malicious SVG file (623 bytes, image/svg+xml)
2012-07-25 16:32 UTC, Nicolas GREGOIRE
Result of the rasterization of xxe.svg (124.89 KB, image/png)
2012-07-25 16:33 UTC, Nicolas GREGOIRE

Description Nicolas GREGOIRE 2012-07-25 16:32:36 UTC
Created attachment 29114
Malicious SVG file

During visualization with Squiggle or rasterization via the CLI tool, XML external entities defined in the DTD are dereferenced and the content of the target file is included in the output.

The impact of this vulnerability ranges from denial of service to file disclosure. Under Windows, it can also be used to steal LM/NTLM hashes.

For some additional information about XXE attacks, please refer to http://cwe.mitre.org/data/definitions/827.html

How to reproduce: 
$> rasterizer xxe.svg -d xxe.png
Comment 1 Nicolas GREGOIRE 2012-07-25 16:33:09 UTC
Created attachment 29115
Result of the rasterization of xxe.svg
Comment 2 Thomas Deweese 2012-07-27 00:17:28 UTC
I don't want to dismiss this out of hand but I'm not sure I agree that a vulnerability really exists.

Given that Batik is more of a toolkit than a finished product, much more of the responsibility for avoiding these issues falls on the users than on the library. This is more or less required, given that it's impossible for us to know ahead of time which parts of the system the Batik libraries should be allowed to access.

Please note that xxe.svg will fail if you use squiggle _and_ you fetch 'xxe.svg' from a server (I even tried variants like replacing etc/passwd with file:///etc/passwd).

People using the rasterizer to rasterize random content from the web should be more careful. They can use Java's built-in support for policy files to restrict access to the file system. I don't think it would be appropriate for the toolkit to restrict this ahead of time, since many legitimate uses may need fairly wide access to the filesystem. I checked, and browsers seem to block all access to the file system when loading a file from the disk, even if the target is co-located. That may make sense for a browser, but I think it would block many legitimate uses of Batik.
Comment 3 Jeremias Maerki 2012-07-27 10:15:10 UTC
I agree with Thomas. In a short experiment, I was able to use XInclude (implemented by Apache Xerces-J) to force the same effect. Batik does not even know about XInclude since it's a parser-level feature.

However, it might be a good idea to write some documentation about it so users are reminded to secure their applications.
Comment 4 Helder Magalhães 2012-07-29 01:09:55 UTC
(In reply to comment #3)
> I agree with Thomas.

I agree with Thomas and Jeremias as well.


> However, it might be a good idea to write some documentation about it so
> users are reminded to secure their applications.

Decreasing severity and moving this to the "Web Site" component, more in the sense of "Documentation" (which doesn't exist). "javadoc" alone doesn't feel right either: I'd say that this sort of reminder belongs at a higher level than Javadoc, although something could probably be done in the code documentation as well.


(In reply to comment #0)
> During visualization with Squiggle or rasterization via the CLI tool, XML
> external entities defined in the DTD are dereferenced and the content of the
> target file is included in the output.
> 
> The impact of this vulnerability ranges from denial of service to file
> disclosure. Under Windows, it can also be used to steal LM/NTLM hashes.

First of all, thanks for the report!

Thomas has provided good insight into this potential issue in comment #2. Based on the feedback and on a few tests I performed, I'd say the example provided is roughly equivalent to an ECMAScript getURL call fetching "/etc/passwd" (using the "file" protocol).

If you still believe this can be considered a security issue, then please adjust the priority accordingly. In any case, elaborating a bit further would help, either to better understand what is involved or simply to serve as a basis for the documentation improvements.
Comment 5 Nicolas GREGOIRE 2012-07-30 09:39:12 UTC
I understand your position but I think that these risks should then be much more visible to casual users of the framework (i.e. documentation improvement).

Nowadays, it's trivial to find applications using Batik in an insecure way (allowing the disclosure of local files). Examples:
- Apache FOP: vulnerable. Repro: FOP document including a malicious SVG image
- HighCharts JS: vulnerable. Repro: submit a malicious SVG to the on-line export feature of this graph library

MediaWiki seems impacted too:
http://www.mediawiki.org/wiki/Manual:$wgSVGConverters

Regarding XInclude: it is a feature of the XML parser and could be disabled there in security-conscious deployments.
Regarding ECMAScript: it can be disabled using command-line options. The main differences are that the XXE attack is scriptless and can't be inhibited using options.
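
To illustrate the parser-level point about XInclude and external entities: a minimal sketch, assuming an application that parses untrusted SVG itself through the JDK's JAXP API. The class and method names (HardenedXmlParsing, newHardenedFactory, textContentOf) are hypothetical, and nothing here is Batik API; whether Batik's own document loading can be given an equivalent configuration is not covered in this thread. The feature URIs are the standard SAX and Xerces ones.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of Batik: configures a JAXP DOM parser so that
// XInclude and external entities are not processed.
final class HardenedXmlParsing {

    static DocumentBuilderFactory newHardenedFactory() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // XInclude is a parser-level switch; keep it off explicitly.
            factory.setXIncludeAware(false);
            factory.setExpandEntityReferences(false);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            // Do not load external general or parameter entities.
            factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
            factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
            // Do not fetch an external DTD subset either.
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            return factory;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("XML parser lacks a required hardening feature", e);
        }
    }

    // Parses the given XML with the hardened factory and returns the root's text content.
    static String textContentOf(String xml) {
        try {
            DocumentBuilder builder = newHardenedFactory().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML input", e);
        }
    }

    public static void main(String[] args) {
        // Illustrative input in the spirit of the report, not the attached xxe.svg.
        String svg = "<!DOCTYPE svg [<!ENTITY xxe SYSTEM \"file:///etc/passwd\">]>"
                + "<svg xmlns=\"http://www.w3.org/2000/svg\"><text>&xxe;</text></svg>";
        // With the features above, the external entity should not be loaded,
        // so no file content is expected between the brackets.
        System.out.println("[" + textContentOf(svg) + "]");
    }
}
```

The internal DOCTYPE is still accepted in this sketch, since many SVG files carry one. Stricter deployments could instead reject any DOCTYPE through the Xerces feature http://apache.org/xml/features/disallow-doctype-decl, at the cost of refusing such files.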