Bug 58087 - Integration of Visio .vsdx parser into POI
Summary: Integration of Visio .vsdx parser into POI
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: XDGF (show other bugs)
Version: unspecified
Hardware: PC All
: P2 enhancement (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on: 57484
Blocks:
  Show dependency tree
 
Reported: 2015-06-30 23:01 UTC by virtuald
Modified: 2017-10-08 07:24 UTC (History)
0 users



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description virtuald 2015-06-30 23:01:14 UTC
We've just released poi-visio, an Apache 2.0 licensed .vsdx parser built on POI. It would be great to add this to the suite of parsers that POI currently has.

One blocking constraint is that the project depends on this patch that's been open for at least 6 months: https://bz.apache.org/bugzilla/show_bug.cgi?id=57484

Github link: https://github.com/BBN-D/poi-visio
Comment 1 Nick Burch 2015-06-30 23:13:52 UTC
It's certainly something we could look at doing, but you would need to push through some paperwork on your side. The process we'd need to follow is described at http://incubator.apache.org/ip-clearance/index.html - given the organisations mentioned in the readme of the project, do you think we have any hope of getting the OK + fairly simple grant form done before the heat-death of the universe? :)

In the mean time, could you do a patch for https://svn.apache.org/repos/asf/poi/site/src/documentation/content/xdocs/related-projects.xml to mention the project?
Comment 2 virtuald 2015-07-01 03:03:22 UTC
Interestingly enough, we started the process to open source some projects last September, and finally figured out most of the red tape issues -- so this project only took about two months to get out into the open. The government doesn't own any IP in the project, so we would only need to get our legal to sign off on any CLA et al... and I think we might be able to avoid our parent company. So, maybe a month or two if I keep bothering them.

I'll submit a patch for the external projects site. Also, I'd love for you to push my other patch (57484) that way I don't have to maintain a fork of POI. :)
Comment 3 virtuald 2015-07-16 17:17:22 UTC
Sorry for the delay, been off vacationing... 

My understanding of the documents you linked to seems to imply that there are two ways this could work out.

a) If my company signs the Corporate CLA, then the code can be imported under that agreement
b) The other way is they could sign the software grant form (http://www.apache.org/licenses/software-grant.txt), no Corporate CLA required

Does that sound about right?
Comment 4 virtuald 2015-08-11 18:57:01 UTC
Ping?
Comment 5 Nick Burch 2015-08-11 19:08:14 UTC
Sorry, yes. Your company can either do a standalone Software Grant, or they can file a CCLA and include a Software Grant in it. Best practice is normally to pop the source code tarball/zip somewhere, take a suitable hash of it, and include the hash in the grant. That way, people can review the code before, and it's quite clear what the grant covered.

You personally should try to get an ICLA on file, so we can get you on board to help maintain the code post-contribution.

Whether or not a CCLA is needed post-grant depends on the company in question. What Apache really needs after that is an ICLA, where you (as the ongoing contributor) basically confirm you've understood the Apache Software License v2, and are willing + able to contribute under it. Depending on your employment contract, you might not be able to sign that on your own, if so, a CCLA can be helpful to show that your employer is happy for you to be committing work that your employment contract says is "theirs" to the ASF.

Hope that helps? Sorry for the delay, I think everyone thought that someone else was going to reply...
Comment 6 virtuald 2015-08-13 22:47:40 UTC
A CCLA + software grant form has been filled out and emailed to the Apache Foundation secretary.
Comment 7 Nick Burch 2015-08-14 06:39:36 UTC
Great stuff. Next step is to fill out the IP clearance template, and give the IPMC a chance to review it. http://incubator.apache.org/ip-clearance/ip-clearance-template.html . I can hopefully do that on the weekend, if no-one beats me to it!
Comment 8 Nick Burch 2015-08-19 17:54:50 UTC
I've started on the clearance template, more still to do. It's at http://incubator.apache.org/ip-clearance/poi-visio.html and the source is https://svn.apache.org/repos/asf/incubator/public/trunk/content/ip-clearance/poi-visio.xml
Comment 9 virtuald 2015-08-19 18:09:39 UTC
Thanks Nick!

One thing I neglected to add to the software grant was https://github.com/BBN-D/ooxml-visio-schemas, which is required to run the existing code.

I think though, that if POI were to adopt this code, that it would make more sense to just integrate the correct XSD with the existing ooxml packages, instead of creating a new one just for visio.

The PDF download page for the visio specification (https://msdn.microsoft.com/en-us/library/hh645006%28v=office.12%29.aspx) says "Regardless of any other terms that are contained in the terms of use for the Microsoft website that hosts this documentation, you may make copies of it in order to develop implementations of the technologies described in the Open Specifications and may distribute portions of it in your implementations using these technologies or your documentation as necessary to properly document the implementation.".

Which is good. The XSD is in appendix A of the specification, and has a BSD style license and "No right to create modifications or derivatives of this Specification is granted herein."... but I assume the download page notice takes precedence.
Comment 10 Nick Burch 2015-08-26 11:33:18 UTC
Those looks like they're covered by the same terms as the other OOXML schemas we use, such as the Microsoft OSP, so I think that should be fine to use the same way
Comment 11 Nick Burch 2015-08-26 11:37:37 UTC
On a different topic, I wonder how easy/hard it might be to replace the curves external library dependency with the same approach taken in the new common SL code, eg org.apache.poi.sl.draw.geom.Path?
Comment 12 virtuald 2015-08-26 13:25:09 UTC
I tried using java.awt.geom for all my drawing needs, but the one item that I could not find (and did not want to write) was a NURBS spline implementation (and splines are used in visio all over the place, it turns out). I believe that the NURBS spline is the only thing that is used from curvesapi.

curvesapi is from an apparently unmaintained package on sourceforge.
Comment 13 virtuald 2015-09-19 16:46:45 UTC
I've put together a first pass at integration on this branch on github:

https://github.com/virtuald/poi/tree/xdgf

Using the XSD integration branch as a starting point (bug 58408) I first copied the original code over to src/ooxml (1st commit), and then ran the eclipse formatter on the code using POI's custom profile (2nd commit), and then added the curvesapi dependency. Compile + test seems to work.

Next, I'll do some cleanup, and I'll add a text extractor + unit test. I suspect at some point some basic code review from someone would be useful, and I can do those adjustments as necessary also.
Comment 14 virtuald 2015-09-19 19:58:27 UTC
Ok, added some cleanup plus some simple tests. Waiting on a light code review of some sort, then I'll "git svn dcommit" it to trunk.
Comment 15 virtuald 2015-09-29 01:51:13 UTC
Now that POI 3.13 has been released, anything left to do here other than push the code in? I'll be traveling starting tomorrow, so I won't be able to push my changes in until Saturday at the earliest.
Comment 16 Dominik Stadler 2015-09-29 12:44:44 UTC
Overall it looks good, although i did not look into the code in detail. Some more unit testing would be nice to get at least part of the code covered a bit more as otherwise overall coverage will drop a lot when this is pushed in.
Comment 17 virtuald 2015-10-07 03:14:18 UTC
I guess at this point my main question is whether the IP stuff has been completed? If it is, then I'd like to push my changes in that way others can play with it and we can see what's broken (or not).

I think I've figured out how to run the coverage tests... on the scale of POI, it's really not so bad. Looks like there's ~2100 lines of XDGF code, with 1400 missed lines. 30% is better than I would have guessed.

I mean, one low-effort solution that I could throw together to bump up the coverage is just add a document with a lot of crazy stuff in it, and add a test to parse that.
Comment 18 Nick Burch 2015-10-07 09:50:35 UTC
The IP Clearance has passed (well, a week or so ago, but I've only just remembered to update the site...)

You are now clear to go ahead and import the code from Github + update the license headers

Test coverage wise, we tend to test problematic/complex/confusing/key/heavily-reused parts of the code with "true" code-level unit tests. Most other parts get tested by doing create/save/load/read or load/modify/save/load/read or read/check type unit/basic-integration tests. 

Creating a handful of small test files touching key areas of functionality, then adding read+check and read+text-extract tests is probably the quickest way to increase coverage. Then look at using those for read+change+save+read+check tests, and if possible later create+save+"ensure roughly the same as this other one" tests!
Comment 19 Dominik Stadler 2017-10-08 07:24:00 UTC
Feature is imported for a long time now, unfortunately test-coverage never "materialized", however this bug should indicate that the feature itself is available for now, thus marking as FIXED.