Bug 51196 - [PATCH] Patch to simplify XSSFChart creation
Summary: [PATCH] Patch to simplify XSSFChart creation
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF (show other bugs)
Version: 3.8-dev
Hardware: PC Linux
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-05-13 11:54 UTC by Roman Kashitsyn
Modified: 2011-09-10 19:04 UTC (History)
0 users



Attachments
A patch based on 735dd86ac48481cfe6350d92997514e2aa7dee21 git revision (21.75 KB, patch)
2011-05-13 11:54 UTC, Roman Kashitsyn
Details | Diff
Generated chart example (5.50 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2011-05-13 11:56 UTC, Roman Kashitsyn
Details
A patch based on 735dd86ac48481cfe6350d92997514e2aa7dee21 git revision (26.40 KB, patch)
2011-05-16 07:55 UTC, Roman Kashitsyn
Details | Diff
Cumulative patch for scatter chart creation support draft. (49.92 KB, patch)
2011-05-18 09:27 UTC, Roman Kashitsyn
Details | Diff
Cumulative patch with applied remarks. (88.88 KB, patch)
2011-05-19 11:16 UTC, Roman Kashitsyn
Details | Diff
Patch for ScatterChart example and formatAsString() method (2.86 KB, patch)
2011-05-20 13:34 UTC, Roman Kashitsyn
Details | Diff
Some new features for charts. (37.25 KB, patch)
2011-05-27 20:57 UTC, Roman Kashitsyn
Details | Diff
Draft of numeric cache implementation. (37.40 KB, patch)
2011-07-17 20:41 UTC, Roman Kashitsyn
Details | Diff
patch with ChartDataSource strategy implementation (66.96 KB, patch)
2011-08-29 06:23 UTC, Roman Kashitsyn
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Roman Kashitsyn 2011-05-13 11:54:30 UTC
Created attachment 26995 [details]
A patch based on 735dd86ac48481cfe6350d92997514e2aa7dee21 git revision

I need to insert charts in xslx documents, and at the moment (as I can see) POI does not have support for that. So I write a simple patch to simplify chart creation process. I have tried to adopt my changes to existing code (I use git repo on GitHub). At the moment the data model is the following:

XSSFDrawing -- creates -- XSSFChart -- has -- XSSFChartLabel
                            |
                        belongs to
                            |
                       XSSFGraphicFrame

I have also add a simple test for new functionality. But the more interesting thing is that those changes do the thing: generated xlsx file with scetter chart is attached (low-level client code is still required for that, the next step is to provide appropriate API).

I am not a POI hacker and this is my first patch so any feedback will be helpful.

POI is a great thing, lets improve it )
Comment 1 Roman Kashitsyn 2011-05-13 11:56:44 UTC
Created attachment 26996 [details]
Generated chart example
Comment 2 Roman Kashitsyn 2011-05-16 07:55:02 UTC
Created attachment 27004 [details]
A patch based on 735dd86ac48481cfe6350d92997514e2aa7dee21 git revision
Comment 3 Roman Kashitsyn 2011-05-18 09:27:32 UTC
Created attachment 27021 [details]
Cumulative patch for scatter chart creation support draft.

This is only a draft of possible chart implementation. It is only support new chart creation and only one type of chart. Parsing of existing charts not supported yet. It will be implemented soon. Comments and ideas are welcome.
Comment 4 Roman Kashitsyn 2011-05-18 09:54:25 UTC
I have also added a draft of example `src/examples/src/org/apache/poi/xssf/usermodel/examples/ScatterChart.java' to demonstrate the api in action (I have not tested it yet, it just compiles, but I have very similar code in my application that works fine). Charts generated with the api open fine in MS Office 2007 and OpenOffice 3.3 (of course, they look much more prettier in ms office).

Unfortunately, I have a problem with `poi-ooxml-schemas-3.8-beta3-TIMESTAMP.jar' file build. When I am trying to run my application with it, I got `java.lang.ClassNotFoundException'. So I am forced to use `ooxml-schemas-1.1.jar' to run my application.
Comment 5 Yegor Kozlov 2011-05-18 12:01:53 UTC
Roman,

Good work! I'm reviewing your patch. All looks good and I'm going to apply it whithin a day or two.

A couple of comments: 

 (1)  In ChartAxis you are using the Double object instead of the primitive double.  Do you mind if we switch it to use the primitive type?

 (2) We compile POI with the Java language level 5.0 which means that @Override in interfaces isn't supported. Please remove.

Other than that, very cool! There is only a few tests for now, but I'm sure there will be more :)

Regards,
Yegor

(In reply to comment #4)
> I have also added a draft of example
> `src/examples/src/org/apache/poi/xssf/usermodel/examples/ScatterChart.java' to
> demonstrate the api in action (I have not tested it yet, it just compiles, but
> I have very similar code in my application that works fine). Charts generated
> with the api open fine in MS Office 2007 and OpenOffice 3.3 (of course, they
> look much more prettier in ms office).
> 
> Unfortunately, I have a problem with
> `poi-ooxml-schemas-3.8-beta3-TIMESTAMP.jar' file build. When I am trying to run
> my application with it, I got `java.lang.ClassNotFoundException'. So I am
> forced to use `ooxml-schemas-1.1.jar' to run my application.
Comment 6 Roman Kashitsyn 2011-05-18 12:43:30 UTC
Hi Yegor,

(1) I thought about using primitive types, but there is a problem: these parameters are not required and they don't have default values. So we need to inform user if requested parameter has not been set. To return null is the most simple way to do it. Another way is to use primitives and provide additional isSetXXX() methods and return NaN if requested parameter is not set. How do you think, should I introduce the second approach?

(2) Ok, no problems.

I will try to provide cumulative patch within two days.

Best regards,
Roman


(In reply to comment #5)
> Roman,
> 
> Good work! I'm reviewing your patch. All looks good and I'm going to apply it
> whithin a day or two.
> 
> A couple of comments: 
> 
>  (1)  In ChartAxis you are using the Double object instead of the primitive
> double.  Do you mind if we switch it to use the primitive type?
> 
>  (2) We compile POI with the Java language level 5.0 which means that @Override
> in interfaces isn't supported. Please remove.
> 
> Other than that, very cool! There is only a few tests for now, but I'm sure
> there will be more :)
> 
> Regards,
> Yegor
Comment 7 Yegor Kozlov 2011-05-18 13:27:11 UTC
Can your code return reasonable defaults, say, all zeros would mean that these values are not set? To me it looks better than using the Double type in interfaces. 

BTW, there is a bug in your code: you need to check if the Double argument is null.

For example, passing a null to the code below will cause NPE:

	@Override
	public void setMaximum(Double max) {
		CTScaling scaling = getCTScaling();
		if (scaling.isSetMax()) {
			scaling.getMax().setVal(max.doubleValue());
		} else {
			scaling.addNewMax().setVal(max.doubleValue());
		}
	}

Yegor
Comment 8 Roman Kashitsyn 2011-05-19 11:16:08 UTC
Created attachment 27033 [details]
Cumulative patch with applied remarks.

I have finally introduced uniform approach to charts. I have also replaced Double with double in ChartAxis.

In addition, I have  added new test cases, so the build produces correct ooxml-light version.
Comment 9 Yegor Kozlov 2011-05-20 08:23:30 UTC
Applied in r1125275 with some tweaks.

I see you added two methods to the Drawing interface, but forgot to implement them in HSSFPatriarch which implements Drawing. No problems - I added createAnchor and createChart to HSSFPatriarch, createChart throws NotImplemented for now. For the future, please always run the full Ant test cycle before creating a patch, i.e. "ant clean test".  

The ScatterChart example throws NPE.  I think the line 69 should call setYValues, not setXValues:

firstSerie.setYValues(sheet, new CellRangeAddress(1, 1, NUM_OF_COLUMNS + 1, NUM_OF_COLUMNS + 1));

Please confirm.

The generated output is readable, but differs from the atatched sample file. I see data rows and an empty graphic frame. Is it still in progress? 

formatAsString

I think a good test would be to create a test file with a chart in Excel and try to replicate that programmatically in POI. This way we can compare XML generated by POI with XML generated by Excel and ensure that all is good. What do you think? 

Regards,
Yegor
Comment 10 Roman Kashitsyn 2011-05-20 13:34:57 UTC
Created attachment 27039 [details]
Patch for ScatterChart example and formatAsString() method

Hi,

I have fixed the example with ScatterChart. There were also bug with formatAsString() method (the bug is reproducible when define sheet names with spaces). I have also decided to switch from git to subversion when working with poi, so new patch has been produced by svn.

Reverse engineering of existing MS Office documents is one of the most useful source of information for development. So I am fully agreed with your proposal. Unfortunately, there is a lot of work needed to force POI charts look similar to charts produced by MS Office. I know exactly where sources of differences are:
- ms office always uses themes;
- ms office always uses cache in chart series (I have introduced (dummy at the moment) option to turn on and off this feature: some people may want smaller documents);
- ms office generates strange ids for axis (definitely there is some algorithm... maybe hashing?).
I will turn my effort to decrease the difference between documents produced with POI and documents produced by MS Office.

It is also very useful to provide ability to parse existing charts: I will try to implement this feature in  near future.
Comment 11 Yegor Kozlov 2011-05-21 10:51:28 UTC
(In reply to comment #10)
> Created attachment 27039 [details]
> Patch for ScatterChart example and formatAsString() method
> 

applied in r1125662.

Yegor
Comment 12 Roman Kashitsyn 2011-05-27 20:57:41 UTC
Created attachment 27077 [details]
Some new features for charts.

Hi,

I have introduced some new features for charts:
+ full control of chart and legend position with manual layouts;
+ more easy series specification with DataMarker class;
There is also one little addition: now we can parse value axis of existing charts.

I am also working on cached values insertion for chart series (it will give more compatibility with MS Office).

Best regards,
Roman Kashitsyn
Comment 13 Yegor Kozlov 2011-06-06 10:29:33 UTC
Applied in r1132553

Thanks,
Yegor
 
> I have introduced some new features for charts:
> + full control of chart and legend position with manual layouts;
> + more easy series specification with DataMarker class;
> There is also one little addition: now we can parse value axis of existing
> charts.
> 
> I am also working on cached values insertion for chart series (it will give
> more compatibility with MS Office).
> 
> Best regards,
> Roman Kashitsyn
Comment 14 Roman Kashitsyn 2011-07-17 20:41:44 UTC
Created attachment 27294 [details]
Draft of numeric cache implementation.

Hi,

I have finally finished first draft of number cache implementation. It not so easy as I thought. I have added some utils:
1. org.apache.poi.ss.util.SheetBuilder - useful class to build sheets from 2D arrays. I think it can be extrimely useful for tests (there are several examples in test cases ;).
2. org.apache.poi.ss.util.cellwalk package now contains classes to easily traverse cell ranges.

I hope I have not reinvent the wheel here :]

The good news is that after number cache implementation MS Office does not offer to save document with programmatically created charts when you open and close the document (so, number cache really affects compatibility).

Plans:
======
I am planning to refactor my implementation of Series: I want to get rid of DataMarker class and introduce some kind of chart DataSource: I have found that MS Office allows you to create static charts (without any reference to cells), so DataMarker is something definitely wrong for serie composition. So I have to replace it with some kind of abstract DataSource and I am thinking about appropriate interface.

Best regards,
Roman Kashitsyn
Comment 15 Yegor Kozlov 2011-07-20 08:03:56 UTC
Applied in r1148642

Thanks,
Yegor

(In reply to comment #14)
> Created attachment 27294 [details]
> Draft of numeric cache implementation.
> 
> Hi,
> 
> I have finally finished first draft of number cache implementation. It not so
> easy as I thought. I have added some utils:
> 1. org.apache.poi.ss.util.SheetBuilder - useful class to build sheets from 2D
> arrays. I think it can be extrimely useful for tests (there are several
> examples in test cases ;).
> 2. org.apache.poi.ss.util.cellwalk package now contains classes to easily
> traverse cell ranges.
> 
> I hope I have not reinvent the wheel here :]
> 
> The good news is that after number cache implementation MS Office does not
> offer to save document with programmatically created charts when you open and
> close the document (so, number cache really affects compatibility).
> 
> Plans:
> ======
> I am planning to refactor my implementation of Series: I want to get rid of
> DataMarker class and introduce some kind of chart DataSource: I have found that
> MS Office allows you to create static charts (without any reference to cells),
> so DataMarker is something definitely wrong for serie composition. So I have to
> replace it with some kind of abstract DataSource and I am thinking about
> appropriate interface.
> 
> Best regards,
> Roman Kashitsyn
Comment 16 Roman Kashitsyn 2011-08-29 06:23:52 UTC
Created attachment 27445 [details]
patch with ChartDataSource strategy implementation

Hi,

I have finished the ChartDataSource idea implementation. It provides unified interface for data someone may wish to plot. From my point of view, API become more powerful and flexible.

There is several aspects:
1. I have reformatted my code so now it indented with only spaces, not with mess of tabs and spaces. It makes your review work more tedious, sorry for that :)
2. I have deleted some files I do not need anymore (DataMarker.java, for example), but I am in doubt regarding CellWalk API. I seems useful, but I don't need it since there are ChartDataSources.

Plans:
I will try to implement LineChart to investigate non-numeric ChartDataSources potential.

Best regards,
Roman Kashitsyn.
Comment 17 Yegor Kozlov 2011-09-10 19:04:15 UTC
Applied in 1167579, thanks, Roman

The Chart API is getting close to the point when it can be used in real applications, at least, for creating Scatter charts. The class design looks good, but to justify it we need more chart types implemented - until we have only Scatter chart it is hard to tell whether all interfaces fit together.

I got my main impression from the ScatterChart example. It worked, but some important features are missing:

 - legend name. In the ScatterChart example the data series are titled "Series1" and "Series2" but I didn't find a way to modify it.  Shouldn't the addSerie method accept series name as a third parameter?  For example,

chartData.addSerie(xValues, yValues, seriesName);

 - gridlines. Both X- and Y- grid lines are not visible and I couldn't change it.

It would be also nice to be able to format markers and lines. 

Other than that, very cool! Thanks a lot for your excellent contribution.

Regards,
Yegor