Bug 28681 - [PATCH] Xml output for the SQL task
Summary: [PATCH] Xml output for the SQL task
Status: NEW
Alias: None
Product: Ant
Classification: Unclassified
Component: Core tasks (show other bugs)
Version: 1.7.0
Hardware: Other other
: P3 enhancement (vote)
Target Milestone: ---
Assignee: Ant Notifications List
URL:
Keywords: PatchAvailable
Depends on:
Blocks:
 
Reported: 2004-04-29 10:55 UTC by Nicola Ken Barozzi
Modified: 2009-07-31 04:21 UTC (History)
1 user (show)



Attachments
SQLExec.java.diff (5.96 KB, patch)
2004-04-29 10:56 UTC, Nicola Ken Barozzi
Details | Diff
SQLExec.java.diff with grouping (9.84 KB, patch)
2004-05-04 15:26 UTC, Nicola Ken Barozzi
Details | Diff
SQLExec.java_and_StringUtils.java.diff (11.55 KB, patch)
2004-05-05 09:15 UTC, Nicola Ken Barozzi
Details | Diff
SQL-extractor-patch.txt (16.11 KB, patch)
2004-05-25 15:12 UTC, Nicola Ken Barozzi
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Nicola Ken Barozzi 2004-04-29 10:55:52 UTC
The SQL task currently outputs only csv, but I need to have XML output, so I can
xslt it and send it by mail as part of a cron job.

Here is a the patch of a quick hack to do it, by setting outputFormat="xml".
Maybe it could be easier to do separator="xml", but it mixes deinitions.

There are no test cases and no documentation, I am just posting it here in case
someone else wants to use it or cares to prepare it for inclusion in Ant.
Comment 1 Nicola Ken Barozzi 2004-04-29 10:56:25 UTC
Created attachment 11372 [details]
SQLExec.java.diff
Comment 2 Nicola Ken Barozzi 2004-05-04 15:25:28 UTC
Added a nesting option, where the tags are nested to the level specified.

Example:

  <sql...
    showheaders="false"
    outputFormat="xml"
    >
      ${data.sql}
    </sql>

gives

<sqlresults>
    <row><cell header="A">one</cell><cell header="B">1</cell></row>
    <row><cell header="A">one</cell><cell header="B">11</cell></row>
    <row><cell header="A">two</cell><cell header="B">2</cell></row>
</sqlresults>


while

  <sql...
    showheaders="false"
    outputFormat="xml"
    nestLevel="1"
    >
      ${data.sql}
    </sql>

gives

<sqlresults>
  <nest group-by="A" value="one">
    <row><cell header="B">1</cell></row>
    <row><cell header="B">11</cell></row>
  </nest>
  <nest group-by="A" value="two">
    <row><cell header="B">2</cell></row>
  </nest>
</sqlresults>

To keep memory low and good speed it deals with the values as they arrive.

So if this is nested;

<sqlresults>
    <row><cell header="A">one</cell><cell header="B">1</cell></row>
    <row><cell header="A">two</cell><cell header="B">2</cell></row>
    <row><cell header="A">one</cell><cell header="B">11</cell></row>
</sqlresults>

it gives this:

<sqlresults>
  <nest group-by="A" value="one">
    <row><cell header="B">1</cell></row>
  </nest>
  <nest group-by="A" value="two">
    <row><cell header="B">2</cell></row>
  </nest>
  <nest group-by="A" value="one">
    <row><cell header="B">11</cell></row>
  </nest>
</sqlresults>

Just remember to make the query so that nesting values come first, and include
them in an "ORDER BY" clause in the same order.

IE:

for a nestingLevel=1

SELECT A,B,C
FROM T
ORDER BY A

for a nestingLevel=2

SELECT A,B,C
FROM T
ORDER BY A,B

etc...

NOTE: the patch supercedes and includes the previous ones
Comment 3 Nicola Ken Barozzi 2004-05-04 15:26:13 UTC
Created attachment 11425 [details]
SQLExec.java.diff with grouping
Comment 4 Jose Alberto Fernandez 2004-05-04 18:29:23 UTC
Although I have no ideological problem on generating XML from queries
using the this task, I do have a problem on inventing a new XML dialect
with no documentation (DTD nor XMLSchema) to express this output.

If we want to do do this, we should use a simple well known XML for
queries and be done with it. If people want something more rich they can
use XSLT on top of that.
Comment 5 Nicola Ken Barozzi 2004-05-05 09:14:20 UTC
> I do have a problem on inventing a new XML dialect
> with no documentation (DTD nor XMLSchema) to express this output.

I've added a DTD inline with the output (also in the new patch).
Since I was at it, in the patch I also moved the method to escape xml in
StringUtils.

<?xml version="1.0" ?>
<!DOCTYPE sqlresults [
  <!ELEMENT sqlresults (headers?, (nest | row)* )>
  <!ELEMENT headers (header)*>
  <!ELEMENT header (#PCDATA)>
  <!ELEMENT nest (row)*>
  <!ATTLIST nest group-by CDATA #REQUIRED
                 value    CDATA #REQUIRED>
  <!ELEMENT row (cell)*>
  <!ELEMENT cell (#PCDATA)>
  <!ATTLIST cell header CDATA #REQUIRED>
]>

> If we want to do do this, we should use a simple well known XML for
> queries and be done with it. 

It's not the query, it's the result in xml... and for that I don't know about
well-known schemas. If someone can point me to a "standard" I'd be happy to
change the element names to follow it.

> If people want something more rich they can
> use XSLT on top of that.

That's exactly what it does, it uses a fixed simple DTD. I had to include
grouping though as doing it by xsl is complicated and resource intensive. The
grouping this task does is as simple as can be.

Maybe I'm missing something...
Comment 6 Nicola Ken Barozzi 2004-05-05 09:15:07 UTC
Created attachment 11433 [details]
SQLExec.java_and_StringUtils.java.diff
Comment 7 Jose Alberto Fernandez 2004-05-05 13:06:07 UTC
To me, we should just map ResultSet object into XML.
ResultSets are an iterative version of a table, and we should just
be able to express that in XML.

I do not understand why we need to do grouping here, and how do you
know the grouping is on "A" on your example. Are you parsing the SQL
or is this available as metadata from the resultset.

I do not like all this nested things to be assummed just be the position of 
the column or something some heuristic that will not work for the next guy 
that tries to use the feature.


Comment 8 Nicola Ken Barozzi 2004-05-05 15:40:49 UTC
> To me, we should just map ResultSet object into XML.
> ResultSets are an iterative version of a table, and we should just
> be able to express that in XML.

This seems quite similar to a resultset in xml to me:

<sqlresults>
    <row><cell header="A">one</cell><cell header="B">1</cell></row>
    <row><cell header="A">two</cell><cell header="B">2</cell></row>
    <row><cell header="A">one</cell><cell header="B">11</cell></row>
</sqlresults>

> I do not understand why we need to do grouping here,

Then tell me how you would do it instead.

> and how do you
> know the grouping is on "A" on your example. Are you parsing the SQL
> or is this available as metadata from the resultset.

It's the first column name, I get it from the metadata (@see the code).

> I do not like all this nested things to be assummed just be the position of 
> the column or something some heuristic that will not work for the next guy 
> that tries to use the feature.

The nesting follows the column numbers. A nesting level of 1 groups by the first
column. It seems easy enough to understand, as it's what Excel or any other
spreadsheet does when you ask for grouping. 

How would you do it instead?
Comment 9 Jose Alberto Fernandez 2004-05-05 17:14:42 UTC
That is exactly my point, I wouldn't do it (the nesting).

We keep on adding more and more things to the <sql> task
and creating a monolithic monster (not you but over time).

Maybe we should move any (nw query functionality) to a separate
task e.g., <sqlquery/> defined to extract data from the DB
and leave <sql> mostly for ddl,dml operations.

Then we probably can have a better definition for it.
Comment 10 Nicola Ken Barozzi 2004-05-06 07:04:24 UTC
> That is exactly my point, I wouldn't do it (the nesting).

But I need it, and I have to do it somewhere, and as I try to explain, it's best
to do it in the data extraction.

:-) 

> We keep on adding more and more things to the <sql> task
> and creating a monolithic monster (not you but over time).

The fact is that here it's the easiest place to put things.
What I could do is to make the SQLExtractor interface, and have the SQL task use
that if defined. In this way we would keep the sql stuff in the same place but
have pluggable extractors (maybe even a velocity extractor for templating).

<sql blah blah>
 <extractor class="org.blik.MyExtractor">
   <param name="nesting" value="2"/>
 </extractor>
</sql>

> Maybe we should move any (nw query functionality) to a separate
> task e.g., <sqlquery/> defined to extract data from the DB
> and leave <sql> mostly for ddl,dml operations.

I think this would confuse users, and for it I woul dhave to replicate code from
the sql task or do some refactoring.

> Then we probably can have a better definition for it.

As?

Sorry, but I don't have time to think about this too much. If you can give me a
more detailed explanation of how you would like to see this, I could find some
time to do it, or else I'll jsut factor out the extractor.
Comment 11 Jose Alberto Fernandez 2004-05-06 15:34:51 UTC
Just to make my point more clear on why I think we are going in the wrong 
direccion:

In your current proposal, your nested XML output requires a certain order for 
the query, in order for nesting to be processed correctly. So what does happens
if I have several queries in my <sql/>? Do they all are required to look the 
same?

You see, with csv or with non-nested XML there is no assumption about the 
shape of the query. Hence it does not matter if you are executing one query or 
100 queries. But if now we want to be able to process the output
(and since SQL can execute multiple queries in the same transaction, etc.), we 
would need to provide the formatting rules (or the extractor) in a query by 
query basis.

When you look at all this issues, then you see that it is not that simple.
 - You probably need to be able to apply your extractor in a query by query
basis. 

 - If you want to be able to write generic extractors, (e.g., velocity)
then you need to be able to specify the velocity script which may be complex.

 - If you want a scripted extractor, you may need to be able to include the 
text of the script.

Something like:

 <sql ....>
   <transaction>
     <extractor class="foo.my.Extractor">
        <formatter>
           
        </formatter>
        <query>
          select a, b from c, d where c.a = d.b
        </query>
     </extractor>
   </transaction>
 </sql>

Now when you see all what is needed to do it properly, then it makes much more 
sence to use a separate task for all this.
Comment 12 Jack J. Woehr 2004-05-06 18:10:52 UTC
Excuse me for interjecting a comment. As one of the other Sql task "activists"
here, I agree with Jose Alberto Fernandez. I would rather see a new task than
see task Sql have its code fancied up with features. At present, the
SQLExec.java code implementing critical functionality relevant to the core
mission of Ant is a readable piece of code. Core tasks should be allowed to
persevere in such a state to the greatest extent possible.

Better to create a new task and only use well-characterized project-wide
patterns to hook SQLExec.java if necessary to nest the new result set formatting
task. If it does not require hooks, then why not submit the formatter to
Ant-Contrib where it can comforatbly enter the tree in an underdesigned
first-cut state and evolve as one works out the requirements more fully?

Comment 13 Nicola Ken Barozzi 2004-05-07 08:35:34 UTC
Thanks for your replies guys.

I'm sorry I probably seemed as being pissed off (and in fact I was a little ;-),
but I did not fully understand what you meaned in your comments.

> In your current proposal, your nested XML output requires a certain order for 
> the query, in order for nesting to be processed correctly. So what does 
> happens if I have several queries in my <sql/>? Do they all are required to
> look the same?

Actually a need a certain order for the *output* of the query, not the query
itself. I mean, I don't care how you do the query, but the query gives a table
as a result, right? So I nest based on the first, second, third, etc columns of
the resulting table, just like pivoting works in a spreadsheet.

I thought that this would simplify it for the user (as it does for me), but
probably it's better to have the user specify the column names to group by.

>  - If you want to be able to write generic extractors, (e.g., velocity)
> then you need to be able to specify the velocity script which may be complex.

I'd only add it as a file, and besides, it's part of the extractor code, not the
sql one.

>  - If you want a scripted extractor, you may need to be able to include the 
> text of the script.

? Can't I jsut reference a file in the extractor part?

> Something like:

>  <sql ....>
>   <transaction>
>     <extractor class="foo.my.Extractor">
>        <formatter>
>           
>        </formatter>
>        <query>
>          select a, b from c, d where c.a = d.b
>        </query>
>     </extractor>
>   </transaction>
> </sql>
>
> Now when you see all what is needed to do it properly, then it makes much more 
> sence to use a separate task for all this.

I'm still unconvinced, as the sql task already outputs CSV; but before
commenting further I'll try to add the extractor to the SQL task and see how the
code is, taking transactions into account. With that done, it will be easier for
all to actually see it and decide.

> Better to create a new task and only use well-characterized project-wide
> patterns to hook SQLExec.java if necessary to nest the new result set 
> formatting task. 

That's the idea, probably I was not clear. I mean that I want to separate the
<extractor> part outside of the <sql> taks, leaving just hooks that keep all the
sql stuff in the sql task.

> If it does not require hooks, then why not submit the formatter to
> Ant-Contrib where it can comforatbly enter the tree in an underdesigned
> first-cut state and evolve as one works out the requirements more fully?

I would happily do it, but to be able to format while the sql task is running I
need hooks.

Thanks guys for keeping the replies coming, I'll see what I can do about this. :-)

Comment 14 Nicola Ken Barozzi 2004-05-25 15:12:17 UTC
I've refactored this stuff to use an external "extractor". 
This is how it works with the current patch:

  <taskdef name="extractorxml" 
      classname="org.apache.tools.ant.taskdefs.SQLExtractorXML"/>   

  <extractorxml id="myextractorxml"/> 
  
    <sql driver="..."
         url="..."
         userid="..."
         password="..."
         print="yes"
         output="${work.dir}/outputfile.xml"
         extractorRef="myextractorxml"
    >
      ${data.sql}
    </sql>

If I want to use nesting for the xml output, I have to tell the extractor, not
the sql task, thus effectively separating the chores:

  <extractorxml id="myextractorxml" nestLevel="2"/>

I have not refactored out the CSV extraction to maintain compatibility with
previous versions, so in practice only extra output extractors need to be
defined this way.

WDYT?
Comment 15 Nicola Ken Barozzi 2004-05-25 15:12:45 UTC
Created attachment 11661 [details]
SQL-extractor-patch.txt
Comment 16 Nicola Ken Barozzi 2008-03-28 09:12:12 UTC
Waiting for the perfect solution almost 4 years have passed... maybe getting things done is better than getting them right.
Comment 17 Peter Reilly 2008-03-28 09:22:27 UTC
The problem is that the sql task is so full of bugs that
any serious sql user would not use it
and it has no unit tests and some strange code for
tokenizing sql lines that no ant developer would want to touch it.

Comment 18 Nicola Ken Barozzi 2008-03-28 09:33:31 UTC
;-)

I can easily put it somewhere else on the web, non problem. 
Shall we mark it WONTFIX?