Bug 45612 - [Patch] Subproject Addressing and Simulation Mode
Summary: [Patch] Subproject Addressing and Simulation Mode
Status: NEW
Alias: None
Product: Ant
Classification: Unclassified
Component: Core
Version: 1.8.2
Hardware: All All
Importance: P2 enhancement with 1 vote
Target Milestone: ---
Assignee: Ant Notifications List
URL:
Keywords: PatchAvailable
Depends on:
Blocks:
 
Reported: 2008-08-11 02:03 UTC by Oran Fry
Modified: 2010-12-27 11:11 UTC
CC List: 0 users



Attachments
svn diff with http://svn.apache.org/repos/asf/ant/core/trunk @ 684142 (27.46 KB, patch)
2008-08-11 02:03 UTC, Oran Fry

Description Oran Fry 2008-08-11 02:03:17 UTC
Created attachment 22426
svn diff with http://svn.apache.org/repos/asf/ant/core/trunk @ 684142

Subproject Addressing and Simulation Mode are features I have added to Ant to help in my own work. They are well tested, and they don't affect Ant unless they are turned on explicitly (as explained below). I would love to see this patch applied to the trunk and these features make it into the next release.

Subproject Addressing (turned on with the -addressing flag):
As you know, each time <ant> or <antcall> is executed, a new Project object is created and executed as a subproject. Subproject Addressing gives each of these subprojects an address. The first subproject gets the address 1, the second gets the address 2 and so on. If subproject 1 has subprojects of its own, they are given the addresses 1.1, 1.2 and so on. These addresses are printed in front of target names as the project runs, and output is indented according to the depth of the subproject in the target tree.
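The numbering can be modelled as a per-project counter: each time a project starts a subproject via <ant> or <antcall>, it increments its own counter and appends that number to its own address. The following is a hypothetical Java model written only to illustrate the scheme; the class AddressNode and its methods are invented names, not code from the patch, and the main project is assumed to have an empty address (printed here with the "root" keyword the report uses).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of Subproject Addressing numbering (not the patch's code).
final class AddressNode {
    private final List<Integer> path; // e.g. [2, 1] for "2.1"; empty for the main project
    private int childCount = 0;       // subprojects this project has started so far

    private AddressNode(List<Integer> path) {
        this.path = path;
    }

    // The main project: no address of its own.
    static AddressNode root() {
        return new AddressNode(Collections.<Integer>emptyList());
    }

    // Called each time this project runs <ant> or <antcall>:
    // the child's address is this address plus the next child number.
    AddressNode startSubproject() {
        childCount++;
        List<Integer> childPath = new ArrayList<>(path);
        childPath.add(childCount);
        return new AddressNode(Collections.unmodifiableList(childPath));
    }

    // Depth in the project tree, usable for indentation (0 for the main project).
    int depth() {
        return path.size();
    }

    @Override
    public String toString() {
        if (path.isEmpty()) {
            return "root";
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < path.size(); i++) {
            if (i > 0) {
                sb.append('.');
            }
            sb.append(path.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Replays the first antcalls of the demonstration build.xml.
        AddressNode main = root();
        AddressNode a = main.startSubproject();       // targetA
        AddressNode b = main.startSubproject();       // targetB
        AddressNode bSub = b.startSubproject();       // targetBsub
        AddressNode bSubSub = bSub.startSubproject(); // targetBsubsub
        AddressNode bSub2 = b.startSubproject();      // targetBsub2
        System.out.println(a + " " + b + " " + bSub + " " + bSubSub + " " + bSub2);
        // prints: 1 2 2.1 2.1.1 2.2
    }
}
```

Note that the counter lives in the parent, so numbering restarts at 1 under each new subproject, which is what produces 2.1 and 2.2 under 2.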

These addresses can be used to select which subprojects to run in a build. Subproject Addressing adds four new options to Ant: -addressing (mentioned above), -from, -to and -descend. The -from option specifies a subproject to start execution from, -to specifies a subproject to execute up to and then stop before, and -descend specifies a single subproject to execute (including any subprojects of its own).

For ease of use, the -from, -to and -descend options turn Subproject Addressing on automatically. The keywords "root" and "infinity" are also valid addresses: root specifies the main project, and infinity specifies a first-level subproject of the main project with an infinitely high address. So you can run commands like 'ant -from 2 -to infinity'. The specified subprojects need not actually exist; in a project with four subprojects, 'ant -to 100' would just execute the four subprojects and exit.
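The selection rules can be read off the examples later in this report: -from is inclusive, -to is exclusive, and ancestors of a -from or -descend subproject still run. The following hypothetical Java sketch (class and method names invented, logic inferred from those examples rather than taken from the patch) treats an address as an integer path compared in tree pre-order, with "root" as the empty path and "infinity" assumed to be a first-level address larger than any real one. How the main project's own tasks are treated is not modelled.

```java
import java.util.StringJoiner;

// Hypothetical model of -from / -to / -descend, inferred from the report's examples.
// Pre-order: a parent sorts before its children, siblings by number (2 < 2.1 < 2.1.1 < 2.2 < 3).
final class AddressFilter {
    // Assumption: "root" is the empty path, "infinity" a first-level address above any real one.
    static final int[] ROOT = {};
    static final int[] INFINITY = {Integer.MAX_VALUE};

    private final int[] from;    // -from: inclusive lower bound
    private final int[] to;      // -to: exclusive upper bound
    private final int[] descend; // -descend: subtree to execute

    AddressFilter(String from, String to, String descend) {
        this.from = parse(from);
        this.to = parse(to);
        this.descend = parse(descend);
    }

    static int[] parse(String s) {
        if (s.equals("root")) return ROOT;
        if (s.equals("infinity")) return INFINITY;
        String[] parts = s.split("\\.");
        int[] address = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            address[i] = Integer.parseInt(parts[i]);
        }
        return address;
    }

    // Pre-order comparison: a proper prefix (an ancestor) sorts first.
    static int compare(int[] a, int[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) return Integer.compare(a[i], b[i]);
        }
        return Integer.compare(a.length, b.length);
    }

    // True if a is b itself or one of b's ancestors (root is everyone's ancestor).
    static boolean isAncestorOrSelf(int[] a, int[] b) {
        if (a.length > b.length) return false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) return false;
        }
        return true;
    }

    // Should the subproject at this address (e.g. "2.1") be executed?
    boolean shouldRun(String address) {
        int[] a = parse(address);
        // Inside the -descend subtree, or an ancestor leading into it.
        boolean inDescend = isAncestorOrSelf(descend, a) || isAncestorOrSelf(a, descend);
        // At or after -from, or an ancestor of the -from subproject.
        boolean fromOk = compare(a, from) >= 0 || isAncestorOrSelf(a, from);
        // Strictly before -to (ancestors of -to sort before it, so they run too).
        boolean toOk = compare(a, to) < 0;
        return inDescend && fromOk && toOk;
    }

    // Space-separated subset of the given addresses that would run.
    String selected(String... addresses) {
        StringJoiner out = new StringJoiner(" ");
        for (String address : addresses) {
            if (shouldRun(address)) out.add(address);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] demo = {"1", "2", "2.1", "2.1.1", "2.2", "3", "3.1", "3.2", "3.3", "4"};
        // 'ant -from 3' (defaults per the report: -descend root -to infinity)
        System.out.println(new AddressFilter("3", "infinity", "root").selected(demo)); // 3 3.1 3.2 3.3 4
        // 'ant -to 3'
        System.out.println(new AddressFilter("0", "3", "root").selected(demo));        // 1 2 2.1 2.1.1 2.2
        // 'ant -descend 2.1'
        System.out.println(new AddressFilter("0", "infinity", "2.1").selected(demo));  // 2 2.1 2.1.1
    }
}
```

Under these assumptions, the keyword cases listed at the end of the description fall out directly: 'ant -to root', 'ant -from infinity' and 'ant -descend infinity' select no subprojects.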

Simulation Mode (turned on with the -sim flag):
Simulates a build: it prints the names of the targets that would be executed in a real build, without actually doing anything. This was implemented by modifying the Task class to do nothing unless the task name is "import", "antcall" or "ant", since these tasks control the flow of execution, while other tasks do the actual "work". Note: although the "ant" task is executed in Simulation Mode, the subproject is not actually loaded or executed, because the specified antfile (or the files it imports) may not exist yet; earlier parts of the build may be responsible for creating them.
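As a rough illustration of that rule, a hypothetical helper could decide whether a task runs under -sim and format the one-line description that <ant> prints instead of loading its antfile. The class and method names are assumptions made for this sketch; the description format is copied from the '-sim' output shown later in this report.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the Simulation Mode rule described above (not the patch's code).
final class SimulationMode {
    // The report says only these flow-control tasks still run under -sim.
    private static final Set<String> FLOW_CONTROL_TASKS =
            new HashSet<>(Arrays.asList("import", "antcall", "ant"));

    // True if a task with this name would still run under -sim.
    static boolean runsInSimulation(String taskName) {
        return FLOW_CONTROL_TASKS.contains(taskName);
    }

    // Under -sim, <ant> does not load the subproject (its antfile may not exist yet);
    // it only reports what it would run, e.g. "externalTarget1,externalTarget2 in build2.xml".
    static String describeAnt(String antfile, List<String> targets) {
        return String.join(",", targets) + " in " + antfile;
    }

    public static void main(String[] args) {
        for (String name : Arrays.asList("echo", "antcall", "javac", "ant")) {
            System.out.println(name + " -> " + (runsInSimulation(name) ? "runs" : "skipped"));
        }
        System.out.println(describeAnt("build2.xml",
                Arrays.asList("externalTarget1", "externalTarget2")));
    }
}
```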


These two features go well together. Running 'ant -addressing -sim' gives you a visual tree of projects (if each subproject has just one target, you can think of it as a tree of targets). You can then pick addresses from that tree to pass to the -from, -to or -descend options.
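The tree printed by 'ant -sim -addressing' follows a simple layout: one "|   " column per level of depth, a spacer line before each subproject, then the address and the target name. A hypothetical formatter reproducing the headings in the demonstration might look like this; the names are invented, and output that Ant itself prints (such as [echo] messages) is not modelled.

```java
// Hypothetical sketch of the heading layout used by -addressing output.
final class TreePrinter {
    private static final String COLUMN = "|   "; // one column per level of depth

    // An empty address means the main project, which prints as "name:".
    static String heading(String address, String target) {
        if (address.isEmpty()) {
            return target + ":";
        }
        int depth = address.split("\\.").length;  // "2.1" is at depth 2
        String indent = repeat(COLUMN, depth);
        // Spacer line, then "<indent><address> <target>:".
        return indent + "\n" + indent + address + " " + target + ":";
    }

    private static String repeat(String s, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(heading("", "main"));
        System.out.println(heading("1", "targetA"));
        System.out.println(heading("2", "targetB"));
        System.out.println(heading("2.1", "targetBsub"));
        System.out.println(heading("2.1.1", "targetBsubsub"));
    }
}
```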

Demonstration:

Using this build.xml:
<project name="TestProject" default="main">

	<target name="main">
		<echo>code in main</echo>
		<antcall target="targetA"/>
		<antcall target="targetB"/>
		<antcall target="targetC"/>
		<antcall target="targetD"/>
		<echo>code in main</echo>
	</target>

	<target name="targetA">
		<echo>code in a</echo>
	</target>

	<target name="targetB">
		<echo>code in b</echo>
		<antcall target="targetBsub"/>
		<antcall target="targetBsub2"/>
		<echo>code in b</echo>
	</target>

	<target name="targetC">
		<echo>code in c</echo>
		<ant antfile="build2.xml" target="externalTarget1"/>
		<ant antfile="build2.xml" target="externalTarget2"/>
		<ant antfile="build2.xml">
			<target name="externalTarget1"/>
			<target name="externalTarget2"/>
		</ant>
	</target>

	<target name="targetD" depends="targetDdepend">
		<echo>code in d</echo>
	</target>

	<target name="targetBsub">
		<echo>code in b sub</echo>
		<antcall target="targetBsubsub"/>
		<echo>code in b sub</echo>
	</target>

	<target name="targetBsub2">
		<echo>code in b sub2</echo>
	</target>

	<target name="targetBsubsub">
		<echo>code in b sub sub</echo>
	</target>

	<target name="targetDdepend">
		<echo>code in d depend</echo>
	</target>

</project>


(blank lines omitted from output)

Run the build with no options (everything runs as normal):
$ ant

main:
	 [echo] code in main
targetA:
	 [echo] code in a
targetB:
	 [echo] code in b
targetBsub:
	 [echo] code in b sub
targetBsubsub:
	 [echo] code in b sub sub
	 [echo] code in b sub
targetBsub2:
	 [echo] code in b sub2
	 [echo] code in b
targetC:
	 [echo] code in c
externalTarget1:
	 [echo] code in an external target 1
externalTarget2:
	 [echo] code in an external target 2
externalTarget1:
	 [echo] code in an external target 1
externalTarget2:
	 [echo] code in an external target 2
targetDdepend:
	 [echo] code in d depend
targetD:
	 [echo] code in d
	 [echo] code in main

Simulate build:
$ ant -sim
main:
targetA:
targetB:
targetBsub:
targetBsubsub:
targetBsub2:
targetC:
externalTarget1 in build2.xml
externalTarget2 in build2.xml
externalTarget1,externalTarget2 in build2.xml
targetDdepend:
targetD:


Simulate build, with addressing:
$ ant -sim -addressing
main:
|   
|   1 targetA:
|   
|   2 targetB:
|   |   
|   |   2.1 targetBsub:
|   |   |   
|   |   |   2.1.1 targetBsubsub:
|   |   
|   |   2.2 targetBsub2:
|   
|   3 targetC:
|   |   
|   |   3.1 -> externalTarget1 in build2.xml
|   |   
|   |   3.2 -> externalTarget2 in build2.xml
|   |   
|   |   3.3 -> externalTarget1,externalTarget2 in build2.xml
|   
|   4 targetDdepend:
|   
|   4 targetD:

Note that targetDdepend and targetD get the same address because they are part of the same subproject. This is intentional: dependencies must always run before the targets that depend on them, and addressing does not let the user get out of running target dependencies.
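The shared address follows from the address belonging to the subproject rather than to individual targets. The hypothetical sketch below resolves a target's depends list depth-first, as a single subproject would, and labels every resulting target with the same address. The names are illustrative only, and Ant's own dependency ordering does more than this (for example, it rejects circular dependencies with an error).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: one subproject, one address, however many targets it runs.
final class SubprojectRun {
    // Depth-first post-order over the depends graph; cycle detection is omitted here.
    static List<String> executionOrder(Map<String, List<String>> depends, String target) {
        List<String> order = new ArrayList<>();
        visit(depends, target, new HashSet<>(), order);
        return order;
    }

    private static void visit(Map<String, List<String>> depends, String target,
                              Set<String> seen, List<String> order) {
        if (!seen.add(target)) {
            return; // already scheduled
        }
        for (String dep : depends.getOrDefault(target, Collections.emptyList())) {
            visit(depends, dep, seen, order);
        }
        order.add(target); // added only after all of its dependencies
    }

    // Every target run by the subproject is labelled with the subproject's address.
    static List<String> labelled(String address, List<String> targets) {
        List<String> lines = new ArrayList<>();
        for (String target : targets) {
            lines.add(address + " " + target + ":");
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, List<String>> depends = new HashMap<>();
        depends.put("targetD", Arrays.asList("targetDdepend"));
        // <antcall target="targetD"/> is the 4th subproject of main in the demonstration.
        for (String line : labelled("4", executionOrder(depends, "targetD"))) {
            System.out.println(line);
        }
    }
}
```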


Run the build with addressing:
$ ant -addressing
main:
|	[echo] code in main
|   
|   1 targetA:
|   |		[echo] code in a
|   
|   2 targetB:
|   |		[echo] code in b
|   |   
|   |   2.1 targetBsub:
|   |   |		[echo] code in b sub
|   |   |   
|   |   |   2.1.1 targetBsubsub:
|   |   |   |		[echo] code in b sub sub
|   |   |	[echo] code in b sub
|   |   
|   |   2.2 targetBsub2:
|   |   |		[echo] code in b sub2
|   |		[echo] code in b
|   
|   3 targetC:
|   |		[echo] code in c
|   |   
|   |   3.1 externalTarget1:
|   |   |		[echo] code in an external target 1
|   |   
|   |   3.2 externalTarget2:
|   |   |		[echo] code in an external target 2
|   |   
|   |   3.3 externalTarget1:
|   |   |		[echo] code in an external target 1
|   |   
|   |   3.3 externalTarget2:
|   |   |		[echo] code in an external target 2
|   
|   4 targetDdepend:
|   |		[echo] code in d depend
|   
|   4 targetD:
|   |		[echo] code in d
|		[echo] code in main

Note that when an ant task has multiple nested target elements (as with 3.3), they are loaded into the same subproject, so they get the same address.


Simulate, starting from subproject 3 (note inclusive nature of -from):
$ ant -from 3 -sim
main:
|   
|   3 targetC:
|   |   
|   |   3.1 -> externalTarget1 in build2.xml
|   |   
|   |   3.2 -> externalTarget2 in build2.xml
|   |   
|   |   3.3 -> externalTarget1,externalTarget2 in build2.xml
|   
|   4 targetDdepend:
|   
|   4 targetD:

Simulate, executing up to subproject 3 (note exclusive nature of -to):
$ ant -to 3 -sim
main:
|   
|   1 targetA:
|   
|   2 targetB:
|   |   
|   |   2.1 targetBsub:
|   |   |   
|   |   |   2.1.1 targetBsubsub:
|   |   
|   |   2.2 targetBsub2:

Simulate descending subproject 2:
$ ant -descend 2 -sim
main:
|   
|   2 targetB:
|   |   
|   |   2.1 targetBsub:
|   |   |   
|   |   |   2.1.1 targetBsubsub:
|   |   
|   |   2.2 targetBsub2:

Note that when starting from a subproject deep in the tree using -from or -descend, ancestor projects (and their code) are also executed, so be careful:

$ ant -descend 2.1
main:
|		[echo] code in main
|   
|   2 targetB:
|   |		[echo] code in b
|   |   
|   |   2.1 targetBsub:
|   |   |		[echo] code in b sub
|   |   |   
|   |   |   2.1.1 targetBsubsub:
|   |   |   |		[echo] code in b sub sub
|   |   |		[echo] code in b sub
|   |		[echo] code in b
|		[echo] code in main

Specifying more than one target on the command line:
(for demonstration, I just specify main twice)

$ ant -sim -addressing main main
Buildfile: /media/disk/oran/sandbox/test-project/build.xml
main:
|   
|   1 targetA:
|   
|   2 targetB:
|   |   
|   |   2.1 targetBsub:
|   |   |   
|   |   |   2.1.1 targetBsubsub:
|   |   
|   |   2.2 targetBsub2:
|   
|   3 targetC:
|   |   
|   |   3.1 -> externalTarget1 in build2.xml
|   |   
|   |   3.2 -> externalTarget2 in build2.xml
|   |   
|   |   3.3 -> externalTarget1,externalTarget2 in build2.xml
|   
|   4 targetDdepend:
|   
|   4 targetD:
main:
|   
|   5 targetA:
|   
|   6 targetB:
|   |   
|   |   6.1 targetBsub:
|   |   |   
|   |   |   6.1.1 targetBsubsub:
|   |   
|   |   6.2 targetBsub2:
|   
|   7 targetC:
|   |   
|   |   7.1 -> externalTarget1 in build2.xml
|   |   
|   |   7.2 -> externalTarget2 in build2.xml
|   |   
|   |   7.3 -> externalTarget1,externalTarget2 in build2.xml
|   
|   8 targetDdepend:
|   
|   8 targetD:


Keyword addresses are good for being explicit, but are rarely needed:
'ant -addressing' is the same as 'ant -descend root -from 0 -to infinity'
'ant -from 2' is the same as 'ant -from 2 -descend root -to infinity'
'ant -to root' does nothing
'ant -from infinity' does nothing
'ant -descend infinity' does nothing

My first patch! Any questions, please mail me. Thank you to Apache and everyone working on Ant for a great tool.
Comment 1 Stefan Bodewig 2008-08-18 08:03:55 UTC
Just stumbled over this report: if you assign a bug to yourself, the other developers won't see the notifications and thus may never notice the report. Don't do this ;-)

I've seen your mail to the dev list and started reading it, but repeatedly gave up because I failed to understand the rationale immediately (and ran out of time for deeper thoughts). It's on my TODO list, though.
Comment 2 Matt Benson 2008-08-18 08:30:00 UTC
For my own POV, I looked at this when it first came through but forgot to comment.  I'm not sure I feel this feature is compatible at a theoretical level with Ant's target concept.  I would think it would be possible to implement this behavior independently with a custom target Executor, and would be in favor of any changes needed to permit the functioning of such a third-party implementation.
Comment 3 Oran Fry 2008-08-18 19:05:58 UTC
(In reply to comment #1)
>I've seen your mail to the dev list
(In reply to comment #2)
>I'm not sure this feature is compatible ... with Ant's target concept

Please do not go by the email I sent to the dev list; I have cleaned up a lot of the details since then. In the email I referred to the feature as "target addressing", but I have since realized that "project addressing" is more appropriate, and more compatible with Ant concepts (and code).

(In reply to comment #1)
>I failed to understand the rationale immediately (and ran out of
>time for deeper thoughts)

Fair enough! It started as a way to resume a failed build. I first saw a chance to extend Ant when I started working with Greenstone3. Its build is done with Ant, makes heavy use of antcall, and is a very long process. If the build failed partway through (say, because an environment variable was set incorrectly), you could fix the problem and rerun the target where the build failed, which was great. The problem was that it would not then carry on with the rest of the build; all you could do was start the build again from the beginning. Rerunning the same targets inevitably wasted some bandwidth, CPU and time, despite Ant's ability to avoid repeating operations. The solution was a system where you always specify the same high-level target on the command line (an "entry point" target, something like "build-greenstone3"), with the option of telling Ant to resume the build from a given subproject (i.e., antcall). Addressing was a natural way to tell Ant which subproject to resume from (the -from option), and once that was implemented, it was worth adding the complementary -to and -descend options.
Comment 4 Dominique Devienne 2008-08-19 06:51:51 UTC
Well first, maybe it's the heavy use of <antcall> that's the performance problem! (had to put that one in ;-)

OK, the use case you describe makes sense, although I'm not sure your approach is a pragmatic one. In my experience with large, multi-sub-project builds, restarting the build does take a little time, but a well-designed build that does nothing when nothing needs doing quickly gets through to the end. My largest build ever, with close to 100 different sub-builds of native C/C++/Fortran code, took 30 seconds to recurse into every lib, exe and jar when everything was up to date. So the complex feature you are suggesting wouldn't buy you much, since once a build is well designed, it rarely fails in the middle for no reason.

Add validation targets to your sub-builds and recurse on those in a first pass to find all potential issues quickly, then do the build per se in a second pass.

To come back to your idea, you describe a "resumable" mode for Ant. The way I see it, the user would explicitly request that Ant run in "resumable" mode, and if something fails, you'd do "ant -resume" to restart it from the failure point. But that would imply tracking which targets ran so far and which one failed, in which nested builds, etc., plus what the properties were and, worse, what the reference datatypes were, which implies serializing the latter. So it's very difficult to code up. Explicitly specifying which target to resume from would be very brittle for the same property/reference reasons.

There's simply not enough bang for the buck here IMHO. --DD
Comment 5 Oran Fry 2008-09-07 17:05:58 UTC
The basic idea of the Addressing feature is that it lets a user see the project tree and make use of its structure. It is a general tool which could have many uses, and users may find new ways to use it that even I haven't thought of. It's not just about resuming a build; that is only one way it can help.

Here are some of the uses I have put addressing to:
* Viewing the structure of an unfamiliar project (e.g., one downloaded from the internet)
* Resuming a failed build that is not well designed to avoid repeat operations
* Resuming even well designed builds which inevitably take time to skip over operations which are already done (especially when the checks involve checking files on the internet)
* Stepping through a build, to examine the state of files etc. as the build progresses
* Executing the first part of a build in preparation to execute the second part later (E.g., If I'm waiting for the files needed for the second part)
* Skipping over an unstable or problematic part of a build without having to modify the build file (e.g., if a build is failing at a particular target but you are able to perform that operation manually).
* Expressing the relationship between targets. Addressing lets the user see which targets are being called as dependencies and which are being called as subprojects. If "do-b" and "do-c" are subprojects of "do-a", this expresses that "b" and "c" are *part of* the process "a". If "do-b" and "do-c" are dependencies of "do-a", this expresses that "b" and "c" are things which must be done *before* process "a". For a user, seeing these relationships makes a build easier to understand.
* Avoiding repeat execution of "init" targets. Rather than making every target depend on "init", I just make the entry point target(s) depend on "init". Since the build is always entered from the entry point target, the init target will always be executed when the build is invoked, and it will only ever be executed once, unlike in some projects where it is executed multiple times as a result of the absence of backwards propagation in Ant. It also saves you putting 'depends="init"' on every target.
* Checking an antfile for links to other antfiles (i.e., uses of the <ant> task), by running 'ant -sim -addressing'.
* Addressing works for any ant file, whether the developers are aware of addressing or not. But if the developers are aware of addressing, they can design their builds to make special use of it and so get the most out of it.

I have found the addressing option (in conjunction with the sim option) a pleasure to use while developing and using the nightly release snapshot system for Greenstone. The code changes are actually very simple: I have added just two if blocks that check whether a call to a task should proceed in light of the addressing and sim options specified on the command line, and the rest is just code to pass those options around, some display code, and a class to represent an address. It has many uses, and I have had good feedback about it. So I think there is a lot of bang for your buck!