38598 – SubSet Filter

Summary: SubSet Filter

Status:	NEW

Alias:	None

Product:	Ant
Classification:	Unclassified
Component:	Core tasks
Version:	1.6.5
Hardware:	All All

Importance:	P2 enhancement
Target Milestone:	---
Assignee:	Ant Notifications List

URL:
Keywords:

Depends on:
Blocks:

Reported:	2006-02-09 23:59 UTC by Donovan Dillon
Modified:	2008-11-24 03:58 UTC
CC List:	0 users

Description Donovan Dillon 2006-02-09 23:59:23 UTC

When processing text files, tasks such as copy, move, and concat can already 
use filter readers. The current set of filter readers provides a very useful 
set of functionality. The enhancement I would like is a filter that provides 
the following functionality.

The SubSet Filter is a filter that can be used within a filterchain to extract 
data from a text file. It allows you to designate two regular expression 
patterns. The first pattern is what the filter uses to start the extraction, 
and the second is the pattern it uses to stop the extraction.

Rules:
1) If the beginning pattern is not specified, extraction starts at the 
beginning of the file

2) If the beginning pattern is never found, no lines are returned

3) If the end pattern is never found or not specified, all lines through the 
end of the file are returned

4) If the skipstart attribute is set, the filter skips N matches of the 
beginning pattern before it starts

5) If the skipend attribute is set, the filter skips N matches of the end 
pattern before it ends
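
The five rules above can be sketched roughly as follows. This is an illustrative Python model, not the Java filter offered in this report, and the function name subset_lines is hypothetical. It assumes details the rules leave open: the matching start and end lines are both included in the output, the start line itself is not tested against the end pattern, and skipstart/skipend count matching lines.

```python
import re


def subset_lines(lines, begin=None, end=None, skipstart=0, skipend=0):
    """Return the lines from a begin match through an end match (inclusive)."""
    begin_re = re.compile(begin) if begin is not None else None
    end_re = re.compile(end) if end is not None else None

    started = begin_re is None  # rule 1: no begin pattern, start at line one
    begin_hits = 0
    end_hits = 0
    out = []

    for line in lines:
        if not started:
            if begin_re.search(line):
                begin_hits += 1
                if begin_hits > skipstart:  # rule 4: ignore the first N matches
                    started = True
                    out.append(line)  # assumption: the start line is kept
            continue  # rule 2: nothing is emitted until the start is found

        out.append(line)
        if end_re is not None and end_re.search(line):
            end_hits += 1
            if end_hits > skipend:  # rule 5: ignore the first N end matches
                break

    # Rule 3: if the loop runs out without an end match, everything after
    # the start has already been collected.
    return out
```

For example, subset_lines(text.splitlines(keepends=True), begin="BEGIN", end="END") would keep the first BEGIN...END block of text, where text is any multi-line string.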

After the lines are determined, each line can then be limited by column index. 
The truncating of a line will keep the line-ending character.

Rules:
 
1) If columnstart index is specified, the entire line is returned starting 
from that 0-based index
 
2) If columnstart is greater than the line length, nothing is returned
 
3) If columnend is specified, only text up to that index is returned.
 
4) If columnend is greater than line length, then everything up to line length 
is returned.
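
A rough Python sketch of the column rules follows, again as an illustration rather than the actual filter code. The name cut_columns is hypothetical. It assumes that columnend is exclusive, that both indexes are non-negative, and that the line ending is preserved even when columnstart lies past the end of the line.

```python
def cut_columns(line, columnstart=None, columnend=None):
    """Keep only the characters between two 0-based column indexes."""
    # Separate the text from its line ending so the ending always survives.
    body = line.rstrip("\r\n")
    eol = line[len(body):]

    start = 0 if columnstart is None else columnstart
    stop = len(body) if columnend is None else columnend

    # Python slicing already clamps out-of-range indexes, which covers
    # rule 2 (start past the end gives "") and rule 4 (end past the end).
    return body[start:stop] + eol
```

Under these assumptions, cut_columns("hello world\n", 3, 8) keeps the five characters at indexes 3 through 7 and the trailing newline.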

Now obviously I have already tried to do this and think it is a useful filter 
that would help complete the existing set of great filters. I have had a 
recent set of tasks and chose ant to help do text file processing and found 
that this was very useful. I realize you already have a great pool of talent 
but would look forward to contributing the code I do have. It is fully unit 
tested using the anttest util. 

Anyways, hopefully I will see it in a future release.

Thanks,
Donovan

Comment 1 Matt Benson 2006-02-10 00:04:13 UTC

This sounds useful.  I personally would rather NOT see functionality duplicated;
i.e. you can select by column using existing regex stuff chained after the basic
functionality you have outlined here...
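
To illustrate the point about selecting columns with a regex, here is a small Python sketch. It is not an Ant build fragment, the helper name cut_with_regex is hypothetical, and it assumes lines end in a plain "\n" (a "\r" would be treated as an ordinary character by the pattern).

```python
import re


def cut_with_regex(line, start, end):
    """Keep columns start..end-1 of a line using a single regex replace."""
    # Skip up to `start` characters, capture up to `end - start` characters,
    # then swallow the rest of the line. "." does not match "\n", so the
    # line ending is left in place.
    pattern = re.compile(r"^.{0,%d}(.{0,%d}).*" % (start, end - start))
    return pattern.sub(r"\1", line, count=1)
```

The same idea should carry over to any regex-replace step chained after the line-range filter, at the cost of a pattern that is harder to read than two column numbers.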

Comment 2 Dominique Devienne 2006-02-10 17:33:12 UTC

Rather than introducing new filters, I think the existing <head> and <tail> 
filters could be extended to take an additional regex. The use case you 
describe matches head/tail IMHO, and simply extends the concept to have a 
regex rather than a simple line count to determine the start/end.
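
As a rough model of this suggestion, the two halves can be written as separate steps and chained. The Python below is only a sketch: tail_from and head_until are hypothetical names, not attributes or classes of the existing <head> and <tail> filters, and both steps are assumed to include the matching line.

```python
import re
from itertools import dropwhile


def tail_from(lines, pattern):
    """Drop lines until the first one matching pattern, then pass the rest."""
    rx = re.compile(pattern)
    return dropwhile(lambda line: not rx.search(line), lines)


def head_until(lines, pattern):
    """Pass lines through up to and including the first match of pattern."""
    rx = re.compile(pattern)
    for line in lines:
        yield line
        if rx.search(line):
            return
```

Chaining them as head_until(tail_from(lines, "BEGIN"), "END") gives the start/stop extraction from the description. One difference in this composition is that the start line is also tested against the end pattern.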

I wouldn't be against adding a new filter to 'cut' lines by column numbers. 
Although it's certainly possible to achieve using regexps, it would likely be 
easier to use a cut-like column specific filter ;-) --DD

Comment 3 Matt Benson 2006-02-10 17:47:22 UTC

okay, if we want to go down this road... ;) I would say that a cut filter should
implement cut fully, including fields and delims... but either way,
modularization is good.
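
For reference, the field-and-delimiter side of cut could be sketched like this in Python. The function name cut_fields is hypothetical, and the behavior is modeled on the Unix cut utility with -d and -f (1-based fields, output in input order, lines without the delimiter passed through unchanged), not on any existing Ant filter.

```python
def cut_fields(line, fields, delim="\t"):
    """Keep the selected 1-based fields of a delimited line."""
    # Separate the text from its line ending so the ending is preserved.
    body = line.rstrip("\r\n")
    eol = line[len(body):]

    # Like cut without -s: a line with no delimiter is passed through as is.
    if delim not in body:
        return line

    parts = body.split(delim)
    # Fields come out in input order, whatever order they were requested in;
    # requests beyond the last field are ignored.
    picked = [parts[i - 1] for i in sorted(set(fields)) if 1 <= i <= len(parts)]
    return delim.join(picked) + eol
```

A full cut equivalent would also need field ranges such as 2-4 and an option to suppress lines with no delimiter, which this sketch leaves out.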