Bug 60562 - bootstrap is broken on Solaris 10
Summary: bootstrap is broken on Solaris 10
Status: RESOLVED FIXED
Alias: None
Product: Ant
Classification: Unclassified
Component: Core (show other bugs)
Version: 1.10.0
Hardware: Sun Solaris
: P2 major (vote)
Target Milestone: 1.10.1
Assignee: Ant Notifications List
URL:
Keywords:
: 60673 (view as bug list)
Depends on:
Blocks:
 
Reported: 2017-01-09 14:32 UTC by Thomas Stecher
Modified: 2017-02-02 10:02 UTC (History)
1 user (show)



Attachments
ant-script running on solaris 10 (10.46 KB, text/plain)
2017-01-10 08:21 UTC, Thomas Stecher
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Thomas Stecher 2017-01-09 14:32:15 UTC
Since 1.9.7 it's impossible to run ant on Solaris 10 out of the box when really using the original BourneShell (sh) in the Shebang-lLine.


> ./apache-ant-1.10.0/bin/ant -h
./apache-ant-1.10.0/bin/ant: syntax error at line 66: `)' unexpected

Replacing it by bash works fine.

The problem still seems to be in quoting command-line arguments by sed/awk/nawk.
Comment 1 Stefan Bodewig 2017-01-09 14:44:26 UTC
We may have overlooked bootstrap.sh suffering from the same problems as build.sh - see bug 59898 

Does build.sh work for you (if you installed a binary built on a different platform)?
Comment 2 Stefan Bodewig 2017-01-09 14:45:09 UTC
(In reply to Stefan Bodewig from comment #1)
> We may have overlooked bootstrap.sh suffering from the same problems as
> build.sh - see bug 59898 
> 
> Does build.sh work for you (if you installed a binary built on a different
> platform)?

I meant the ant wrapper script, not build.sh, sorry,
Comment 3 Thomas Stecher 2017-01-09 14:53:16 UTC
Now I've unpacked the binary distibution of 1.10.0 on OpenSuSE 13.1 and calling "ant -h" works fine.
On Solaris 10 it breaks.
On SuSE Linux Enterprise 12.1 it works also fine.

Does this information help?
Comment 4 Stefan Bodewig 2017-01-09 15:01:07 UTC
In a way it does, and to be honest it is pretty frustrating.

We know the wrapper script has been broken around 1.9.6 and we thought it had been fixed with 1.9.8/1.10.0, we even had people using Solaris confirm it was fixed.

The major problem for us is that the developers haven't got access to a Solaris box and we always need to ask whether something works - the last time around this led to people stopping to provide feedback and left us with an incomplete solution.

OK, enough whining from my side.

If line 66 is what breaks the script on Solaris, this is inside a case for "awk" which should never get called anyway (esc_tool should always be sed). Does the script work on Solaris if you comment out lines 62 to 67 inside the  ant script?
Comment 5 Thomas Stecher 2017-01-09 15:07:29 UTC
No it doesn't help. When calling it with "sh -v ant" I get this Output:

    case "$esc_tool" in
      'sed')
        # Mac bsd_sed does not support group-0, so pattern uses group-1
        # Solaris sed only proceses lines with trailing newline, passing in an extra newline
        # subshell assignment will trim the added trailing newline
        esc_arg="$(printf '%s\n' "$esc_arg" | sed -e 's@\([$"\\`]\)@\\\1@g')"
        ;;
#     'awk')
#       esc_arg="$(printf '%s' "$esc_arg" | "$awk_exec" '{ gsub(/\\/, "\\\\"); print }' )"
#       esc_arg="$(printf '%s' "$esc_arg" | "$awk_exec" '{ gsub(/\$/, "\\$");  print }' )"
#       esc_arg="$(printf '%s' "$esc_arg" | "$awk_exec" '{ gsub(/\"/, "\\\""); print }' )"
#       esc_arg="$(printf '%s' "$esc_arg" | "$awk_exec" '{ gsub(/`/,  "\\`")./ant: syntax error at line 66: `)' unexpected
Comment 6 Stefan Bodewig 2017-01-09 15:17:27 UTC
so it trips over a closing paren in a commented-out line?

so maybe we need to remove the block rather then comment it?
Comment 7 Thomas Stecher 2017-01-09 15:35:38 UTC
A first try to remove line #63 to 79 doesn't fix it. It seems to me the real bug it some lines earlier and the case statement doesn't get recognized as such.
Comment 8 Stefan Bodewig 2017-01-09 15:47:25 UTC
After we've decided to go with sed (because it seemed we got it to work on Solaris as well) the whole case has become obsolete. Could you trying to replace thw whole case with just

        esc_arg="$(printf '%s\n' "$esc_arg" | sed -e 's@\([$"\\`]\)@\\\1@g')"

?
Comment 9 Thomas Stecher 2017-01-09 16:11:38 UTC
So now we fixed the "case". Maybe it was a matter of bytes inside the for-loop?!?

But have a new one.

if $mingw ; then
  [ -n "$ANT_HOME" ] &&
    ANT_HOME="`(./apache-ant-1.10.0/bin/ant: syntax error at line 159: `(' unexpected
Comment 10 Stefan Bodewig 2017-01-09 17:37:43 UTC
that's bee bug 46936 https://github.com/apache/ant/commit/9aac905f3e768d42cd9cd4660f5be458ee990660 - I'll look into it

I've removed the case in Ant's master branch and will likely ask you to re-try further iterations. I'm sorry this has to be so difficult.
Comment 11 Stefan Bodewig 2017-01-09 17:50:59 UTC
I've replace the backticks completely and avoided the subshell in the mingw case, could you please try again with the master version?

I'll ping the original author of bug 46936 and ask some cygwin users to double check my changes.
Comment 12 Thomas Stecher 2017-01-09 17:59:06 UTC
The Problem seems to be your suggested line which wants to execute a command by $( command ). The bourne Shell on Solaris doesn't support that.

We have to replace it by something using `command` but must watch out for the back-tick in the sed statement.

When we get a running fix I also can test it on cygwin.
Comment 13 Thomas Stecher 2017-01-09 18:07:26 UTC
I think this is a good try:


    esc_arg=`printf '%s\n' "${arg}" | sed -e 's@\([$"\\\`]\)@\\\&@g'`

    quoted_arg="\"$esc_arg\""
Comment 14 Stefan Bodewig 2017-01-10 06:02:12 UTC
The sed line is the result of several iterations and I'm really scared of the thought that we'd touch it again.

In my change I removed all backtick invocations as you didn't report problems with the other cases of $(). In fact we've had $() sequences all over the script even before the platform problems started. $() is POSIX, is it really not supported by sh on Solaris?
Comment 15 Thomas Stecher 2017-01-10 07:27:47 UTC
Here is a small test on a Solaris 10 machine...

stecher@b2sn13:~/tmp > cat dollar.sh
#!/bin/sh

   value=$( pwd )

   echo ${value}
stecher@b2sn13:~/tmp > dollar.sh
./dollar.sh: syntax error at line 3: `value=$' unexpected

The sed statment is really hard to understand. I'll test it today on other systems.

There are two other places using the invocation of $() which should also be changed. Sorry I didn't see these earlier.
Comment 16 Thomas Stecher 2017-01-10 08:21:41 UTC
Created attachment 34602 [details]
ant-script running on solaris 10

Here is an updated version of the ant script. I've tested it by calling 

./ant -debug -v '$ ` \\'

with "ant_exec_debug=false" set in line 22 to see if arguments are well processed. 

It runs on Solaris 10, cygwin, SLES 12.1 and OpenSuSE 13.1

But to be sure some other people also should verify the correctness.
Comment 17 Thomas Stecher 2017-01-10 08:22:58 UTC
"ant_exec_debug=true" was set to verify...
Comment 18 Stefan Bodewig 2017-01-10 16:45:02 UTC
The version contained in 1.10.0 is the result of https://github.com/apache/ant/pull/25 which was measured against the invocation

ant --execdebug \
  "-Dfoo=dollar\$_backtick\`_single'_double\"\"_trailingbackslash\\" \
  "-Dbar=trailingnewline
" \
  "-Dnewline=
" \
  "-Ddoublespace=  " \
  "-Dx=y" \
  "-f" "test.xml"

(see the test.zip linked in the PR)

At least on Linux this seems to work as well.
Comment 19 Stefan Bodewig 2017-01-10 16:47:46 UTC
(In reply to Stefan Bodewig from comment #18)

> At least on Linux this seems to work as well.

No, it doesn't :-(

Your script strips trailing newlines. Where the original script ends with

run:
     [echo] hello world
     [echo] foo=dollar$_backtick`_single'_double""_trailingbackslash\_
     [echo] bar=trailingnewline
     [echo] _
     [echo] newline=
     [echo] _
     [echo] doublespace=  _
     [echo] end

using the attached script I get

run:
     [echo] hello world
     [echo] foo=dollar$_backtick`_single'_double""_trailingbackslash\_
     [echo] bar=trailingnewline_
     [echo] newline=_
     [echo] doublespace=  _
     [echo] end
Comment 20 jwadamson 2017-01-10 20:20:42 UTC
The behavior error in the snippet leading to truncation is that Posix has subshells trim output returned to an assignment statement. Probably so that one can use things like echo to set variables more conveniently.

The value returned by the subshell expression (`...`) therefore needs non-whitespace padding on its output.

Using something like: `printf 'X%sX\n' "${arg}"...`
and then stripping the padding X in the main shell should resolve that quirk
    esc_arg="${esc_arg#X}"
    esc_arg="${esc_arg%X}"

But I can't test this and can't see how it is different (by spec) than the original 1.10 solution unless the core of the problem is $() vs `` for invoking subshells.
Comment 21 Thomas Stecher 2017-01-11 15:39:33 UTC
Sorry for answering so late. Busy by business :-)

Now I tried the test.zip an solaris and I see what you mean. Unfortunately the replacement by 

    esc_arg="${esc_arg#X}"
    esc_arg="${esc_arg%X}"

is also unsupported by /bin/sh. I'll try to find a way around that tomorrow or on friday.
Comment 22 jwadamson 2017-01-12 20:49:14 UTC
That stinks. :-(

Solaris 10 was released in 2005, so that is sad it doesn't fit the 2004 (or earlier) POSIX spec but seems to be the case. http://unix.stackexchange.com/a/164242 "/bin/sh is not the POSIX shell but the legacy Bourne shell which predates POSIX and is then missing features that came later with the standard." 

Solaris 11 /bin/sh -> ksh83 (and is POSIX compliant)



Possibilities include:
1) identifying if the script is running in the solaris 10 shell and skip the padding/stripping portions. This sacrifices trailing whitespace, but shouldn't be any worse than the original ant script on this platform.
2) find a different syntax to do the job. I don't see operators for getting the length of a string or sub-stringing it, which would have been my second choice. Any use of sub-shell to process the value is going to strip whitespace and put us back to #1.
Comment 23 jwadamson 2017-01-12 22:32:38 UTC
Would prefer something specific to detect bourne-shell but I can't see any environment variables that would let us know if the ant script was running under heirloom-bourne. :-(

Worst case we can test via uname skip the portions that do the adding/stripping of arg padding?

----------------------
BOURNE_SHELL=false
OS=`uname -s`
REL=`uname -r`
if [ "$OS" = SunOS -a "$REL" = "5.10" ]
then
    BOURNE_SHELL=true
fi

if [ $BOURNE_SHELL ]
then
    esc_arg="$arg"
else
    esc_arg="X${arg}X";
fi

# TODO do sed replacement statement on esc_arg, in bourne-shell this will wind up trimming whitespace due to subshell use.

if [ $BOURNE_SHELL != "true" ]
then
    esc_arg="${esc_arg#X}"
    esc_arg="${esc_arg%X}"
fi
Comment 24 Stefan Bodewig 2017-01-13 05:58:58 UTC
(In reply to jwadamson from comment #22)
> That stinks. :-(
> 
> Solaris 10 was released in 2005, so that is sad it doesn't fit the 2004 (or
> earlier) POSIX spec but seems to be the case.
> http://unix.stackexchange.com/a/164242 "/bin/sh is not the POSIX shell but
> the legacy Bourne shell which predates POSIX and is then missing features
> that came later with the standard." 
> 
> Solaris 11 /bin/sh -> ksh83 (and is POSIX compliant)

This explains why we got reports the new wrapper script would work on Solaris.

Do we need to consider Solaris versions prior to 10? If so if we can't detect the incompatible /bin/sh can we detect the compatible one (maybe combined with detecting Solaris)?
Comment 25 jwadamson 2017-01-13 14:13:53 UTC
Downloaded a solarix 10 ova vm file.
Interesting results
1) seems like the `VAR=$( echo HI )` syntax does not work and throws an error
2) `VAR="$( echo HI )"` syntax actually does not execute the subshell at all and assigns the expression as a string to VAR
3) 
    X=" HI "
    echo "\$X=$X."
    Y="`echo "$X"`"
    echo ".${Y}."
shows that the shell does NOT trim the output from the subshell when assigning to Y.

So the best idea may be to detect the lack of trimming subshell capture, and in that case skip the padding+stripping operations since they are unnecessary.

Will see if I can confirm this and mock up the old ant torture test to see if it still works on linux and solaris 10.
Comment 26 jwadamson 2017-01-13 14:57:04 UTC
or maybe i am doing something wrong now.
Comment 27 Thomas Stecher 2017-01-13 15:06:21 UTC
I'll try to face the Problem of the newline inside of awk which will run after my last sed Suggestion because Solaris 10's sh doesn't support the Substitution of shell variables. I still think this will be possible.

The other way is to define the script to be BASH :-)

Still trying to find a way. I'll let you know what comes out at the end.
Comment 28 jwadamson 2017-01-13 15:31:30 UTC
What I was doing wrong was that the rule for subshells actually is "removing sequences of one or more <newline>s at the end of the substitution."

And that is in effect for heritage and posix. So we can't use that as feature detection :-(

Switching to bash I do not think works for AIX, which does not include bash by default.

I am still thinking that testing with uname for Solaris 10 and just not protecting trailing NLs in that case may be the best we can do.
Comment 29 jwadamson 2017-01-13 18:39:58 UTC
Here is close as I could get.

Main points
1) Had to touch the icky sed line to make it use `` instead of $(). That is the part that scares me the most, but I can convince myself the expression is correct and my limited testing of special chars looked ok.
2) solaris 10 will drop trailing newlines on arguments. They can avoid that by using a POSIX shell and setting env PROTECT_NL=true.

https://github.com/apache/ant/pull/29

This is just an idea/example, I would love it if someone could come up with anything that worked 100% consistently across platforms or could feature-detect  heirloom shell.
Comment 30 Stefan Bodewig 2017-01-14 18:02:59 UTC
switching to bash is no real option.

If the latest incarnation - I've merged Jeff's changes into the master branch - work for Thomas this probably is what we'll be going on with.
Comment 31 Thomas Stecher 2017-01-14 22:00:33 UTC
I hate the feeling to loose a challenge like this. Yesterday I thought to loose but now I have a cool idea to solve this problem and it seems to work fine.

So if we can't pass all the args directly clear to ant on solaris 10 when using the original /bin/sh... why not reinvoke everything with a compatible shell if available.

I added this code at line 18 in the original released script of ant 1.10.0

#-------------------------------------

if [ `uname` = "SunOS" -a -x /usr/xpg4/bin/sh -a -z "${SOLARIS_SPECIAL}" ] ; then
   SOLARIS_SPECIAL=true
   export SOLARIS_SPECIAL
   exec /usr/xpg4/bin/sh $0 "$@"
fi

#-------------------------------------

On solaris 10 there is on all my very naked installed Hosts a folder /usr/xpg4/bin which supplies executables compatible to POSIX !

What do you think about that hack?
Comment 32 jwadamson 2017-01-17 21:32:42 UTC
That's a very interesting approach. I'm not really sure what I think of it. 
Nice in that it would allow ant to use more POSIX in the rest of the file going forward. I find the semi-recursive aspect both concerning and thrilling.

Comments from the ant team or anyone else familiar with this issue?
Comment 33 Stefan Bodewig 2017-01-18 16:04:55 UTC
I'm afraid we still couldn't rely on POSIXy behaviour as /usr/xpg4/bin/sh may simply not be there. I'm not scared by the sem-recursiveness and could live with it.
Comment 34 Stefan Bodewig 2017-02-01 08:23:15 UTC
*** Bug 60673 has been marked as a duplicate of this bug. ***
Comment 35 Stefan Bodewig 2017-02-01 14:42:32 UTC
if possible I'd like to stick with the script currently at the tip of the 1.9.x and master branches and cut new Ant releases.

Thomas, have you given that script a try?
Comment 36 Thomas Stecher 2017-02-01 14:49:26 UTC
Hi Stefan,

ahm, which version exactly should I try? I'm a little confised :-)
Comment 37 Stefan Bodewig 2017-02-01 14:52:19 UTC
https://git-wip-us.apache.org/repos/asf?p=ant.git;a=blob;f=src/script/ant;h=8252d779c3aba453c0e1729f81763063fe12d4a7;hb=a92845215f55eb63f0ad911c1610eac4dfaf82b4 - which is the master branch. 1.9.x is the branch that will lead to 1.9.9. The scripts are identical, though.

Or https://github.com/apache/ant/blob/master/src/script/ant if you prefer the github UI :-)
Comment 38 Thomas Stecher 2017-02-01 15:12:43 UTC
It seems to work except preserving the newlines (like my suggestion did :-))

stecher@b2sn13:~/java > apache-ant-1.10.0/bin/ant -f test.xml
Buildfile: /home/stecher/java/test.xml

run:
     [echo] hello world
     [echo] foo=${foo}_
     [echo] bar=${bar}_
     [echo] newline=${newline}_
     [echo] doublespace=${doublespace}_
     [echo] end

BUILD SUCCESSFUL
Total time: 0 seconds
Comment 39 Stefan Bodewig 2017-02-01 15:51:41 UTC
Can you live with that or do we really want to shell out to bash when available?
Comment 40 Armand Welsh 2017-02-01 17:57:29 UTC
I have confirmed the current 1.9.x tip for ant wrapper script does work on Solaris 10.

As a side note, the current 1.9.8 version would work on Solaris 11, as Oracle changed /bin/sh from the native Bourne shell, to a symbolic link to ksh93.  But All prior versions of Solaris (Oracle and Sun) use the legacy Bourne shell at /bin/sh.
Comment 41 Stefan Bodewig 2017-02-01 20:40:28 UTC
Many thanks, Armand.

Please forgive my ignorance, are any Solaris versions older than 10 still in widespread use? We currently special case Solaris 10 only and would likely still fail on Solaris 9.
Comment 42 Thomas Stecher 2017-02-02 08:49:03 UTC
It's quiet interesting that there are some poeple who need newlines in commandline arguments. It was a cool challenge to find a way to keep them also on Solaris 10. But I really don't know in which usecase that makes sense. But on the other Hand from a technical view it's not okay to modify any argument that comes in.

But: I can live without that feature on Solaris 10. Even on other platforms :-)

I'm looking forward  for the next release of ant fixing my real problem. Thank you very much !
Comment 43 Stefan Bodewig 2017-02-02 10:02:14 UTC
Not preserving newlines is the behaviour of 1.9.6 and earlier, I think. So there are not too many people who need it :-)

I'll create an enhancement request for preserving newlines on Solaris 10 (and maybe even before that) and will start the release process in the coming days.

Many thanks for testing and many thanks again to Jeff for providing the fix.

Will be fixed in 1.10.1 and 1.9.9.