Bug 59943 - Problem with Node order using HtmlParsingUtils.getDOM which can impact HTML Link Parser
Status: REOPENED
Alias: None
Product: JMeter
Classification: Unclassified
Component: HTTP
Version: 2.4
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: JMeter issues mailing list
Reported: 2016-08-04 12:52 UTC by David Hubbard
Modified: 2016-08-19 07:46 UTC (History)

Attachments
Sample HTML response file (2.29 KB, text/html)
2016-08-04 12:52 UTC, David Hubbard

Description David Hubbard 2016-08-04 12:52:18 UTC
Created attachment 34102
Sample HTML response file

Hi

I'm writing a PostProcessor in Java for JMeter - using HtmlParsingUtils.getDOM() to access the HTML response data.

In this I'm trying to pick out the form fields from a response so that, when JMeter submits the form back, I can manage the setting of some of the posted param data.

My problem is that the site I am trying to test returns some slightly wacky HTML, in which several input fields (only one is shown here) are placed inside a <table> element:

<input name="1">
<input name="2" />
   <table><tr><td><input name="3"></td>
<input name="4"></tr>  
</table>

When the browser posts the form back, it sends the form params in document order (1+2+3+4), as expected.
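
To make the expected order concrete, here is a minimal sketch that lists the input names in the order they appear in the raw markup. It is only an illustration: the class and method names are hypothetical (not part of JMeter), and a regular expression is not a real HTML parser, so it assumes double-quoted name attributes as in the snippet above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of JMeter.
class InputOrderSketch {
    // Matches <input ... name="value">; assumes double-quoted attribute values.
    private static final Pattern INPUT_NAME = Pattern.compile(
            "<input\\b[^>]*?\\bname\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns the name attribute of every input tag, in source order.
    static List<String> namesInSourceOrder(String html) {
        List<String> names = new ArrayList<>();
        Matcher m = INPUT_NAME.matcher(html);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String html = "<input name=\"1\">\n"
                + "<input name=\"2\" />\n"
                + "   <table><tr><td><input name=\"3\"></td>\n"
                + "<input name=\"4\"></tr>\n"
                + "</table>\n";
        System.out.println(namesInSourceOrder(html)); // prints [1, 2, 3, 4]
    }
}
```

Comparing this list with the order of the input nodes in a parsed tree is a quick way to check whether a given parser has moved any field.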

However when using HtmlParsingUtils.getDOM() the node tree for the <table> seems to get included only when the </table> is parsed.

So the tree returned has 3 and 4 swapped:
(input 1)
(input 2)
(input 4)
(table
    (tr
      (td
        (input 3)
      )
    )
)


Is this a bug? I confess I'm not sure - it depends on whether the start tag <table> or the end tag </table> is the trigger for adding the node. From my reading, however, input 4 is within the table node.

From this I can only assume that "input" fields are handled specially in the parser (since in HTML they don't need a closing tag or "/>").

What it means for me is that the order in which I post params from JMeter is affected, and the app doesn't like this. You might think that the app should handle params by name even when they arrive in the wrong order, but the reality is that the input field names are not unique on the page - the app uses a framework which we can't change.
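
As a minimal sketch of why order and duplicates both matter here: if the params are kept in an ordered list of name/value pairs rather than a map keyed by name, repeated names survive and the posted body keeps the sequence in which the fields were collected. The class and method names below are hypothetical and this is not how JMeter itself builds requests; it assumes Java 10 or later for URLEncoder.encode(String, Charset).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of JMeter.
class OrderedFormBodySketch {
    // Builds an application/x-www-form-urlencoded body, keeping list order
    // and allowing the same name to appear more than once.
    static String encode(List<Map.Entry<String, String>> params) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> p : params) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8))
                .append('=')
                .append(URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> params = new ArrayList<>();
        params.add(new SimpleImmutableEntry<>("field", "first"));
        params.add(new SimpleImmutableEntry<>("field", "second value"));
        params.add(new SimpleImmutableEntry<>("other", "x"));
        // prints field=first&field=second+value&other=x
        System.out.println(encode(params));
    }
}
```

A map keyed by field name would keep only one of the two "field" entries, which is why a list of pairs is used in this sketch.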

I have attached an example HTML file which shows this.
Comment 1 Philippe Mouawad 2016-08-05 20:19:36 UTC
Hello,
Thanks for the report.

You're using internal JMeter APIs here. Besides, this part of the API relies on an old library, jtidy, which cleans up the HTML and may explain your issue.

I suggest you rely instead on jsoup or jodd-lagarto, which are bundled with JMeter, so you can use them in your custom code.


Regarding your report, it appears it could affect one JMeter element, "HTML Link Parser", which I would also advise against using because of its performance limitations and because distributed testing does not work with it.


If following the suggestions above solves your issue, please let us know.
Regards
Comment 2 David Hubbard 2016-08-18 12:57:44 UTC
(In reply to Philippe Mouawad from comment #1)

Hi Philippe,

Thanks for the feedback - I have successfully switched to jsoup, which handles my scenario.  

Regards
David
Comment 3 Philippe Mouawad 2016-08-18 19:42:29 UTC
(In reply to David Hubbard from comment #2)

Thanks for your feedback.
I will reopen the bug, as it affects a component of JMeter.
Comment 4 David Hubbard 2016-08-19 07:40:16 UTC
(In reply to Philippe Mouawad from comment #3)
> 
> Thanks for your feedback.
> I will reopen the bug, as it affects a component of JMeter

Ok

A couple of observations in passing: 
1) As you are bundling jsoup, would you also consider adding xsoup (for XPath support on top of jsoup) <https://github.com/code4craft/xsoup>?
2) Should HtmlParsingUtils be marked @Deprecated in code <https://bz.apache.org/bugzilla/show_bug.cgi?id=59036>?

Thanks
David
Comment 5 Philippe Mouawad 2016-08-19 07:46:40 UTC
(In reply to David Hubbard from comment #4)
> (In reply to Philippe Mouawad from comment #3)
> > 
> > Thanks for your feedback.
> > I will reopen the bug, as it affects a component of JMeter
> 
> Ok
> 
> A couple of observations in passing: 
> 1) as you are bundling jsoup would you also consider adding xsoup (for xpath
> support on top of jsoup) <https://github.com/code4craft/xsoup> ?

Interesting. XPath performance in JMeter is not that great.
Would you mind raising the subject on the dev mailing list?

> 2) should HtmlParsingUtils be marked @Deprecated in code
> <https://bz.apache.org/bugzilla/show_bug.cgi?id=59036>?
> 
Same answer.
> Thanks
> David