This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.
When I search for "genfile.properties" from the "Entire netbeans.org domain" with everything selected, all I get is the result of my recent posting to [50cat] asking about it. Jirka came up with a couple of nbusers messages and http://projects.netbeans.org/buildsys/design.html#vcs-deps which had exactly the information I wanted. Why weren't these listed in the search results? API Javadocs, other project documents, FAQs -- all of these seem to be frequently omitted from the results. With such poor luck from the NetBeans search engine, you're not encouraged to use it unless all other options have failed, whereas it should be the first place to look.
My guess is this is a dup of issue 13172 - meta and non-ASCII chars cause searches to fail. Note the age of the issue :-( We've long had problems with the built-in search, and for a long time simply did not use it, and used Google instead. We recently (~6 months ago) started using it again as it was supposedly much enhanced, and it does now search multiple data sets (lists, IZ, HTML content, etc.). That is good, but clearly there are still problems. Can you try your search again, without the ".", and check whether you turn up the extra results you missed the first time around?
Searched for "genfile" and got "No matching search results found."
I can reproduce. Collab, the word "genfile" appears at this URL http://projects.netbeans.org/buildsys/design.html#vcs-deps, and yet going to http://www.netbeans.org/servlets/Search?mode=advanced, selecting "entire nb domain", and selecting all "artifact" options (or leaving them all unselected) does not return that match. Why ? rrochat, you mention there were other msgs where this string appears - can you include URLs here?
These were the ones Jirka cited that did not show up for me: http://www.netbeans.org/servlets/ReadMsg?list=nbusers&msgId=973678 or http://www.netbeans.org/servlets/ReadMsg?list=nbusers&msgId=833959 The first one is in the list I pulled up just now. When I tried to check the 2nd one, I got two "www.netbeans.org could not be found". I know it's just temporary, but oh well. I'll cross my fingers that I can submit this. Roxie
The issue seems to be getting old: today, March 22nd 2006, I searched the "nbusers" mailing list archive and no matter what I searched for, there were no results from the current year. This is in great contrast with SUN's touted attitude of "promoting communities": a significant part of the mailing list traffic is effectively kept closed, contrary to what one expects from an open mailing list. I understand that the infrastructure is managed by Collab, but as their customer SUN has the means of fixing this problem. I hope it will be fixed soon...
One way of addressing the search problem is to use a Google search appliance.
Ping ... ? This issue is over 5mths old and still no response from support. This particular case is a pretty minor issue but the bigger picture (of search being quite badly broken) is not. BTW lpintilie I could not reproduce what you saw - I was able to get plenty of results back, eg even turning up ~30 or more msgs matching "lpintilie". There are many problems with search (note particularly the comments in this issue about meta-chars), but it is not entirely broken. What kind of terms did you search for ?
Hi Jack, I am taking up the issue right away. I shall research it a bit more and provide you a detailed update as soon as possible. Thanks, Karthik Support Operations
Updating status whiteboard. Regards, Karthik Support Operations
As much as we would like to see this search feature improved, it is still under consideration for future releases. All mailing-list search features (and other related issues) are going to be completely re-architected in the planned Discussion Services. The (.) in the search keyword is obviously an issue (as was noted in other issues as well). I will check internally whether some "short-term" workarounds are possible to make the search feature work a little better until the new Discussion Services are introduced.
"Short term" !? The meta-char issue (issue 13172) was reported almost *5 years* ago! A working search tool is surely one of the most important things a web app should have.
Hi rrochat Searching for keyword "genfiles.properties" returns the expected HTML document ie. http://projects.netbeans.org/buildsys/design.html now. Please note that Lucene treats "genfiles.properties" as one word and so searching for "genfile" or "genfiles" won't return expected results. Again, note its http://www.netbeans.org/servlets/Search?mode=advanced&resultsPerPage=40&query=%22genfiles.properties%22&scope=domain&artifact=apache+content&Button=Search Regards, Karthik Support Operations
1. Thanks, yes I see the results now; 2. "Please note that Lucene treats "genfiles.properties" as one word and so searching for "genfile" or "genfiles" won't return expected results.". I guess Lucene is the search utility ? Anyway, that's a bug (treating genfiles.properties as 1 word). I verified that searching for "genfiles" does not return the expected results. Is this at least fixed in SC 3.5 ?
I still have some issues with this: 1) The default search doesn't generate the same results that your query does when I go to the Community page and type in "genfiles.properties". Your link has different parameters. Perhaps "mode" is the key (if so, why does it have to be "advanced"?) or are you dealing with a version that's not live on the website for me? 2) Even with your query, I can't get it to find the nbusers messages that Jirka found. If I select mailing list archives, it shows 5 (why not more if it's the only thing selected?). If I click on "Browse all results," I see 0 results -- the query is apparently blank. Roxie
Roxie, I'm guessing your comments are addressed to Karthik? Some answers: 1. Going to community and doing the search will search all "artefacts" on nb.org - that includes e.g. HTML, issues, mailing list msgs, etc. The URL Karthik includes here is the URL to show all (instead of just the first 5) HTML results. You will get the same results if you click "Browse all results from HTML content" from the normal community search. 2. I can't verify this - I get 5 msgs (as you mention, I'd rather see all of them since I selected only mailing list msgs), but if I click show all I do get a paginated list of 62 msgs. Strangely I did first get an entirely blank page, but I hit reload and it worked. Can you try the same again?
Yes, thanks, Jack, my message was in response to Karthik's. I had indeed selected just "HTML content" for my query and had tried hitting reload. I'd also tried it with and without the double quotes. When I try it now, it does work. I have no explanation. I just tried "genfile.properties" (without the "s") and get different results than "genfiles.properties" but they're not the same as I was seeing before. Oh well, as long as it's working now, I guess it doesn't matter. Are we indeed searching javadocs and API documents (category "HTML Content"?)? I can't get that to work. These are critical. i.e. Module API http://www.netbeans.org/download/dev/javadoc/org-openide-modules/org/openide/modules/doc-files/api.html has things like "OpenIDE-Module-Specification-Version" and "OpenIDE-Module-Requires-Message" that never come up from a search (even when I select all categories). A simpler test case: "myWidgetsMode" is in the Windows API document: http://www.netbeans.org/download/dev/javadoc/org-openide-windows/org/openide/windows/doc-files/api.html
Sorry, I should clarify my comment:
> I guess Lucene is the search utility ? Anyway, that's a bug (treating
> genfiles.properties as 1 word). I verified that searching for "genfiles" does
> not return the expected results.
Maybe it's a Lucene internal thing, but whether "genfiles.properties" is one word or not doesn't matter; what matters is that searching for partial words should match, e.g. searching for "genfile" (or "gen" or "file" etc.) should match.
Hi Jack, I will convey your update to my engineers and will work on it. I will get back to you ASAP on this. Regards, Karthik Support Operations
Why not use Google?
Hi Jack, If users want to search using partial words, they have to do a wildcard search. To learn more about Lucene wildcard search, please refer to: http://lucene.apache.org/java/docs/queryparsersyntax.html http://www.netbeans.org/scdocs/Search In this case, if a user wants to search for documents containing "genfiles.properties" using a partial word, say "genfile" or "gen", then the query string has to use the "*" wildcard character, i.e. the search string should be genfile* or gen*: http://www.netbeans.org/servlets/Search?mode=advanced&resultsPerPage=40&query=genfile*&scope=domain&artifact=apache+content&Button=Search http://www.netbeans.org/servlets/Search?mode=advanced&resultsPerPage=40&query=gen*&scope=domain&artifact=apache+content&Button=Search Hope this helps. Let me know if you need more info. FYI, here is a snippet from the doc at http://www.netbeans.org/scdocs/Search: Search does "stemming". If you enter the search string 'dip', you will be returned pages that contain the word 'dip' but also pages that contain 'dipping' and 'dips' since 'dip' is the word-stem of 'dipping' and 'dips'. Search will not return pages that contain 'diphthong'. Regards, Karthik Support Operations
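The difference between a whole-token match and a trailing "*" wildcard, as described in the comment above, can be illustrated with a small Python sketch. This is not Lucene's implementation: `tokenize` and `matches` are hypothetical helpers, the tokenizer simply assumes that dotted names such as "genfiles.properties" stay as one token, and stemming is not modelled at all.

```python
import re


def tokenize(text):
    # Assumption for illustration: alphanumeric runs joined by "." are kept
    # together as a single lower-cased token (e.g. "genfiles.properties").
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*", text)]


def matches(query, text):
    # A query ending in "*" is treated as a prefix (wildcard) query;
    # anything else must equal a whole token.
    tokens = tokenize(text)
    q = query.lower()
    if q.endswith("*"):
        prefix = q[:-1]
        return any(t.startswith(prefix) for t in tokens)
    return q in tokens


doc = "see genfiles.properties for details"

print(matches("genfile", doc))              # whole-token query, no match
print(matches("genfile*", doc))             # prefix query, matches
print(matches("genfiles.properties", doc))  # exact token, matches
```

Under these assumptions, "genfile" alone finds nothing because no token equals it, while "genfile*" and "gen*" match by prefix, which is the behaviour the two example URLs rely on.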
Hi Jack Please let me know if you need any more help with this. Thanks, Karthik Support Operations
1. If you're going to use special wild search rules, they need to be linked to from the NetBeans "Search" page, i.e. http://platform.netbeans.org/servlets/Search?scope=domain&resultsPerPage=40&query=&Button.x=27&Button.y=7 2. I'm more concerned with what's being searched, or not, as the case might be. Why do javadocs and API documents not show up in search results? I think these are critical and somehow need to be weighted (if possible) so they show up first. See my March 29 entry for examples. Maybe google is the answer, but I'd hope you could do better with an intelligent search engine that understands what's important to NetBeans users.
Hi, Thanks for the update. I shall work on this and give a suitable update in a couple of days. Thanks, Karthik Support Operations
An update to this issue. CEE has robots inclusion starting Danube-S. Looking at http://www.netbeans.org/robots.txt you can see the reason why files under the /download folder have not been indexed. Please note that even CEE's local indexers respect the robots.txt file. Regarding the files below, notice that these are under the location "upload/download" and no indexer plugin would look into it. http://www.netbeans.org/download/dev/javadoc/org-openide-modules/org/openide/modules/doc-files/api.html http://www.netbeans.org/download/dev/javadoc/org-openide-windows/org/openide/windows/doc-files/api.html Is this "download/upload" area intended to be indexed?
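How a robots.txt rule can keep an indexer away from the /download area can be checked with Python's standard `urllib.robotparser`. This is only a sketch: the thread does not quote the actual contents of http://www.netbeans.org/robots.txt, so the `ROBOTS_TXT` rules below are hypothetical, as are the `is_indexable` helper and the "example-indexer" agent name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; the real robots.txt is not
# reproduced in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /download/
"""


def is_indexable(url, agent="example-indexer"):
    # Parse the rules and ask whether the given agent may fetch the URL.
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


javadoc = ("http://www.netbeans.org/download/dev/javadoc/org-openide-windows/"
           "org/openide/windows/doc-files/api.html")
design = "http://projects.netbeans.org/buildsys/design.html"

print(is_indexable(javadoc))  # blocked by the assumed Disallow rule
print(is_indexable(design))   # not covered by the assumed rule
```

With a rule like the assumed one, any indexer that honours robots.txt would skip the Javadoc pages while still crawling the rest of the site.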
> Looking at: http://www.netbeans.org/robots.txt you can see the reason why
> files under /download folder has not been indexed.
Why? You just mentioned that SC only supports robots.txt as of Monday June 5th for us (the date we upgraded to Danube-S). So going forward it sounds like they won't be indexed (I'll update robots.txt to address that). Why weren't they before?
> Please note that even CEE's local indexers respect robots.txt file.
As of Danube-S, right? Also see issue 22183.
> Referring the files below notice that these are under
> location "upload/download" and no indexer plugin would look in to it.
Why not? Sorry Ani, I don't understand your response here.
> Please note that even CEE's local indexers respect robots.txt file. Actually that appears to not be true at all. Try searching for "JAM", 2 of the first 5 results displayed for HTML content are on testwww.
There were some limitations around the robots.txt file, prior to the current release. That being said we will try to verify this issue again.
Jack,
> Referring the files below notice that these are under
> location "upload/download" and no indexer plugin would look in to it.
They are under "$data_dir/home/upload", which seems to be a data location specific to NB, and that's the reason why Ani said that the indexer would not look into it.
> Actually that appears to not be true at all. Try searching for "JAM", 2 of
> the first 5 results displayed for HTML content are on testwww.
I just tried searching for "JAM" and I got results, but nothing is from the "testwww" project as you mentioned. Please let me know if I am missing something here. -Priya
Jack?
Let me summarise the issue again and the status.

JAM keyword search issue:
------------------------
As we have changed the robots settings now, nothing from 'testwww' will be indexed, so it will not be searchable either.

Javadoc search not working:
---------------------------
It will not work, as those docs are in "$data_dir/home/upload", which seems to be a data location specific to NB, and that's the reason why the indexer would not look into it. Were there any instructions given to Collab on indexing Javadocs in the upload folder as well, since it is in the location specific to NB? Jack, I think we need to work with Shilpa on this if this is a priority for you.

"genfiles.properties" not searchable as "genfile":
--------------------------------------------------
As explained above, Lucene is not taking this as a separate string. It can be searched using the wildcard as Karthik mentioned above. The following explanation from the engineer should help.

<Snip>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Using the specific text: BUGGEN.***MDT.20030327.01.A as an example I'll give a brief breakdown of the tokenizing process.
1. "*" are not recognized as part of a token and are therefore treated as separators, so the text is broken into: a. BUGGEN b. MDT.20030327.01.A
2. The second token is recognized as a potential serial number and so it is not broken down further. Punctuation including {"_"|"-"|"/"|"."|","} sandwiched between alphanumeric characters is flagged as a serial number. It is possible, if the separator is a ".", that it could be treated as a hostname. But regardless of which it is, both are treated as a single token.
Now as to whether this behavior is a bug. It is very difficult to determine the correct behavior for a generalized text search when the contents are not strictly defined as words separated by spaces. Google does very little guessing or trying to determine meaning.
If I enter "1912.09.15" into google I get about 25 answers, all of which have that exact string in the text. If I enter the same into Yahoo, I get results which seem to indicate, Yahoo treated it as a date as it returns results which have 09-15-2006 and other variations. It also returns results where it looks like "2006" might be taken in isolation from the "09" and "15" parts. Yahoo returns many more results, about 6 million. I would say lucene's behavior is closer to Google. But it does try to apply some standard grammar/punctuation rules to determine tokens, possibly more than Google does. But it would appear both do far less than what Yahoo does. Now we can alter Lucene's behavior to treat "." as a token delimiter. But it will result in the variation of behavior as shown by the Google and Yahoo results above. A user entering the exact text "BUGGEN.***MDT.20030327.01.A" is going to get all artifacts which contain "BUGGEN", but the "MDT.20030327.01.A" will give a precise match. If it was broken up all artifacts which contain "MDT" or "01" as part of this field will also be returned. "A" will be ignored as a common word. If we change the behavior I could see some users as being disappointed that they now cannot get the correct artifact by entering the serial number contained within it, whereas before they could. I don't think one behavior or the other can be classified as a defect. They are simply alternatives. </snip>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
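The two tokenizing alternatives described in the snip above can be compared with a short Python sketch. It is a rough approximation under stated assumptions, not Lucene's actual tokenizer: the function names and regular expressions are made up for illustration, and the dropping of common words such as "A" is not modelled.

```python
import re

# Punctuation that, when sandwiched between alphanumerics, keeps the pieces
# together as one "serial number"-style token (per the engineer's description).
JOINERS = r"[_\-/.,]"
SERIAL_STYLE = re.compile(rf"[A-Za-z0-9]+(?:{JOINERS}[A-Za-z0-9]+)*")

# Alternative behaviour: every punctuation character is a token delimiter.
SPLIT_ON_PUNCT = re.compile(r"[A-Za-z0-9]+")


def tokenize_keep_serials(text):
    # Approximates the behaviour described in the snip.
    return SERIAL_STYLE.findall(text)


def tokenize_split_all(text):
    # Approximates treating "." (and other punctuation) as a delimiter.
    return SPLIT_ON_PUNCT.findall(text)


sample = "BUGGEN.***MDT.20030327.01.A"
print(tokenize_keep_serials(sample))
print(tokenize_split_all(sample))
print(tokenize_keep_serials("genfiles.properties"))
```

The first variant yields BUGGEN and MDT.20030327.01.A, matching the breakdown in the snip, and keeps "genfiles.properties" whole; the second splits everything apart, which would let "genfiles" match on its own at the cost of exact serial-number lookups.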
Jack? Could you please review the above update.
Jack, Any updates on this or can we close this issue out as the recent changes have fixed these search issues?
Closing this as fixed. Please reopen if you have any more questions. -Priya
closing..
We recently moved off CollabNet's infrastructure.