Bug 41025 - ROBOTS META-Tag directive needed in mod_autoindex
Status: RESOLVED LATER
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_autoindex
Version: 2.2.4
Hardware: All All
Importance: P3 normal
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords: MassUpdate, PatchAvailable
Depends on:
Blocks:
 
Reported: 2006-11-23 05:12 UTC by Richard Schaal
Modified: 2018-11-07 21:09 UTC



Attachments
Adds robot directive to mod_autoindex.c (1.26 KB, patch)
2006-11-23 05:17 UTC, Richard Schaal
Configurable HEAD contents (1.56 KB, patch)
2006-12-25 08:59 UTC, Nick Kew

Description Richard Schaal 2006-11-23 05:12:59 UTC
When Htdig is used to index a collection of documents in a directory tree, the
auto-generated directory listing pages tend to flood the subsequent search
results.  A simple robots directive inserted into the pages generated for
directories tells the indexing robot not to index the listing itself, but still
to follow its links and read the files.  This greatly improves the quality of
the search results.

Here is the directive:
     <meta name="robots" content="noindex,follow">

It should be placed as follows in the head section:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
 <head>
  <title>Index of /~rds</title>
  <meta name="robots" content="noindex,follow">
 </head>
 <body>
<h1>Index of /~rds</h1>
<table><tr><th><img src="/icons/blank.gif" alt="[ICO]"></th><th><a
href="?C=N;O=D">Name</a></th><th><a href="?C=M;O=A">Last modified</a></th><th><a
href="?C=S;O=A">Size</a></th><th><a
href="?C=D;O=A">Description</a></th></tr><tr><th colspan="5"><hr></th></tr>

<tr><td valign="top"><img src="/icons/back.gif" alt="[DIR]"></td><td><a
href="/">Parent Directory</a></td><td>&nbsp;</td><td align="right">  - </td></tr>
<tr><td valign="top"><img src="/icons/folder.gif" alt="[DIR]"></td><td><a
href="Music/">Music/</a></td><td align="right">03-Sep-2006 10:52  </td><td
align="right">  - </td></tr>
....
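
For illustration, here is a rough sketch of how an indexing robot might read the tag shown above. It is not Htdig's actual code; the names RobotsMetaParser and robots_policy are made up for this example, and it assumes the usual robots tokens (noindex, nofollow, none).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the tokens of every <meta name="robots" content="..."> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_policy(page):
    """Return (may_index, may_follow); both default to True when no tag is present."""
    parser = RobotsMetaParser()
    parser.feed(page)
    found = parser.directives
    may_index = not ({"noindex", "none"} & found)
    may_follow = not ({"nofollow", "none"} & found)
    return may_index, may_follow


# A cut-down version of the directory listing from this report.
listing = """<html><head><title>Index of /~rds</title>
<meta name="robots" content="noindex,follow">
</head><body><a href="Music/">Music/</a></body></html>"""

print(robots_policy(listing))  # (False, True): skip the listing, follow its links
```

With "noindex,follow" the listing page itself is kept out of the index while the links to the files in it are still crawled, which is the behavior described above.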

I've used this tweak for a year or more without difficulty.  It would certainly
be nice to see this incorporated so I don't need to patch future releases of the
server.
Thanks!
- Richard
Comment 1 Richard Schaal 2006-11-23 05:17:09 UTC
Created attachment 19165
Adds robot directive to mod_autoindex.c
Comment 2 Nick Kew 2006-12-25 08:59:10 UTC
Created attachment 19304
Configurable HEAD contents

This should be configurable.  Patch attached.  Not sure about including it as
standard.
Comment 3 D. Stussy 2007-02-01 19:35:49 UTC
I concur, but I came here not because my local HTDig engine indexed the 
listings, but because google.com's search engine indexed the directory and 
returns it in search results.  The "configurable HEAD contents" patch is the 
correct approach, but I do feel there should be an additional directive (called 
"IndexRobots") that holds the search engine directives, as this may be an 
important enough function to have its own name.  If one were to go ONLY with 
the HEAD patch, then one would have to restate the robots control string any 
time other information were inserted into the directory's head section; an easy 
step to forget.  "IndexStyleSheet" is already mainstream....
Comment 4 D. Stussy 2007-05-29 19:53:06 UTC
More - I found this trick, but it's not a complete solution.
URL:  http://www.htdig.org/FAQ.html#q4.23

"The other technique you can use, if you want the directory index to be made by 
the web server, is to get the server to insert the robots meta tag into the 
index page it generates. In Apache, this is done using the HeaderName and 
IndexOptions directives in the directory's .htaccess file. For example:

   HeaderName .htrobots 
   IndexOptions FancyIndexing SuppressHTMLPreamble

and in the .htrobots file:

<HTML><head>
<META NAME="robots" CONTENT="noindex, follow">
<title>Index of /this/dir</title>
</head>

-- 
With this method, the title is NOT dynamic but fixed.  If the header could be a 
single file with some sort of server-side processing, one that need not reside 
in the directory being displayed, then there might be a valid and complete fix, 
but that seems like a nasty hack to me.  The patch seems to be a better solution.
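
One way to work around the fixed title without patching the server is to generate a separate header file for every directory, each with its own title. The script below is only a sketch of that idea, not part of either attached patch. The function names and the url_prefix parameter are hypothetical, it assumes the HeaderName/SuppressHTMLPreamble setup quoted above, and it has to be re-run whenever directories are added.

```python
import html
import tempfile
from pathlib import Path

HEADER_NAME = ".htrobots"  # same file name as the HeaderName value quoted above


def header_for(url_path):
    """Build the HTML preamble for one directory listing."""
    title = html.escape("Index of " + url_path)
    return (
        "<html><head>\n"
        '<meta name="robots" content="noindex, follow">\n'
        f"<title>{title}</title>\n"
        "</head>\n"
    )


def write_headers(doc_root, url_prefix="/"):
    """Write a header file into doc_root and every subdirectory; return the paths."""
    root = Path(doc_root)
    base = url_prefix.rstrip("/")
    subdirs = sorted(p for p in root.rglob("*") if p.is_dir())
    written = []
    for directory in [root, *subdirs]:
        rel = directory.relative_to(root).as_posix()
        # The top directory maps to the prefix itself ("/" when the prefix is empty).
        url_path = (base if rel == "." else f"{base}/{rel}") or "/"
        target = directory / HEADER_NAME
        target.write_text(header_for(url_path), encoding="utf-8")
        written.append(target)
    return written


if __name__ == "__main__":
    # Demonstration on a throwaway tree; point doc_root at a real tree with care.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "Music").mkdir()
        for path in write_headers(tmp, url_prefix="/~rds"):
            print(path.read_text(encoding="utf-8"))
```

This keeps the title per-directory at the cost of one extra file in every directory, so it shares the "nasty hack" character noted above.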
Comment 5 William A. Rowe Jr. 2018-11-07 21:09:39 UTC
Please help us to refine our list of open and current defects; this is a mass update of old and inactive Bugzilla reports which reflect user error, already resolved defects, and still-existing defects in httpd.

As repeatedly announced, the Apache HTTP Server Project has discontinued all development and patch review of the 2.2.x series of releases. The final release 2.2.34 was published in July 2017, and no further evaluation of bug reports or security risks will be considered or published for 2.2.x releases. All reports older than 2.4.x have been updated to status RESOLVED/LATER; no further action is expected unless the report still applies to a current version of httpd.

If your report represented a question or confusion about how to use an httpd feature, an unexpected server behavior, problems building or installing httpd, or working with an external component (a third party module, browser etc.) we ask you to start by bringing your question to the User Support and Discussion mailing list, see [https://httpd.apache.org/lists.html#http-users] for details. Include a link to this Bugzilla report for completeness with your question.

If your report was clearly a defect in httpd or a feature request, we ask that you retest using a modern httpd release (2.4.33 or later) released in the past year. If it can be reproduced, please reopen this bug and change the Version field above to the httpd version you have reconfirmed with.

Your help in identifying defects or enhancements still applicable to the current httpd server software release is greatly appreciated.