Class StandardRobotsMetaProvider

  • All Implemented Interfaces:
    IRobotsMetaProvider, IXMLConfigurable

    public class StandardRobotsMetaProvider
    extends Object
    implements IRobotsMetaProvider, IXMLConfigurable

    Implementation of IRobotsMetaProvider as per the X-Robots-Tag and ROBOTS meta tag standards. Extracts robots information from the "ROBOTS" meta tag in an HTML page or from the "X-Robots-Tag" field in the HTTP response headers (see https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag and http://www.robotstxt.org/meta.html).

    If you specified a prefix for the HTTP headers, specify the same prefix here; otherwise the robots instructions will not be found.

    If robots instructions are provided in both the HTML page and the HTTP header, those in the HTML page take precedence and those in the HTTP header are ignored.
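
    The precedence rule and the role of the headers prefix can be illustrated with a minimal, self-contained sketch. It is not this class's actual implementation: RobotsMetaSketch, Directives, parse and resolve are hypothetical names introduced only for illustration, the headers are assumed to be available as a simple case-sensitive map, and only the "noindex", "nofollow" and "none" directives are handled.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not part of the Norconex API.
public class RobotsMetaSketch {

    /** The two robots directives this sketch tracks. */
    public static final class Directives {
        public final boolean noindex;
        public final boolean nofollow;

        public Directives(boolean noindex, boolean nofollow) {
            this.noindex = noindex;
            this.nofollow = nofollow;
        }
    }

    /** Parses a comma-separated value such as "noindex, nofollow". */
    public static Directives parse(String content) {
        boolean noindex = false;
        boolean nofollow = false;
        for (String token : content.split(",")) {
            String t = token.trim().toLowerCase(Locale.ROOT);
            // "none" is shorthand for "noindex, nofollow".
            if (t.equals("noindex") || t.equals("none")) {
                noindex = true;
            }
            if (t.equals("nofollow") || t.equals("none")) {
                nofollow = true;
            }
        }
        return new Directives(noindex, nofollow);
    }

    /**
     * Resolves the effective directives.
     *
     * @param htmlMetaContent content of the ROBOTS meta tag, or null if absent
     * @param headers         stored HTTP headers (assumed case-sensitive keys)
     * @param headersPrefix   prefix applied to stored header names, or null
     */
    public static Directives resolve(
            String htmlMetaContent,
            Map<String, String> headers,
            String headersPrefix) {
        // The HTML meta tag wins: the header is not consulted at all.
        if (htmlMetaContent != null) {
            return parse(htmlMetaContent);
        }
        // Without the matching prefix the header key is simply not found.
        String prefix = headersPrefix == null ? "" : headersPrefix;
        String headerValue = headers.get(prefix + "X-Robots-Tag");
        if (headerValue != null) {
            return parse(headerValue);
        }
        // No robots instructions anywhere: nothing is restricted.
        return new Directives(false, false);
    }
}
```

    In this sketch, a header stored under a prefixed key (for example a hypothetical "collector." prefix) is only found when the same prefix is passed to resolve, which mirrors the reason the prefix must be repeated in this provider's configuration.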

    XML configuration usage:

    
    <robotsMeta
        ignore="false"
        class="com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider">
      <headersPrefix>(string prefixing headers)</headersPrefix>
    </robotsMeta>

    XML usage example:

    
    <robotsMeta
        ignore="true"/>

    The above example ignores all robots meta information.

    Author:
    Pascal Essiembre