Class StandardRobotsMetaProvider
java.lang.Object
  com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider

All Implemented Interfaces:
IRobotsMetaProvider, IXMLConfigurable
public class StandardRobotsMetaProvider extends Object implements IRobotsMetaProvider, IXMLConfigurable
Implementation of IRobotsMetaProvider as per the X-Robots-Tag and ROBOTS standards. Extracts robots information from the "ROBOTS" meta tag in an HTML page or from the "X-Robots-Tag" HTTP response header (see https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag and http://www.robotstxt.org/meta.html). If you specified a prefix for the HTTP headers, make sure to specify it again here, or the robots instructions in the HTTP headers will not be found.
If robots instructions are provided in both the HTML page and the HTTP headers, those in the HTML page take precedence and those in the HTTP headers are ignored.
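The precedence rule above can be sketched as follows. This is a minimal, hypothetical illustration (the class and method names are invented, and it is not the Norconex implementation): it takes the already-extracted ROBOTS meta tag content and X-Robots-Tag header value as plain strings and returns whichever set of directives applies.

```java
/**
 * Hypothetical sketch (not the Norconex implementation) of the precedence
 * rule: directives from the HTML "ROBOTS" meta tag win, and the
 * "X-Robots-Tag" HTTP header is consulted only when the page has none.
 */
final class RobotsPrecedenceSketch {

    /**
     * @param htmlMetaContent content attribute of the ROBOTS meta tag, or null
     * @param xRobotsTagValue value of the X-Robots-Tag header, or null
     * @return the directives to apply, or null when neither source has any
     */
    static String resolveDirectives(String htmlMetaContent, String xRobotsTagValue) {
        if (isPresent(htmlMetaContent)) {
            // Page-level instructions take precedence; the header is ignored.
            return htmlMetaContent.trim();
        }
        if (isPresent(xRobotsTagValue)) {
            return xRobotsTagValue.trim();
        }
        return null;
    }

    private static boolean isPresent(String value) {
        return value != null && !value.trim().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(resolveDirectives("nofollow", "noindex")); // nofollow
        System.out.println(resolveDirectives(null, "noindex"));       // noindex
    }
}
```

Note that the whole header value is discarded when the page has its own instructions; the sketch does not merge individual directives from both sources.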
XML configuration usage:

  <robotsMeta ignore="false"
      class="com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider">
    <headersPrefix>(string prefixing headers)</headersPrefix>
  </robotsMeta>

XML usage example:

  <robotsMeta ignore="true"/>

The above example ignores robot meta information.
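To make the two documented settings concrete, here is a hypothetical sketch that reads a <robotsMeta> element with the JDK's built-in DOM parser. It is not the library's loadFromXML(XML) method; the Settings holder is invented for illustration, and it assumes (as in the first usage example) that a missing or "false" ignore attribute means robots meta information is processed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/**
 * Hypothetical sketch of reading <robotsMeta> settings with the JDK DOM
 * parser. Not the library's loadFromXML(XML) implementation.
 */
final class RobotsMetaConfigSketch {

    /** Hypothetical holder for the two documented settings. */
    static final class Settings {
        final boolean ignore;       // true: robots meta information is ignored
        final String headersPrefix; // null when <headersPrefix> is absent

        Settings(boolean ignore, String headersPrefix) {
            this.ignore = ignore;
            this.headersPrefix = headersPrefix;
        }
    }

    static Settings parse(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            // getAttribute returns "" when absent, which parses to false.
            boolean ignore = Boolean.parseBoolean(root.getAttribute("ignore"));
            NodeList nodes = root.getElementsByTagName("headersPrefix");
            String prefix = nodes.getLength() > 0
                    ? nodes.item(0).getTextContent().trim()
                    : null;
            return new Settings(ignore, prefix);
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid <robotsMeta> XML", e);
        }
    }

    public static void main(String[] args) {
        Settings s = parse("<robotsMeta ignore=\"true\"/>");
        System.out.println(s.ignore + " " + s.headersPrefix); // true null
    }
}
```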
Author:
  Pascal Essiembre
Constructor Summary
Constructor  Description
StandardRobotsMetaProvider()
Method Summary
Modifier and Type  Method and Description
boolean      equals(Object other)
String       getHeadersPrefix()
RobotsMeta   getRobotsMeta(Reader document, String documentUrl, ContentType contentType, Properties httpHeaders)
             Extracts Robots meta information for a page, if any.
int          hashCode()
void         loadFromXML(XML xml)
void         saveToXML(XML xml)
void         setHeadersPrefix(String headersPrefix)
String       toString()
Method Detail
getRobotsMeta
public RobotsMeta getRobotsMeta(Reader document, String documentUrl, ContentType contentType, Properties httpHeaders) throws IOException
Description copied from interface: IRobotsMetaProvider
Extracts Robots meta information for a page, if any.
Specified by:
  getRobotsMeta in interface IRobotsMetaProvider
Parameters:
  document - the document
  documentUrl - document URL
  contentType - the document content type
  httpHeaders - the document HTTP headers
Returns:
  robots meta instance
Throws:
  IOException - problem reading the document
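The HTML half of this extraction can be sketched as below. This is a hypothetical, simplified illustration, not the library code: the regular expression only recognizes tags where name="robots" appears before content, and the invented Directives class stands in for the real RobotsMeta return type. It treats "none" as shorthand for "noindex, nofollow", as described in Google's robots meta tag documentation, and wraps the Reader's IOException in an UncheckedIOException for brevity, whereas the real method declares IOException.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of the HTML half of getRobotsMeta: find the ROBOTS meta
 * tag and turn its content into noindex/nofollow flags. Not the library code.
 */
final class RobotsMetaTagSketch {

    // Simplification: only matches tags where name="robots" precedes content.
    private static final Pattern META_ROBOTS = Pattern.compile(
            "<meta\\s[^>]*name\\s*=\\s*[\"']robots[\"'][^>]*content\\s*=\\s*[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    /** Hypothetical stand-in for the RobotsMeta return type. */
    static final class Directives {
        final boolean noindex;
        final boolean nofollow;

        Directives(boolean noindex, boolean nofollow) {
            this.noindex = noindex;
            this.nofollow = nofollow;
        }
    }

    /** Returns the page directives, or null when there is no ROBOTS meta tag. */
    static Directives extract(Reader document) {
        StringBuilder html = new StringBuilder();
        char[] buffer = new char[4096];
        try {
            int read;
            while ((read = document.read(buffer)) != -1) {
                html.append(buffer, 0, read);
            }
        } catch (IOException e) {
            // The real method declares IOException; wrapped here for brevity.
            throw new UncheckedIOException(e);
        }
        Matcher m = META_ROBOTS.matcher(html);
        return m.find() ? parseContent(m.group(1)) : null;
    }

    /** Parses a comma-separated value such as "noindex, nofollow". */
    static Directives parseContent(String content) {
        boolean noindex = false;
        boolean nofollow = false;
        for (String token : content.split(",")) {
            String t = token.trim().toLowerCase(Locale.ROOT);
            // "none" is shorthand for "noindex, nofollow".
            if (t.equals("noindex") || t.equals("none")) noindex = true;
            if (t.equals("nofollow") || t.equals("none")) nofollow = true;
        }
        return new Directives(noindex, nofollow);
    }

    public static void main(String[] args) {
        Directives d = extract(new StringReader(
                "<html><head><meta name=\"robots\" content=\"noindex, follow\"></head></html>"));
        System.out.println(d.noindex + " " + d.nofollow); // true false
    }
}
```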
getHeadersPrefix
public String getHeadersPrefix()
setHeadersPrefix
public void setHeadersPrefix(String headersPrefix)
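The class description warns that a prefix given to the HTTP headers must be repeated here. A hypothetical sketch of why: if header names were stored with a prefix, the X-Robots-Tag lookup only succeeds when it uses the same prefix. The prefix "collector." and all names below are invented for illustration, a plain Map stands in for the real Properties parameter, and the case-insensitive comparison is this sketch's own choice (HTTP header names are case-insensitive), not confirmed library behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of why headersPrefix matters: if HTTP headers were
 * stored with a prefix, the X-Robots-Tag lookup must use the same prefix.
 */
final class HeadersPrefixSketch {

    static final String X_ROBOTS_TAG = "X-Robots-Tag";

    /** Returns the X-Robots-Tag value, or null if not found under the prefix. */
    static String findXRobotsTag(Map<String, String> headers, String headersPrefix) {
        String key = (headersPrefix == null ? "" : headersPrefix) + X_ROBOTS_TAG;
        for (Map.Entry<String, String> header : headers.entrySet()) {
            // HTTP header names are case-insensitive, so compare ignoring case.
            if (header.getKey().equalsIgnoreCase(key)) {
                return header.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("collector.x-robots-tag", "noindex"); // hypothetical prefix
        System.out.println(findXRobotsTag(headers, "collector.")); // noindex
        System.out.println(findXRobotsTag(headers, null));         // null
    }
}
```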
loadFromXML
public void loadFromXML(XML xml)
Specified by:
  loadFromXML in interface IXMLConfigurable
saveToXML
public void saveToXML(XML xml)
Specified by:
  saveToXML in interface IXMLConfigurable