public class StandardRobotsMetaProvider extends Object implements IRobotsMetaProvider, IXMLConfigurable
Implementation of IRobotsMetaProvider as per the X-Robots-Tag and ROBOTS standards.
Extracts robots information from the "ROBOTS" meta tag in an HTML page or from the "X-Robots-Tag" field in the HTTP header (see https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag and http://www.robotstxt.org/meta.html).
If you specified a prefix for the HTTP headers, specify the same prefix here; otherwise the robots meta tags will not be found.
If robots instructions are provided in both the HTML page and the HTTP header, those in the HTML page take precedence and those in the HTTP header are ignored.
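The precedence rule and the prefix handling described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the Norconex implementation: the class `RobotsMetaSketch` and its methods are hypothetical, `java.util.Properties` stands in for the collector's own header container, the regular expression only recognizes a `name` attribute that appears before `content`, and the header lookup is assumed to be an exact, case-sensitive match on the prefix followed by `X-Robots-Tag`.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; not part of the Norconex API. */
public class RobotsMetaSketch {

    // Simplified pattern: <meta ... name="robots" ... content="...">
    // (assumes the name attribute comes before the content attribute).
    private static final Pattern META_ROBOTS = Pattern.compile(
            "<meta\\s+[^>]*name\\s*=\\s*[\"']robots[\"'][^>]*"
            + "content\\s*=\\s*[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    /**
     * Returns the raw robots directives, preferring the HTML meta tag
     * over the (optionally prefixed) X-Robots-Tag header, or null if
     * neither is present.
     */
    public static String resolveDirectives(
            Reader document, Properties httpHeaders, String headersPrefix) {
        Matcher m = META_ROBOTS.matcher(readAll(document));
        if (m.find()) {
            return m.group(1); // HTML wins; the header is not consulted
        }
        String prefix = headersPrefix == null ? "" : headersPrefix;
        return httpHeaders.getProperty(prefix + "X-Robots-Tag");
    }

    /** Case-insensitive check for one directive in a comma-separated list. */
    public static boolean hasDirective(String directives, String name) {
        if (directives == null) {
            return false;
        }
        for (String token : directives.split(",")) {
            if (token.trim().equalsIgnoreCase(name)) {
                return true;
            }
        }
        return false;
    }

    // Reads the whole document into memory; acceptable for a sketch only.
    private static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Properties headers = new Properties();
        headers.setProperty("collector.X-Robots-Tag", "noindex");

        String html = "<html><head>"
                + "<meta name=\"robots\" content=\"nofollow\">"
                + "</head></html>";

        // Both sources present: the HTML value is returned.
        System.out.println(resolveDirectives(
                new StringReader(html), headers, "collector."));

        // Header only, but no prefix given: the header is not found.
        System.out.println(resolveDirectives(
                new StringReader("<html></html>"), headers, null));
    }
}
```

In this sketch the second call returns `null` because the stored header name carries the `collector.` prefix while the lookup uses none, which mirrors the warning above about repeating the prefix.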
<robotsMeta ignore="false"
    class="com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider">
  <headersPrefix>(string prefixing headers)</headersPrefix>
</robotsMeta>
The following ignores robot meta information.
<robotsMeta ignore="true" />
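To make the shape of these two configuration variants concrete, the following sketch reads the `ignore` attribute and the optional `headersPrefix` element using only the JDK's DOM parser. It is an assumption-laden illustration: `RobotsMetaConfigSketch` is a hypothetical class, it is not how `loadFromXML` is implemented, and treating a missing `ignore` attribute as `false` is a choice made for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical holder for the two settings shown in the XML above. */
public class RobotsMetaConfigSketch {

    public final boolean ignore;
    public final String headersPrefix; // null when the element is absent

    public RobotsMetaConfigSketch(boolean ignore, String headersPrefix) {
        this.ignore = ignore;
        this.headersPrefix = headersPrefix;
    }

    /** Parses a <robotsMeta> fragment; wraps any parsing failure. */
    public static RobotsMetaConfigSketch parse(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();

            // getAttribute returns "" when absent, which parses as false.
            boolean ignore = Boolean.parseBoolean(root.getAttribute("ignore"));

            NodeList nodes = root.getElementsByTagName("headersPrefix");
            String prefix = nodes.getLength() > 0
                    ? nodes.item(0).getTextContent().trim()
                    : null;

            return new RobotsMetaConfigSketch(ignore, prefix);
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid robotsMeta XML", e);
        }
    }

    public static void main(String[] args) {
        RobotsMetaConfigSketch enabled = parse(
                "<robotsMeta ignore=\"false\">"
                + "<headersPrefix>collector.</headersPrefix>"
                + "</robotsMeta>");
        System.out.println(enabled.ignore + " / " + enabled.headersPrefix);

        RobotsMetaConfigSketch disabled = parse("<robotsMeta ignore=\"true\" />");
        System.out.println(disabled.ignore + " / " + disabled.headersPrefix);
    }
}
```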
Constructor and Description |
---|
StandardRobotsMetaProvider() |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object other) |
String | getHeadersPrefix() |
RobotsMeta | getRobotsMeta(Reader document, String documentUrl, ContentType contentType, Properties httpHeaders) Extracts Robots meta information for a page, if any. |
int | hashCode() |
void | loadFromXML(Reader in) |
void | saveToXML(Writer out) |
void | setHeadersPrefix(String headersPrefix) |
String | toString() |
public RobotsMeta getRobotsMeta(Reader document, String documentUrl, ContentType contentType, Properties httpHeaders) throws IOException
Description copied from interface: IRobotsMetaProvider
Extracts Robots meta information for a page, if any.
Specified by: getRobotsMeta in interface IRobotsMetaProvider
Parameters:
document - the document
documentUrl - document url
contentType - the document content type
httpHeaders - the document HTTP Headers
Throws:
IOException - problem reading the document

public String getHeadersPrefix()
public void setHeadersPrefix(String headersPrefix)

public void loadFromXML(Reader in) throws IOException
Specified by: loadFromXML in interface IXMLConfigurable
Throws:
IOException

public void saveToXML(Writer out) throws IOException
Specified by: saveToXML in interface IXMLConfigurable
Throws:
IOException

Copyright © 2009–2021 Norconex Inc. All rights reserved.