public class StandardRobotsMetaProvider extends Object implements IRobotsMetaProvider, IXMLConfigurable
Implementation of IRobotsMetaProvider as per the X-Robots-Tag and ROBOTS standards.
Extracts robots information from the "ROBOTS" meta tag in an HTML page or the "X-Robots-Tag" field in the HTTP header (see https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag and http://www.robotstxt.org/meta.html).
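To make the HTML side of this extraction concrete, here is a minimal sketch of reading the "ROBOTS" meta tag and interpreting its directives. It is not the code used by StandardRobotsMetaProvider; the class name RobotsMetaTagSketch and its methods are hypothetical, and the regular expression assumes the name attribute precedes the content attribute.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the Norconex API.
public class RobotsMetaTagSketch {

    // Assumption: the "name" attribute comes before "content".
    // Real pages may order attributes differently.
    private static final Pattern ROBOTS_META = Pattern.compile(
            "<meta\\s+name\\s*=\\s*[\"']robots[\"']\\s+content\\s*=\\s*[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    /** Returns the raw content of the ROBOTS meta tag, or null if none is found. */
    public static String findRobotsContent(String html) {
        Matcher m = ROBOTS_META.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    /** Checks whether a comma-separated directive list contains the given token. */
    private static boolean contains(String content, String wanted) {
        if (content == null) {
            return false;
        }
        for (String token : content.split(",")) {
            if (token.trim().toLowerCase(Locale.ROOT).equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    /** "none" is shorthand for "noindex, nofollow" in the ROBOTS meta standard. */
    public static boolean isNoIndex(String content) {
        return contains(content, "noindex") || contains(content, "none");
    }

    public static boolean isNoFollow(String content) {
        return contains(content, "nofollow") || contains(content, "none");
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<META NAME=\"ROBOTS\" CONTENT=\"NOINDEX, NOFOLLOW\">"
                + "</head></html>";
        String content = findRobotsContent(html);
        System.out.println("noindex=" + isNoIndex(content));
        System.out.println("nofollow=" + isNoFollow(content));
    }
}
```

A regular expression keeps the sketch short; a production crawler would more likely rely on a tolerant HTML parser.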
If you specified a prefix for the HTTP headers elsewhere in your configuration, specify the same prefix here; otherwise the robots instructions in the HTTP headers will not be found.
If robots instructions are provided in both the HTML page and the HTTP header, those in the HTML page take precedence and those in the HTTP header are ignored.
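The header prefix and the precedence rule can be illustrated with a small, self-contained sketch. It is an assumption-laden illustration rather than the provider's actual implementation: the class RobotsPrecedenceSketch, its methods, and the prefix value "collector." are all hypothetical, and header names are matched exactly for simplicity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of the Norconex API.
public class RobotsPrecedenceSketch {

    static final String HEADER_NAME = "X-Robots-Tag";

    /**
     * Looks up the X-Robots-Tag value under the configured prefix.
     * A null prefix is treated as "no prefix".
     * Simplification: the lookup is case-sensitive.
     */
    public static String headerValue(
            Map<String, String> headers, String headersPrefix) {
        String prefix = headersPrefix == null ? "" : headersPrefix;
        return headers.get(prefix + HEADER_NAME);
    }

    /**
     * Returns the HTML meta content when it is present and non-blank;
     * otherwise falls back to the (prefixed) HTTP header value.
     */
    public static String resolve(
            String htmlMetaContent,
            Map<String, String> headers,
            String headersPrefix) {
        if (htmlMetaContent != null && !htmlMetaContent.trim().isEmpty()) {
            return htmlMetaContent;
        }
        return headerValue(headers, headersPrefix);
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        // Assumed prefix "collector." added to header names when they were stored.
        headers.put("collector.X-Robots-Tag", "noindex");

        // HTML page value wins over the header value.
        System.out.println(resolve("nofollow", headers, "collector."));
        // No HTML value: the prefixed header is used.
        System.out.println(resolve(null, headers, "collector."));
        // Prefix not supplied: the header is not found.
        System.out.println(resolve(null, headers, null));
    }
}
```

The last call shows why the prefix must match: without it, the lookup key is the bare header name and nothing is returned.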
XML configuration usage:
<robotsMeta
    ignore="false"
    class="com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider">
  <headersPrefix>(string prefixing headers)</headersPrefix>
</robotsMeta>
Usage example:
<robotsMeta ignore="true"/>
The above example ignores robots meta information.
Constructor and Description |
---|
StandardRobotsMetaProvider() |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object other) |
String | getHeadersPrefix() |
RobotsMeta | getRobotsMeta(Reader document, String documentUrl, ContentType contentType, Properties httpHeaders) Extracts Robots meta information for a page, if any. |
int | hashCode() |
void | loadFromXML(XML xml) |
void | saveToXML(XML xml) |
void | setHeadersPrefix(String headersPrefix) |
String | toString() |
public RobotsMeta getRobotsMeta(Reader document, String documentUrl, ContentType contentType, Properties httpHeaders) throws IOException
Specified by: getRobotsMeta in interface IRobotsMetaProvider
Parameters:
document - the document
documentUrl - the document URL
contentType - the document content type
httpHeaders - the document HTTP headers
Throws:
IOException - problem reading the document
public String getHeadersPrefix()
public void setHeadersPrefix(String headersPrefix)
public void loadFromXML(XML xml)
Specified by: loadFromXML in interface IXMLConfigurable
public void saveToXML(XML xml)
Specified by: saveToXML in interface IXMLConfigurable
Copyright © 2009–2023 Norconex Inc. All rights reserved.