public class StandardRobotsTxtProvider extends Object implements IRobotsTxtProvider
Implementation of IRobotsTxtProvider as per the robots.txt standard described at http://www.robotstxt.org/robotstxt.html.
<robotsTxt ignore="false" class="com.norconex.collector.http.robot.impl.StandardRobotsTxtProvider"/>
The following ignores "robots.txt" files present on web sites.
<robotsTxt ignore="true" />
Constructor and Description |
---|
StandardRobotsTxtProvider() |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object other) |
RobotsTxt | getRobotsTxt(org.apache.http.client.HttpClient httpClient, String url, String userAgent) Gets robots.txt rules. |
int | hashCode() |
protected RobotsTxt | parseRobotsTxt(InputStream is, String url, String userAgent) |
String | toString() |
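
The `url` argument of getRobotsTxt is described as the URL to derive the robots.txt from. As a rough illustration of what such a derivation can look like under the robots.txt convention (the file sits at the root of the host), here is a minimal sketch. `RobotsTxtUrlSketch` and `robotsTxtUrl` are hypothetical names used only for this example; they are not part of the Norconex API, and this page does not say how StandardRobotsTxtProvider performs the derivation internally.

```java
import java.net.URI;

public class RobotsTxtUrlSketch {

    // Hypothetical helper (an assumption, not Norconex code): keeps the
    // scheme, host and explicit port of the page URL and replaces the
    // path, query and fragment with "/robots.txt".
    public static String robotsTxtUrl(String pageUrl) {
        URI uri = URI.create(pageUrl);
        StringBuilder b = new StringBuilder();
        b.append(uri.getScheme()).append("://").append(uri.getHost());
        if (uri.getPort() != -1) {
            // Only keep the port when the page URL states one explicitly.
            b.append(':').append(uri.getPort());
        }
        return b.append("/robots.txt").toString();
    }

    public static void main(String[] args) {
        // prints http://www.example.com/robots.txt
        System.out.println(robotsTxtUrl("http://www.example.com/a/b.html?x=1"));
        // prints https://example.com:8443/robots.txt
        System.out.println(robotsTxtUrl("https://example.com:8443/docs/"));
    }
}
```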
public RobotsTxt getRobotsTxt(org.apache.http.client.HttpClient httpClient, String url, String userAgent)
Description copied from interface: IRobotsTxtProvider
Gets robots.txt rules.
Specified by: getRobotsTxt in interface IRobotsTxtProvider
Parameters:
httpClient - the HTTP client to grab robots.txt
url - the URL to derive the robots.txt from
userAgent - the User-Agent to match ourselves with the robot rules

protected RobotsTxt parseRobotsTxt(InputStream is, String url, String userAgent) throws IOException
Throws:
IOException
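
To make the role of the `is` and `userAgent` arguments more concrete, the following is a simplified sketch of matching a User-Agent against robots.txt records read from a stream. It is not the Norconex implementation: `RobotsTxtParseSketch` and `disallowedPaths` are hypothetical names, only `Disallow` lines are collected, the match is an assumed case-insensitive substring test, and the sketch returns a plain list of paths rather than a RobotsTxt object. Unlike parseRobotsTxt, it does not declare IOException; read failures surface as UncheckedIOException.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class RobotsTxtParseSketch {

    // Hypothetical helper (an assumption, not Norconex code): returns the
    // Disallow paths of the records naming this user agent, or those of
    // the "*" records when no record names it.
    public static List<String> disallowedPaths(InputStream is, String userAgent) {
        List<String> lines = new BufferedReader(
                new InputStreamReader(is, StandardCharsets.UTF_8))
                .lines().collect(Collectors.toList());
        String agent = userAgent.toLowerCase(Locale.ROOT);
        List<String> specific = new ArrayList<>();
        List<String> wildcard = new ArrayList<>();
        boolean inAgentLines = false;   // previous directive was User-agent
        boolean matchesSpecific = false;
        boolean matchesWildcard = false;
        boolean sawSpecific = false;
        for (String raw : lines) {
            int hash = raw.indexOf('#');                 // strip comments
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (line.isEmpty() || colon < 0) {
                continue;
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                if (!inAgentLines) {                     // a new record starts
                    matchesSpecific = false;
                    matchesWildcard = false;
                }
                inAgentLines = true;
                if (value.equals("*")) {
                    matchesWildcard = true;
                } else if (!value.isEmpty()
                        && agent.contains(value.toLowerCase(Locale.ROOT))) {
                    matchesSpecific = true;
                    sawSpecific = true;
                }
            } else {
                inAgentLines = false;
                // An empty Disallow value means "nothing disallowed".
                if (field.equals("disallow") && !value.isEmpty()) {
                    if (matchesSpecific) specific.add(value);
                    if (matchesWildcard) wildcard.add(value);
                }
            }
        }
        return sawSpecific ? specific : wildcard;
    }

    public static void main(String[] args) {
        String txt = "User-agent: *\nDisallow: /private/\n\n"
                + "User-agent: mybot\nDisallow: /tmp/ # temp files\nDisallow:\n";
        byte[] bytes = txt.getBytes(StandardCharsets.UTF_8);
        // prints [/tmp/]
        System.out.println(disallowedPaths(new ByteArrayInputStream(bytes), "MyBot/1.0"));
        // prints [/private/]
        System.out.println(disallowedPaths(new ByteArrayInputStream(bytes), "otherbot"));
    }
}
```

A subclass that overrides the protected parseRobotsTxt method could apply different matching rules; the substring test here is only one plausible choice.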
Copyright © 2009–2021 Norconex Inc. All rights reserved.