Class StandardRobotsTxtProvider
java.lang.Object
    com.norconex.collector.http.robot.impl.StandardRobotsTxtProvider
All Implemented Interfaces:
IRobotsTxtProvider
public class StandardRobotsTxtProvider extends Object implements IRobotsTxtProvider
Implementation of IRobotsTxtProvider as per the robots.txt standard described at http://www.robotstxt.org/robotstxt.html.

XML configuration usage:
<robotsTxt ignore="false" class="com.norconex.collector.http.robot.impl.StandardRobotsTxtProvider"/>
XML usage example:
<robotsTxt ignore="true"/>
The above example ignores "robots.txt" files present on web sites.
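Under the robots.txt standard referenced above, a crawler that does not ignore robots.txt selects the record that applies to its user agent and skips any URL whose path starts with one of that record's Disallow values; an empty Disallow value excludes nothing. The following is a minimal sketch of only that prefix check, assuming the Disallow values for the crawler have already been selected. `DisallowMatcher` and `isAllowed` are hypothetical names, not part of the Norconex API, and the sketch ignores extensions such as Allow lines or wildcards that a full provider may support.

```java
import java.util.List;

// Hypothetical helper (not the Norconex implementation) illustrating the
// Disallow rule of the original robots.txt standard: each value is a path
// prefix, and an empty value ("Disallow:") disallows nothing.
final class DisallowMatcher {

    // Returns true when the given URL path may be fetched.
    static boolean isAllowed(String path, List<String> disallowPrefixes) {
        for (String prefix : disallowPrefixes) {
            if (!prefix.isEmpty() && path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> rules = List.of("/cgi-bin/", "/tmp/");
        System.out.println(isAllowed("/index.html", rules));     // true
        System.out.println(isAllowed("/tmp/cache.html", rules)); // false
        System.out.println(isAllowed("/anything", List.of(""))); // true
    }
}
```

Plain prefix matching is what the original standard describes; it also means `/tmp` (without a trailing slash) would block `/tmp.html` as well as `/tmp/`.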
Author:
Pascal Essiembre
Constructor Summary
StandardRobotsTxtProvider()
Method Summary
Modifier and Type     Method                                                        Description
boolean               equals(Object other)
RobotsTxt             getRobotsTxt(HttpFetchClient fetcher, String url)             Gets robots.txt rules.
int                   hashCode()
protected RobotsTxt   parseRobotsTxt(InputStream is, String url, String userAgent)
String                toString()
Method Detail
getRobotsTxt
public RobotsTxt getRobotsTxt(HttpFetchClient fetcher, String url)
Description copied from interface: IRobotsTxtProvider
Gets robots.txt rules. This method signature changed in 1.3 to include the userAgent.
Specified by:
getRobotsTxt in interface IRobotsTxtProvider
Parameters:
fetcher - HTTP fetcher executor used to retrieve robots.txt
url - the URL to derive the robots.txt location from
Returns:
robots.txt rules
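The url parameter is a page URL from which the robots.txt location is derived. Under the standard, robots.txt sits at the root of the host, so one plausible derivation keeps the scheme, host, and any explicit port and replaces everything else with /robots.txt. The following is a minimal sketch of that step using java.net.URI; `RobotsTxtUrl` and `robotsTxtUrl` are hypothetical names, and the provider's actual derivation may differ (for example in how it treats default ports).

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper: derives the robots.txt location for a page URL by
// keeping scheme, host, and any explicit port, and dropping path, query,
// fragment, and user info.
final class RobotsTxtUrl {

    static String robotsTxtUrl(String pageUrl) {
        // URI.create throws IllegalArgumentException on malformed input.
        URI uri = URI.create(pageUrl);
        if (uri.getScheme() == null || uri.getHost() == null) {
            throw new IllegalArgumentException("Not an absolute URL with a host: " + pageUrl);
        }
        StringBuilder sb = new StringBuilder()
                .append(uri.getScheme().toLowerCase(Locale.ROOT))
                .append("://")
                .append(uri.getHost());
        // getPort() returns -1 when the URL has no explicit port.
        if (uri.getPort() != -1) {
            sb.append(':').append(uri.getPort());
        }
        return sb.append("/robots.txt").toString();
    }

    public static void main(String[] args) {
        System.out.println(robotsTxtUrl("https://example.com/docs/page.html?x=1"));
        // https://example.com/robots.txt
        System.out.println(robotsTxtUrl("http://example.com:8080/a/b"));
        // http://example.com:8080/robots.txt
    }
}
```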
parseRobotsTxt
protected RobotsTxt parseRobotsTxt(InputStream is, String url, String userAgent) throws IOException
Throws:
IOException
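parseRobotsTxt reads a robots.txt body for a given user agent, but this page does not document its parsing rules. The following is therefore only a minimal sketch of such parsing under the original robotstxt.org conventions: comments start with `#`, field names are case-insensitive, consecutive User-agent lines share one record, a record naming the crawler (case-insensitive substring match) takes precedence over the `*` record, and only Disallow lines are collected. It returns a plain list of Disallow prefixes instead of the real RobotsTxt type, whose API is not shown here; `SimpleRobotsParser` and `parseDisallows` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not the Norconex parser: collects the Disallow values
// that apply to userAgent, falling back to the "User-agent: *" record.
final class SimpleRobotsParser {

    static List<String> parseDisallows(InputStream is, String userAgent) throws IOException {
        String ua = userAgent.toLowerCase(Locale.ROOT);
        List<String> specific = new ArrayList<>();
        List<String> wildcard = new ArrayList<>();
        boolean matchedSpecific = false;
        boolean recordIsSpecific = false;
        boolean recordIsWildcard = false;
        boolean lastWasAgent = false;

        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(is, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                int hash = line.indexOf('#');
                if (hash >= 0) {
                    line = line.substring(0, hash); // strip comments
                }
                int colon = line.indexOf(':');
                if (colon < 0) {
                    continue; // blank or malformed line
                }
                String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
                String value = line.substring(colon + 1).trim();
                if (field.equals("user-agent")) {
                    if (!lastWasAgent) { // a User-agent after rules starts a new record
                        recordIsSpecific = false;
                        recordIsWildcard = false;
                    }
                    String agent = value.toLowerCase(Locale.ROOT);
                    if (agent.equals("*")) {
                        recordIsWildcard = true;
                    } else if (!agent.isEmpty() && ua.contains(agent)) {
                        recordIsSpecific = true;
                        matchedSpecific = true;
                    }
                    lastWasAgent = true;
                } else {
                    if (field.equals("disallow")) {
                        if (recordIsSpecific) specific.add(value);
                        if (recordIsWildcard) wildcard.add(value);
                    }
                    lastWasAgent = false;
                }
            }
        }
        return matchedSpecific ? specific : wildcard;
    }

    public static void main(String[] args) throws IOException {
        String robots = String.join("\n",
                "User-agent: *",
                "Disallow: /private/",
                "",
                "User-agent: MyBot",
                "Disallow: /tmp/ # scratch space");
        byte[] bytes = robots.getBytes(StandardCharsets.UTF_8);
        System.out.println(parseDisallows(new ByteArrayInputStream(bytes), "MyBot/1.0")); // [/tmp/]
        System.out.println(parseDisallows(new ByteArrayInputStream(bytes), "OtherBot"));  // [/private/]
    }
}
```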