Class StandardRobotsTxtProvider
- java.lang.Object
-
- com.norconex.collector.http.robot.impl.StandardRobotsTxtProvider
-
- All Implemented Interfaces:
IRobotsTxtProvider
public class StandardRobotsTxtProvider extends Object implements IRobotsTxtProvider
Implementation of IRobotsTxtProvider as per the robots.txt standard described at http://www.robotstxt.org/robotstxt.html.
XML configuration usage:
<robotsTxt ignore="false" class="com.norconex.collector.http.robot.impl.StandardRobotsTxtProvider"/>
XML usage example:
<robotsTxt ignore="true"/>
The above example ignores "robots.txt" files present on web sites.
- Author:
- Pascal Essiembre
Constructor Summary
Constructors
Constructor                    Description
StandardRobotsTxtProvider()
-
Method Summary
Modifier and Type     Method                                                          Description
boolean               equals(Object other)
RobotsTxt             getRobotsTxt(HttpFetchClient fetcher, String url)               Gets robots.txt rules.
int                   hashCode()
protected RobotsTxt   parseRobotsTxt(InputStream is, String url, String userAgent)
String                toString()
Method Detail
-
getRobotsTxt
public RobotsTxt getRobotsTxt(HttpFetchClient fetcher, String url)
Description copied from interface: IRobotsTxtProvider
Gets robots.txt rules. This method signature changed in 1.3 to include the userAgent.
- Specified by:
getRobotsTxt in interface IRobotsTxtProvider
- Parameters:
fetcher - http fetcher executor to grab robots.txt
url - the URL to derive the robots.txt from
- Returns:
robots.txt rules
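The url parameter is described as the URL "to derive the robots.txt from". A minimal sketch of one way such a derivation could work is shown below, assuming the robots.txt file sits at the root of the URL's scheme and authority. The class RobotsTxtLocation and the method robotsTxtUrlFor are hypothetical names used for illustration only; they are not part of the Norconex API and do not claim to reproduce what this class does internally.

```java
import java.net.URI;

// Hypothetical helper, not part of the Norconex API.
final class RobotsTxtLocation {
    private RobotsTxtLocation() {}

    /**
     * Builds "scheme://authority/robots.txt" from an absolute URL.
     * Path, query and fragment of the input are discarded.
     * Note: the raw authority is reused as-is, so any user info or
     * port present in the input URL is carried over.
     */
    static String robotsTxtUrlFor(String url) {
        URI uri = URI.create(url);
        if (uri.getScheme() == null || uri.getRawAuthority() == null) {
            throw new IllegalArgumentException("Absolute URL expected: " + url);
        }
        return uri.getScheme() + "://" + uri.getRawAuthority() + "/robots.txt";
    }

    public static void main(String[] args) {
        System.out.println(
                robotsTxtUrlFor("https://example.com:8080/docs/page.html?x=1"));
    }
}
```

Under these assumptions, every page on the same scheme, host and port maps to the same robots.txt location, which is what makes per-site caching of the parsed rules possible.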
-
parseRobotsTxt
protected RobotsTxt parseRobotsTxt(InputStream is, String url, String userAgent) throws IOException
- Throws:
IOException
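The signature above takes the robots.txt content as an InputStream together with the user agent whose rules are wanted. The following is a simplified sketch of that kind of parsing, not the Norconex implementation: it only collects Disallow paths, matches the user agent by case-insensitive substring, and falls back to the "*" group when no specific group matches. The class RobotsTxtSketch and its methods are hypothetical, and returning a plain list of paths instead of a RobotsTxt object is an assumption made to keep the example self-contained.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified parser; not part of the Norconex API.
final class RobotsTxtSketch {
    private RobotsTxtSketch() {}

    /** Returns the Disallow paths that apply to the given user agent. */
    static List<String> disallowedPaths(InputStream is, String userAgent)
            throws IOException {
        String agent = userAgent.toLowerCase(Locale.ROOT);
        List<String> specific = new ArrayList<>(); // rules from groups naming the agent
        List<String> wildcard = new ArrayList<>(); // rules from "User-agent: *" groups
        boolean matchesAgent = false;
        boolean matchesWildcard = false;
        boolean inAgentLines = false; // true while reading consecutive User-agent lines
        boolean sawSpecific = false;  // true once any group named the agent

        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(is, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                int hash = line.indexOf('#');
                if (hash >= 0) {
                    line = line.substring(0, hash); // strip trailing comment
                }
                int colon = line.indexOf(':');
                if (colon < 0) {
                    continue; // blank or malformed line
                }
                String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
                String value = line.substring(colon + 1).trim();
                if (field.equals("user-agent")) {
                    if (!inAgentLines) { // first User-agent line starts a new group
                        matchesAgent = false;
                        matchesWildcard = false;
                        inAgentLines = true;
                    }
                    if (value.equals("*")) {
                        matchesWildcard = true;
                    } else if (!value.isEmpty()
                            && agent.contains(value.toLowerCase(Locale.ROOT))) {
                        matchesAgent = true;
                        sawSpecific = true;
                    }
                } else {
                    inAgentLines = false;
                    // An empty Disallow value means "nothing is disallowed".
                    if (field.equals("disallow") && !value.isEmpty()) {
                        if (matchesAgent) {
                            specific.add(value);
                        }
                        if (matchesWildcard) {
                            wildcard.add(value);
                        }
                    }
                }
            }
        }
        return sawSpecific ? specific : wildcard;
    }

    /** Convenience overload for in-memory content. */
    static List<String> disallowedPaths(String content, String userAgent) {
        try {
            return disallowedPaths(new ByteArrayInputStream(
                    content.getBytes(StandardCharsets.UTF_8)), userAgent);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch a group that names the agent takes precedence over the "*" group, even when the specific group contains no Disallow lines. Allow, Crawl-delay and Sitemap directives are ignored here.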