Class StandardRobotsTxtProvider
java.lang.Object
com.norconex.collector.http.robot.impl.StandardRobotsTxtProvider
- All Implemented Interfaces:
IRobotsTxtProvider
Implementation of IRobotsTxtProvider as per the robots.txt standard
described at
http://www.robotstxt.org/robotstxt.html.
XML configuration usage:
<robotsTxt
ignore="false"
class="com.norconex.collector.http.robot.impl.StandardRobotsTxtProvider"/>
XML usage example:
<robotsTxt
ignore="true"/>
The above example ignores "robots.txt" files present on web sites.
- Author:
- Pascal Essiembre
Constructor Summary
Constructors
StandardRobotsTxtProvider()
Method Summary
Modifier and Type      Method                                                          Description
boolean                equals
                       getRobotsTxt(HttpFetchClient fetcher, String url)               Gets robots.txt rules.
int                    hashCode()
protected RobotsTxt    parseRobotsTxt(InputStream is, String url, String userAgent)
String                 toString()
Constructor Details
StandardRobotsTxtProvider
public StandardRobotsTxtProvider()
Method Details
getRobotsTxt
Description copied from interface: IRobotsTxtProvider
Gets robots.txt rules. This method signature changed in 1.3 to include the userAgent.
- Specified by:
getRobotsTxt in interface IRobotsTxtProvider
- Parameters:
fetcher - http fetcher executor to grab robots.txt
url - the URL to derive the robots.txt from
- Returns:
- robots.txt rules
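The url parameter is described as the URL to derive the robots.txt from. By convention a robots.txt file sits at the root of a host, so the derivation amounts to keeping the scheme, host, and port and replacing the path with /robots.txt. The following is a minimal sketch of that idea only. The class RobotsTxtUrlSketch and the helper robotsTxtUrl are hypothetical names, not part of the Norconex API, and this is not necessarily how StandardRobotsTxtProvider does it.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical illustration; not part of the Norconex API.
class RobotsTxtUrlSketch {

    // Derives the robots.txt location for the host serving pageUrl.
    static String robotsTxtUrl(String pageUrl) throws URISyntaxException {
        URI uri = new URI(pageUrl);
        // Keep scheme, host, and port; drop user info, path, query, and fragment.
        // A port of -1 (undefined) is simply omitted from the result.
        URI robots = new URI(
                uri.getScheme(), null, uri.getHost(), uri.getPort(),
                "/robots.txt", null, null);
        return robots.toString();
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(robotsTxtUrl("http://www.example.com/a/b.html?x=1"));
        System.out.println(robotsTxtUrl("https://example.com:8443/docs/"));
    }
}
```

With the two sample URLs above, the sketch prints http://www.example.com/robots.txt and https://example.com:8443/robots.txt.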
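parseRobotsTxt
protected RobotsTxt parseRobotsTxt(InputStream is, String url, String userAgent) throws IOException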
- Throws:
IOException
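To illustrate what parsing a robots.txt stream for a given user agent can involve, here is a simplified sketch. It is an assumption-laden illustration, not the parsing logic of this class: the class RobotsTxtParseSketch and the method parseDisallows are hypothetical, only User-agent and Disallow lines are handled, and agent matching is a plain case-insensitive substring test. Rules from a group naming the agent take precedence over the wildcard (*) group.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration; not the Norconex implementation.
class RobotsTxtParseSketch {

    // Returns the Disallow paths that apply to userAgent.
    static List<String> parseDisallows(InputStream is, String userAgent)
            throws IOException {
        String agent = userAgent.toLowerCase(Locale.ROOT);
        List<String> specific = new ArrayList<>();
        List<String> wildcard = new ArrayList<>();
        boolean matchesSpecific = false; // current group names our agent
        boolean matchesWildcard = false; // current group names "*"
        boolean readingAgents = false;   // inside a run of User-agent lines
        boolean specificGroupFound = false;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(is, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                int hash = line.indexOf('#');
                if (hash >= 0) {
                    line = line.substring(0, hash); // strip comments
                }
                int colon = line.indexOf(':');
                if (colon < 0) {
                    continue; // blank or malformed line
                }
                String field = line.substring(0, colon).trim()
                        .toLowerCase(Locale.ROOT);
                String value = line.substring(colon + 1).trim();
                if (field.equals("user-agent")) {
                    if (!readingAgents) { // a new group starts here
                        matchesSpecific = false;
                        matchesWildcard = false;
                        readingAgents = true;
                    }
                    if (value.equals("*")) {
                        matchesWildcard = true;
                    } else if (!value.isEmpty()
                            && agent.contains(value.toLowerCase(Locale.ROOT))) {
                        matchesSpecific = true;
                        specificGroupFound = true;
                    }
                } else {
                    readingAgents = false;
                    // An empty Disallow value means nothing is disallowed.
                    if (field.equals("disallow") && !value.isEmpty()) {
                        if (matchesSpecific) specific.add(value);
                        if (matchesWildcard) wildcard.add(value);
                    }
                }
            }
        }
        return specificGroupFound ? specific : wildcard;
    }

    public static void main(String[] args) throws IOException {
        String txt = "# sample\n"
                + "User-agent: *\n"
                + "Disallow: /private/\n"
                + "Disallow: /tmp/\n"
                + "\n"
                + "User-agent: mybot\n"
                + "Disallow: /only-mybot/\n";
        byte[] bytes = txt.getBytes(StandardCharsets.UTF_8);
        System.out.println(parseDisallows(new ByteArrayInputStream(bytes), "mybot/1.0"));
        System.out.println(parseDisallows(new ByteArrayInputStream(bytes), "otherbot"));
    }
}
```

For the sample input, the first call returns only /only-mybot/ because a group names the agent, while the second falls back to the wildcard group and returns /private/ and /tmp/.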
equals
hashCode
public int hashCode()
toString
public String toString()