public class StandardRobotsTxtProvider extends Object implements IRobotsTxtProvider
Implementation of IRobotsTxtProvider as per the robots.txt standard described at http://www.robotstxt.org/robotstxt.html.
<robotsTxt
ignore="false"
class="com.norconex.collector.http.robot.impl.StandardRobotsTxtProvider"/>
<robotsTxt
ignore="true"/>
The above example ignores "robots.txt" files present on web sites.
Constructor and Description |
---|
StandardRobotsTxtProvider() |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object other) |
RobotsTxt | getRobotsTxt(HttpFetchClient fetcher, String url) Gets robots.txt rules. |
int | hashCode() |
protected RobotsTxt | parseRobotsTxt(InputStream is, String url, String userAgent) |
String | toString() |
public RobotsTxt getRobotsTxt(HttpFetchClient fetcher, String url)
Description copied from interface: IRobotsTxtProvider
Gets robots.txt rules.
Specified by: getRobotsTxt in interface IRobotsTxtProvider
Parameters:
fetcher - http fetcher executor to grab robots.txt
url - the URL to derive the robots.txt from
protected RobotsTxt parseRobotsTxt(InputStream is, String url, String userAgent) throws IOException
Throws:
IOException
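The following is a minimal, self-contained sketch of the two ideas behind the methods above: deriving a robots.txt location from a page URL, and reading the rules that apply to a given user agent from an InputStream. It is not the Norconex implementation. The class RobotsTxtSketch and the methods robotsTxtUrl, parseDisallows and isAllowed are hypothetical names used only for illustration, and the parsing is deliberately simplified: it handles only User-agent and Disallow lines, matches by plain path prefix, and merges every matching group instead of preferring the most specific one.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not part of the Norconex API.
public class RobotsTxtSketch {

    /** Derives scheme://authority/robots.txt from any URL on the site. */
    public static String robotsTxtUrl(String url) {
        URI uri = URI.create(url);
        return uri.getScheme() + "://" + uri.getRawAuthority() + "/robots.txt";
    }

    /** Collects Disallow path prefixes from every group matching the user agent. */
    public static List<String> parseDisallows(InputStream is, String userAgent)
            throws IOException {
        List<String> disallows = new ArrayList<>();
        String agent = userAgent.toLowerCase(Locale.ROOT);
        boolean groupApplies = false;   // does the current group target us?
        boolean readingAgents = false;  // are we inside a run of User-agent lines?
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(is, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                int hash = line.indexOf('#');          // strip comments
                if (hash >= 0) {
                    line = line.substring(0, hash);
                }
                int colon = line.indexOf(':');
                if (colon < 0) {
                    continue;                          // blank or malformed line
                }
                String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
                String value = line.substring(colon + 1).trim();
                if (field.equals("user-agent")) {
                    if (!readingAgents) {              // a new group starts here
                        groupApplies = false;
                        readingAgents = true;
                    }
                    String token = value.toLowerCase(Locale.ROOT);
                    if (token.equals("*")
                            || (!token.isEmpty() && agent.contains(token))) {
                        groupApplies = true;
                    }
                } else {
                    readingAgents = false;
                    if (field.equals("disallow") && groupApplies && !value.isEmpty()) {
                        disallows.add(value);
                    }
                }
            }
        }
        return disallows;
    }

    /** A path is allowed when no collected Disallow prefix matches it. */
    public static boolean isAllowed(String path, List<String> disallows) {
        for (String prefix : disallows) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        String robots = "User-agent: *\nDisallow: /private/\n\n"
                + "User-agent: otherbot\nDisallow: /\n";
        List<String> rules = parseDisallows(
                new ByteArrayInputStream(robots.getBytes(StandardCharsets.UTF_8)),
                "mycrawler");
        System.out.println(robotsTxtUrl("https://example.com/a/b.html"));
        System.out.println(rules);
        System.out.println(isAllowed("/private/x.html", rules));
    }
}
```

In this sketch the user agent is passed in explicitly, mirroring the userAgent parameter of parseRobotsTxt; how the real provider obtains the user agent and represents rules in a RobotsTxt object is not shown here.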
Copyright © 2009–2023 Norconex Inc. All rights reserved.