Interface IRobotsTxtProvider
All Known Implementing Classes:
StandardRobotsTxtProvider
public interface IRobotsTxtProvider
Given a URL, extracts any "robots.txt" rules. Implementations are expected to cache each robots.txt instance they retrieve, or the fact that none was found, for the duration of a crawl session so that no attempt is made to download it again.
Author:
Pascal Essiembre
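The caching expectation described above can be illustrated with a minimal sketch. It is not the StandardRobotsTxtProvider implementation: `RulesSketch`, `CachingRobotsSketch`, `robotsUrlFor`, and the `downloader` function are hypothetical names introduced here, and representing "none found" with an empty `Optional` is an assumption rather than documented library behavior. The sketch also assumes an absolute URL with a scheme and host.

```java
import java.net.URI;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for parsed rules; NOT the library's RobotsTxt class.
final class RulesSketch {
    final String rawContent;

    RulesSketch(String rawContent) {
        this.rawContent = rawContent;
    }
}

// Sketch of per-session caching; NOT the StandardRobotsTxtProvider code.
final class CachingRobotsSketch {
    // One entry per robots.txt location. An empty Optional records that
    // nothing was found, so the negative result is cached as well.
    private final Map<String, Optional<RulesSketch>> cache = new ConcurrentHashMap<>();

    // Hypothetical download step, standing in for the real fetch client.
    private final Function<String, Optional<RulesSketch>> downloader;

    CachingRobotsSketch(Function<String, Optional<RulesSketch>> downloader) {
        this.downloader = downloader;
    }

    // Derives the robots.txt location from any absolute URL on the same site.
    static String robotsUrlFor(String url) {
        URI uri = URI.create(url);
        return uri.getScheme() + "://" + uri.getAuthority() + "/robots.txt";
    }

    // Invokes the downloader at most once per robots.txt location.
    Optional<RulesSketch> get(String url) {
        return cache.computeIfAbsent(robotsUrlFor(url), downloader);
    }
}
```

In this sketch, creating one `CachingRobotsSketch` per crawl session and discarding it afterwards bounds the cache lifetime to that session. Two page URLs on the same host map to the same key, so the second lookup is served from the cache even when the first one found nothing.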
Method Summary
Modifier and Type: RobotsTxt
Method: getRobotsTxt(HttpFetchClient fetchClient, String url)
Description: Gets robots.txt rules.
Method Detail
getRobotsTxt
RobotsTxt getRobotsTxt(HttpFetchClient fetchClient, String url)
Gets robots.txt rules. This method signature changed in 1.3 to include the userAgent.
Parameters:
fetchClient - HTTP fetcher executor used to retrieve the robots.txt
url - the URL to derive the robots.txt from
Returns:
robots.txt rules