Interface IRobotsTxtProvider
- All Known Implementing Classes:
StandardRobotsTxtProvider
public interface IRobotsTxtProvider
Given a URL, extracts any "robots.txt" rules. Implementations are expected
to cache existing robots.txt instances, or cache the fact that none was found,
for the duration of a crawl session so that no attempt is made to download it again.
- Author:
- Pascal Essiembre
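The caching contract described above can be illustrated with a minimal sketch. It does not reproduce StandardRobotsTxtProvider or the real interface: the class name CachingRobotsTxtSketch, the downloader function, and the use of raw robots.txt text in place of a rules object are all assumptions made for illustration. The point it shows is that a "not found" outcome is cached as well, so each site is fetched at most once per session.

```java
import java.net.URI;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch of the caching contract; not part of the library. */
class CachingRobotsTxtSketch {

    // Cache keyed by site root (scheme + authority). Optional.empty() records
    // that no robots.txt was found, so the miss is remembered too.
    private final Map<String, Optional<String>> cache = new ConcurrentHashMap<>();

    // Hypothetical downloader: receives a robots.txt URL and returns its raw
    // content, or Optional.empty() when the site has none.
    private final Function<String, Optional<String>> downloader;

    CachingRobotsTxtSketch(Function<String, Optional<String>> downloader) {
        this.downloader = downloader;
    }

    /** Returns the cached robots.txt content for the site that owns the URL. */
    Optional<String> getRobotsTxt(String url) {
        URI uri = URI.create(url);
        String siteRoot = uri.getScheme() + "://" + uri.getAuthority();
        // The downloader runs only on the first request for a given site root;
        // later calls reuse the stored value, including an empty one.
        return cache.computeIfAbsent(
                siteRoot, root -> downloader.apply(root + "/robots.txt"));
    }
}
```

Keeping one instance of such a class per crawl session, and discarding it when the session ends, would bound the cache lifetime in the way the description asks for.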
Method Summary
Method: getRobotsTxt(HttpFetchClient fetchClient, String url)
Description: Gets robots.txt rules.
Method Details
getRobotsTxt
Gets robots.txt rules. This method signature changed in 1.3 to include the userAgent.
- Parameters:
fetchClient - HTTP fetcher executor to grab robots.txt
url - the URL to derive the robots.txt from
- Returns:
- robots.txt rules
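The url parameter is described as the URL "to derive the robots.txt from". One plausible reading, sketched below, is that the robots.txt location is built from the scheme and authority of the given URL, with the path, query, and fragment dropped. The helper name RobotsTxtUrlSketch.robotsTxtUrl is hypothetical, and how an actual implementation derives the location is an assumption here.

```java
import java.net.URI;

/** Hypothetical helper; not part of the library. */
final class RobotsTxtUrlSketch {

    private RobotsTxtUrlSketch() {
    }

    /** Builds the robots.txt URL for the site that owns the given absolute URL. */
    static String robotsTxtUrl(String url) {
        URI uri = URI.create(url);
        if (uri.getScheme() == null || uri.getAuthority() == null) {
            // Relative URLs carry no site information to derive from.
            throw new IllegalArgumentException("Absolute URL required: " + url);
        }
        // Keep the scheme and the authority (typically host and optional port);
        // the path, query, and fragment are discarded.
        return uri.getScheme() + "://" + uri.getAuthority() + "/robots.txt";
    }
}
```

Under this reading, every page of a site maps to the same robots.txt URL, which is what makes per-site caching possible.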