Interface IHttpFetcher

All Known Implementing Classes:
AbstractHttpFetcher, GenericHttpFetcher, PhantomJSDocumentFetcher, WebDriverHttpFetcher

public interface IHttpFetcher
Fetches HTTP resources.
Since:
3.0.0
Author:
Pascal Essiembre
  • Method Details

    • getUserAgent

      String getUserAgent()
    • accept

      boolean accept(Doc doc, HttpMethod httpMethod)
    • fetch

      IHttpFetchResponse fetch(CrawlDoc doc, HttpMethod httpMethod) throws HttpFetchException

      Performs an HTTP request for the supplied document reference and HTTP method.

      For each HTTP method supported, implementors should do their best to populate the document and its CrawlDocInfo with as much information as they can.

      For unsupported HTTP methods, implementors should return an HTTP response with the CrawlState.UNSUPPORTED state. To spare users from having to configure multiple HTTP clients, implementors should try to support both the GET and HEAD methods. POST is only used in special cases and is often not used during a crawl session.

      A null method is treated as a GET.
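
      The method-handling rules above (null treated as GET, GET and HEAD supported, anything else reported as unsupported) can be sketched as follows. This is a minimal illustration only, not the library's implementation: the Method and State enums and the FetchContractSketch class are hypothetical stand-ins for HttpMethod, CrawlState, and a real fetcher, and no HTTP request is made.

```java
import java.util.EnumSet;
import java.util.Set;

public class FetchContractSketch {

    // Hypothetical stand-in for the library's HttpMethod type.
    enum Method { GET, HEAD, POST }

    // Hypothetical stand-in for a crawl state. Only the UNSUPPORTED name
    // comes from the documentation; PROCEED is a placeholder meaning
    // "the request would be performed".
    enum State { PROCEED, UNSUPPORTED }

    // Assumption for this sketch: the fetcher supports GET and HEAD only.
    static final Set<Method> SUPPORTED = EnumSet.of(Method.GET, Method.HEAD);

    // A null method is treated as a GET.
    static Method effectiveMethod(Method requested) {
        return requested == null ? Method.GET : requested;
    }

    // Decides which state to report before any network call is attempted.
    static State stateFor(Method requested) {
        Method method = effectiveMethod(requested);
        return SUPPORTED.contains(method) ? State.PROCEED : State.UNSUPPORTED;
    }

    public static void main(String[] args) {
        System.out.println("null -> " + stateFor(null));
        System.out.println("HEAD -> " + stateFor(Method.HEAD));
        System.out.println("POST -> " + stateFor(Method.POST));
    }
}
```

      A real implementation would perform the request where this sketch returns PROCEED, and would wrap the unsupported case in an IHttpFetchResponse rather than returning a bare state.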

      Parameters:
      doc - document to fetch or to use to make the request.
      httpMethod - HTTP method
      Returns:
      an HTTP response
      Throws:
      HttpFetchException - problem when fetching the document