Class ApacheHttpUtil


  • public final class ApacheHttpUtil
    extends Object
    Utility methods for fetcher implementations using Apache HttpClient.
    Since:
    3.0.0
    Author:
    Pascal Essiembre
    • Method Detail

      • applyResponseContent

        public static boolean applyResponseContent(org.apache.http.HttpResponse response,
                                                   CrawlDoc doc)
                                            throws IOException

        Applies the HTTP response content to a document, if the response has any content. The content stream is fully downloaded and associated with the document.

        Parameters:
        response - the HTTP response
        doc - document to apply content on
        Returns:
        true if there was content to apply
        Throws:
        IOException - could not read existing content
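        The real method reads the entity of an org.apache.http.HttpResponse into a CrawlDoc. As a rough, JDK-only illustration of the documented contract (download the whole stream, report whether there was any content), the sketch below uses a hypothetical SimpleDoc class and a hypothetical applyContent method in place of CrawlDoc and the Apache types. A null stream stands in for a response without an entity, and buffering the bytes in memory is an assumption of this sketch, not a description of how CrawlDoc stores content.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ApplyContentSketch {

    // Hypothetical stand-in for CrawlDoc: only holds the downloaded bytes.
    static class SimpleDoc {
        byte[] content;
    }

    /**
     * Fully reads the response body (if any) and stores it on the document.
     * Returns false when there is no body to apply (modeled as a null stream).
     */
    static boolean applyContent(InputStream body, SimpleDoc doc) throws IOException {
        if (body == null) {
            return false; // no entity: nothing to apply
        }
        try (InputStream in = body) {
            doc.content = in.readAllBytes(); // Java 9+: reads the stream to the end
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        SimpleDoc doc = new SimpleDoc();
        InputStream body = new ByteArrayInputStream(
                "<html>hi</html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(applyContent(body, doc)); // true
        System.out.println(new String(doc.content, StandardCharsets.UTF_8));
        System.out.println(applyContent(null, new SimpleDoc())); // false
    }
}
```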
      • applyResponseHeaders

        public static void applyResponseHeaders(org.apache.http.HttpResponse response,
                                                String prefix,
                                                CrawlDoc doc)

        Applies the HTTP response headers to a document. This method does its best to derive the following information from the HTTP headers and set it on the document's HttpDocInfo:

        • Content type
        • Content encoding
        • ETag

        In addition, all HTTP headers will be added to the document metadata, with an optional prefix.

        Parameters:
        response - the HTTP response
        prefix - optional metadata prefix for all HTTP response headers
        doc - document to apply headers on
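        A minimal sketch of the header handling described above, assuming headers arrive as a simple name-to-value map (one value per name, a simplification of real HTTP headers) and assuming that "content encoding" here means the charset declared in the Content-Type header. SimpleDoc and applyHeaders are hypothetical stand-ins for CrawlDoc, HttpDocInfo, and the real method; the actual parsing rules of the library are not documented on this page.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class ApplyHeadersSketch {

    // Hypothetical stand-in for CrawlDoc and its HttpDocInfo.
    static class SimpleDoc {
        String contentType; // e.g. "text/html"
        String charset;     // assumed meaning of "content encoding"
        String etag;
        final Map<String, String> metadata = new LinkedHashMap<>();
    }

    static void applyHeaders(Map<String, String> headers, String prefix, SimpleDoc doc) {
        String p = prefix == null ? "" : prefix; // prefix is optional
        for (Map.Entry<String, String> h : headers.entrySet()) {
            String name = h.getKey();
            String value = h.getValue();
            // Every header is copied to the metadata, with the optional prefix.
            doc.metadata.put(p + name, value);

            if (name.equalsIgnoreCase("Content-Type")) {
                // "text/html; charset=UTF-8" -> type "text/html", charset "UTF-8"
                String[] parts = value.split(";");
                doc.contentType = parts[0].trim();
                for (int i = 1; i < parts.length; i++) {
                    String param = parts[i].trim();
                    if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                        doc.charset = param.substring("charset=".length()).replace("\"", "");
                    }
                }
            } else if (name.equalsIgnoreCase("ETag")) {
                doc.etag = value; // kept verbatim, quotes included
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Content-Type", "text/html; charset=UTF-8");
        headers.put("ETag", "\"abc123\"");
        SimpleDoc doc = new SimpleDoc();
        applyHeaders(headers, "http.", doc);
        System.out.println(doc.contentType + " " + doc.charset + " " + doc.etag);
        System.out.println(doc.metadata);
    }
}
```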
      • setRequestIfModifiedSince

        public static void setRequestIfModifiedSince(org.apache.http.HttpRequest request,
                                                     CrawlDoc doc)
        Sets the If-Modified-Since HTTP request header based on the document's cached last crawled date (if any).
        Parameters:
        request - HTTP request
        doc - document
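        If-Modified-Since expects an HTTP-date such as "Fri, 05 Mar 2021 08:09:10 GMT". The sketch below shows one way to produce that format from a cached crawl date with java.time, and to set the header only when a date is known. It uses the JDK's java.net.http.HttpRequest builder as a stand-in for the Apache HttpRequest; setIfModifiedSince and formatHttpDate are hypothetical names.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class IfModifiedSinceSketch {

    // HTTP-date (IMF-fixdate): English names, two-digit day, always GMT.
    static final DateTimeFormatter HTTP_DATE = DateTimeFormatter
            .ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.ENGLISH)
            .withZone(ZoneOffset.UTC);

    static String formatHttpDate(Instant instant) {
        return HTTP_DATE.format(instant);
    }

    // Adds the header only when a last crawled date is cached.
    static HttpRequest.Builder setIfModifiedSince(HttpRequest.Builder builder, Instant lastCrawled) {
        if (lastCrawled != null) {
            builder.header("If-Modified-Since", formatHttpDate(lastCrawled));
        }
        return builder;
    }

    public static void main(String[] args) {
        Instant lastCrawled = Instant.parse("2021-03-05T08:09:10Z");
        HttpRequest req = setIfModifiedSince(
                HttpRequest.newBuilder(URI.create("https://example.com/")), lastCrawled).build();
        // prints: Fri, 05 Mar 2021 08:09:10 GMT
        System.out.println(req.headers().firstValue("If-Modified-Since").orElse("(none)"));
    }
}
```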
      • setRequestIfNoneMatch

        public static void setRequestIfNoneMatch(org.apache.http.HttpRequest request,
                                                 CrawlDoc doc)
        Sets the If-None-Match HTTP request header based on the document's cached ETag value (if any).
        Parameters:
        request - HTTP request
        doc - document
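        The same idea applies to ETag-based conditional requests: when a cached ETag exists, it is sent back verbatim (surrounding quotes included) in If-None-Match, and a server that still holds that version can answer 304 Not Modified. This sketch again uses java.net.http.HttpRequest instead of the Apache request type; setIfNoneMatch is a hypothetical helper.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class IfNoneMatchSketch {

    // Adds If-None-Match only when a non-blank ETag was cached for the document.
    static HttpRequest.Builder setIfNoneMatch(HttpRequest.Builder builder, String cachedEtag) {
        if (cachedEtag != null && !cachedEtag.isBlank()) {
            builder.header("If-None-Match", cachedEtag);
        }
        return builder;
    }

    public static void main(String[] args) {
        HttpRequest req = setIfNoneMatch(
                HttpRequest.newBuilder(URI.create("https://example.com/")), "\"abc123\"").build();
        System.out.println(req.headers().firstValue("If-None-Match").orElse("(none)"));
    }
}
```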
      • createUriRequest

        public static org.apache.http.client.methods.HttpRequestBase createUriRequest(String url,
                                                                                      String method)
        Creates an HTTP request.
        Parameters:
        url - the request target URL
        method - HTTP method (defaults to GET if null)
        Returns:
        Apache HTTP request
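        A sketch of the documented "null means GET" behavior for the String overload, built with the JDK's java.net.http.HttpRequest rather than Apache's HttpRequestBase. createRequest is hypothetical, and trimming and upper-casing the method name is a choice made here, not something the documentation states.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Locale;

public class CreateRequestSketch {

    static HttpRequest createRequest(String url, String method) {
        // A null method falls back to GET, as documented.
        String m = method == null ? "GET" : method.trim().toUpperCase(Locale.ROOT);
        return HttpRequest.newBuilder(URI.create(url))
                .method(m, HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        HttpRequest r1 = createRequest("https://example.com/", null);
        HttpRequest r2 = createRequest("https://example.com/", "head");
        System.out.println(r1.method() + " " + r2.method()); // GET HEAD
    }
}
```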
      • createUriRequest

        public static org.apache.http.client.methods.HttpRequestBase createUriRequest(String url,
                                                                                      HttpMethod method)
        Creates an HTTP request.
        Parameters:
        url - the request target URL
        method - HTTP method (defaults to GET if null)
        Returns:
        Apache HTTP request
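        The HttpMethod overload can be pictured the same way. Verb below is a hypothetical enum standing in for HttpMethod (whose actual constants are not listed on this page), createRequest is a hypothetical name, and the request is built with java.net.http.HttpRequest rather than the Apache types.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class CreateRequestEnumSketch {

    // Hypothetical enum standing in for HttpMethod.
    enum Verb { GET, HEAD, POST, PUT, DELETE, OPTIONS }

    static HttpRequest createRequest(String url, Verb method) {
        Verb v = method == null ? Verb.GET : method; // documented default
        return HttpRequest.newBuilder(URI.create(url))
                .method(v.name(), HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        System.out.println(createRequest("https://example.com/", null).method());      // GET
        System.out.println(createRequest("https://example.com/", Verb.POST).method()); // POST
    }
}
```

        Taking an enum instead of a String rules out misspelled method names at compile time, while the null check keeps the documented GET default.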