Class ApacheHttpUtil
- java.lang.Object
  - com.norconex.collector.http.fetch.util.ApacheHttpUtil
public final class ApacheHttpUtil extends Object
Utility methods for fetcher implementations using Apache HttpClient.
- Since:
  3.0.0
- Author:
  Pascal Essiembre
Method Summary
- static void applyContentTypeAndCharset(String value, CrawlDocInfo docInfo)
  Applies the Content-Type HTTP response header to the supplied document info.
- static boolean applyResponseContent(org.apache.http.HttpResponse response, CrawlDoc doc)
  Applies the HTTP response content to a document if such content exists.
- static void applyResponseHeaders(org.apache.http.HttpResponse response, String prefix, CrawlDoc doc)
  Applies the HTTP response headers to a document.
- static void authenticateUsingForm(org.apache.http.client.HttpClient httpClient, HttpAuthConfig authConfig)
- static org.apache.http.client.methods.HttpRequestBase createUriRequest(String url, HttpMethod method)
  Creates an HTTP request.
- static org.apache.http.client.methods.HttpRequestBase createUriRequest(String url, String method)
  Creates an HTTP request.
- static void setRequestIfModifiedSince(org.apache.http.HttpRequest request, CrawlDoc doc)
  Sets the If-Modified-Since HTTP request header based on the document's cached last crawled date (if any).
- static void setRequestIfNoneMatch(org.apache.http.HttpRequest request, CrawlDoc doc)
  Sets the If-None-Match HTTP request header based on the document's cached ETag value (if any).
Method Detail
applyResponseContent
public static boolean applyResponseContent(org.apache.http.HttpResponse response, CrawlDoc doc) throws IOException
Applies the HTTP response content to a document if such content exists. The stream is fully downloaded and associated with the document.
- Parameters:
  response - the HTTP response
  doc - document to apply the content to
- Returns:
  true if there was content to apply
- Throws:
  IOException - if the response content could not be read
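To illustrate "fully downloaded and associated with the document", here is a minimal sketch using only java.io. It is not the library's implementation: a plain InputStream stands in for the response entity (a null stream models a response with no body), and SketchDoc and applyContent are hypothetical names. Buffering the whole body in a byte array is a simplification; the real class may store content differently.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stand-in for a crawled document; it only holds raw content. */
final class SketchDoc {
    byte[] content;
}

final class ResponseContentSketch {

    /**
     * Reads the whole response body into the document.
     * Returns false when there is no body to apply (null stream).
     */
    static boolean applyContent(InputStream body, SketchDoc doc) throws IOException {
        if (body == null) {
            return false; // no entity, e.g. a HEAD or 304 response
        }
        try (InputStream in = body) {
            doc.content = in.readAllBytes(); // Java 9+: reads until end of stream
        }
        return true;
    }
}
```

The boolean return lets a caller distinguish "no content" from "empty but present content" without inspecting the document afterwards.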

applyResponseHeaders
public static void applyResponseHeaders(org.apache.http.HttpResponse response, String prefix, CrawlDoc doc)
Applies the HTTP response headers to a document. This method derives as much relevant information as it can from the HTTP headers and sets it on the document's HttpDocInfo:
- Content type
- Content encoding
- ETag
In addition, all HTTP headers are added to the document metadata, with an optional prefix.
- Parameters:
  response - the HTTP response
  prefix - optional metadata prefix for all HTTP response headers
  doc - document to apply headers on
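The prefixing behaviour and the ETag lookup can be sketched with plain maps. This is an assumption-laden illustration, not the library's code: headers are modelled as a Map from name to values, metadata as another such map, and ResponseHeadersSketch, toMetadata and findETag are hypothetical names. Header names are case-insensitive in HTTP, so the ETag lookup ignores case.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ResponseHeadersSketch {

    /**
     * Copies every header into a metadata map, prepending the optional
     * prefix to each name (null means no prefix).
     */
    static Map<String, List<String>> toMetadata(
            Map<String, List<String>> headers, String prefix) {
        String p = prefix == null ? "" : prefix;
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            metadata.computeIfAbsent(p + e.getKey(), k -> new ArrayList<>())
                    .addAll(e.getValue());
        }
        return metadata;
    }

    /** Returns the first ETag value, matching the header name case-insensitively, or null. */
    static String findETag(Map<String, List<String>> headers) {
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            if ("ETag".equalsIgnoreCase(e.getKey()) && !e.getValue().isEmpty()) {
                return e.getValue().get(0);
            }
        }
        return null;
    }
}
```

With a hypothetical prefix such as "http.", a Server header would be stored under the metadata key "http.Server".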

applyContentTypeAndCharset
public static void applyContentTypeAndCharset(String value, CrawlDocInfo docInfo)
Applies the Content-Type HTTP response header to the supplied document info. It extracts both the content type and the charset from the value and sets them by invoking DocInfo.setContentType(ContentType) and DocInfo.setContentEncoding(String). This method is invoked automatically by applyResponseHeaders(HttpResponse, String, CrawlDoc) when it encounters a content type header.
- Parameters:
  value - value to parse and set
  docInfo - document info
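Splitting a Content-Type value such as "text/html; charset=UTF-8" into its two parts might look like the sketch below. It is not the library's parser: ContentTypeSketch and Parsed are hypothetical names, and splitting on ";" is a simplification that ignores semicolons inside quoted parameter values. It does strip optional quotes around the charset, which the HTTP grammar allows.

```java
import java.util.Locale;

final class ContentTypeSketch {

    /** Media type and charset extracted from a Content-Type value. */
    static final class Parsed {
        final String mediaType; // lower-cased, e.g. "text/html"; null if absent
        final String charset;   // as sent, e.g. "UTF-8"; null if absent

        Parsed(String mediaType, String charset) {
            this.mediaType = mediaType;
            this.charset = charset;
        }
    }

    static Parsed parse(String value) {
        if (value == null || value.isBlank()) {
            return new Parsed(null, null);
        }
        String[] parts = value.split(";");
        String mediaType = parts[0].trim().toLowerCase(Locale.ROOT);
        String charset = null;
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                charset = param.substring(eq + 1).trim();
                // Remove optional surrounding quotes: charset="utf-8"
                if (charset.length() >= 2 && charset.startsWith("\"") && charset.endsWith("\"")) {
                    charset = charset.substring(1, charset.length() - 1);
                }
            }
        }
        return new Parsed(mediaType.isEmpty() ? null : mediaType, charset);
    }
}
```

A value without a charset parameter, such as "image/png", yields a media type and a null charset, leaving the encoding to be detected some other way.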

setRequestIfModifiedSince
public static void setRequestIfModifiedSince(org.apache.http.HttpRequest request, CrawlDoc doc)
Sets the If-Modified-Since HTTP request header based on the document's cached last crawled date (if any).
- Parameters:
  request - HTTP request
  doc - document
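The If-Modified-Since value must be an HTTP-date. The sketch below formats a cached crawl date in the fixed GMT form defined by the HTTP specification (English day and month names, two-digit day) and skips the header when no date is cached. A Map stands in for the Apache request object, and IfModifiedSinceSketch is a hypothetical name; how the library obtains and formats the date is not shown on this page.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.Map;

final class IfModifiedSinceSketch {

    // HTTP-date in IMF-fixdate form, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                    .withZone(ZoneOffset.UTC);

    /** Adds If-Modified-Since only when a last crawl date is known. */
    static void setIfModifiedSince(Map<String, String> requestHeaders, Instant lastCrawled) {
        if (lastCrawled != null) {
            requestHeaders.put("If-Modified-Since", HTTP_DATE.format(lastCrawled));
        }
    }
}
```

If the server has not modified the resource since that date, it can answer 304 Not Modified with no body, saving a full download.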

setRequestIfNoneMatch
public static void setRequestIfNoneMatch(org.apache.http.HttpRequest request, CrawlDoc doc)
Sets the If-None-Match HTTP request header based on the document's cached ETag value (if any).
- Parameters:
  request - HTTP request
  doc - document
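The ETag counterpart follows the same pattern: echo the cached entity tag back unchanged, quotes and any weak "W/" prefix included, since the server compares it against the current tag. As before, a Map stands in for the request and IfNoneMatchSketch is a hypothetical name; the status-code helper only restates the HTTP rule that 304 means the cached copy is still current.

```java
import java.util.Map;

final class IfNoneMatchSketch {

    /** Sends the cached ETag verbatim; does nothing when none is cached. */
    static void setIfNoneMatch(Map<String, String> requestHeaders, String cachedETag) {
        if (cachedETag != null && !cachedETag.isBlank()) {
            requestHeaders.put("If-None-Match", cachedETag);
        }
    }

    /** A 304 Not Modified reply means the cached copy still matches. */
    static boolean unchanged(int statusCode) {
        return statusCode == 304;
    }
}
```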

createUriRequest
public static org.apache.http.client.methods.HttpRequestBase createUriRequest(String url, String method)
Creates an HTTP request.
- Parameters:
  url - the request target URL
  method - HTTP method (defaults to GET if null)
- Returns:
  Apache HTTP request
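The documented fallback to GET can be illustrated with the JDK's own java.net.http client types instead of Apache's HttpRequestBase, purely so the example runs without extra dependencies. CreateRequestSketch is a hypothetical name, and treating a blank method like null and upper-casing the name are choices made for this sketch; the page only guarantees the null-means-GET rule.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Locale;

final class CreateRequestSketch {

    /** Builds a body-less request; a null (or, here, blank) method falls back to GET. */
    static HttpRequest createRequest(String url, String method) {
        String m = (method == null || method.isBlank())
                ? "GET"
                : method.trim().toUpperCase(Locale.ROOT);
        return HttpRequest.newBuilder(URI.create(url))
                .method(m, HttpRequest.BodyPublishers.noBody())
                .build();
    }
}
```

For example, createRequest("https://example.com/", null) produces a GET request for that URL.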

createUriRequest
public static org.apache.http.client.methods.HttpRequestBase createUriRequest(String url, HttpMethod method)
Creates an HTTP request.
- Parameters:
  url - the request target URL
  method - HTTP method (defaults to GET if null)
- Returns:
  Apache HTTP request
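Both overloads share the same null-means-GET rule. One common way to keep two such overloads consistent is to have the enum form delegate to the String form, as in this sketch. Whether the library does this is not stated here; SketchMethod is a hypothetical stand-in for its HttpMethod enum, and resolve only computes the method name rather than building a request.

```java
import java.util.Locale;

final class MethodOverloadSketch {

    /** Hypothetical stand-in for the library's HttpMethod enum. */
    enum SketchMethod { GET, HEAD, POST }

    /** Enum overload: converts to a name and reuses the String rules. */
    static String resolve(SketchMethod method) {
        return resolve(method == null ? null : method.name());
    }

    /** String overload: null defaults to GET. */
    static String resolve(String method) {
        return method == null ? "GET" : method.toUpperCase(Locale.ROOT);
    }
}
```

Note that a literal null argument is ambiguous between the two overloads at compile time, so callers must cast it, for example resolve((SketchMethod) null).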

authenticateUsingForm
public static void authenticateUsingForm(org.apache.http.client.HttpClient httpClient, HttpAuthConfig authConfig) throws IOException, URISyntaxException
- Throws:
  IOException
  URISyntaxException
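This page gives no description of authenticateUsingForm. Judging only by its name and its HttpAuthConfig parameter, it presumably logs in by submitting credentials to an HTML form; that reading is an assumption. Such logins are usually sent as a POST body of type application/x-www-form-urlencoded, and the sketch below shows only that encoding step. FormLoginSketch and the "username"/"password" field names are hypothetical; real login forms define their own field names.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class FormLoginSketch {

    /** Encodes fields as an application/x-www-form-urlencoded body. */
    static String encodeForm(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    /** Builds a login body using hypothetical field names. */
    static String loginBody(String user, String password) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps field order stable
        fields.put("username", user);
        fields.put("password", password);
        return encodeForm(fields);
    }
}
```

URLEncoder turns spaces into "+" and escapes reserved characters such as "&" and "@", so credentials containing them cannot corrupt the form body.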