Class ApacheHttpUtil
- java.lang.Object
 - com.norconex.collector.http.fetch.util.ApacheHttpUtil
 
 
public final class ApacheHttpUtil extends Object
Utility methods for fetcher implementations using Apache HttpClient.
- Since:
 3.0.0
- Author:
 Pascal Essiembre
 
 
Method Summary
Modifier and Type, Method, and Description
- static void applyContentTypeAndCharset(String value, CrawlDocInfo docInfo)
 Applies the Content-Type HTTP response header on the supplied document info.
- static boolean applyResponseContent(org.apache.http.HttpResponse response, CrawlDoc doc)
 Applies the HTTP response content to a document if such content exists.
- static void applyResponseHeaders(org.apache.http.HttpResponse response, String prefix, CrawlDoc doc)
 Applies the HTTP response headers to a document.
- static void authenticateUsingForm(org.apache.http.client.HttpClient httpClient, HttpAuthConfig authConfig)
- static org.apache.http.client.methods.HttpRequestBase createUriRequest(String url, HttpMethod method)
 Creates an HTTP request.
- static org.apache.http.client.methods.HttpRequestBase createUriRequest(String url, String method)
 Creates an HTTP request.
- static void setRequestIfModifiedSince(org.apache.http.HttpRequest request, CrawlDoc doc)
 Sets the If-Modified-Since HTTP request header based on document cached last crawled date (if any).
- static void setRequestIfNoneMatch(org.apache.http.HttpRequest request, CrawlDoc doc)
 Sets the ETag If-None-Match HTTP request header based on document cached ETag value (if any).
 
Method Detail
applyResponseContent
public static boolean applyResponseContent(org.apache.http.HttpResponse response, CrawlDoc doc) throws IOException
Applies the HTTP response content to a document if such content exists. The stream is fully downloaded and associated with the document.
- Parameters:
 response - the HTTP response
 doc - document to apply the content to
- Returns:
 true if there was content to apply
- Throws:
 IOException - could not read existing content
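
As a rough illustration of the behavior described above, the following sketch reads a response body fully into memory and reports whether a body was present. It relies only on the JDK: ResponseContentSketch, SimpleDoc, and applyContent are hypothetical names, not Norconex or Apache HttpClient types, and the actual method may store the content differently.

```java
import java.io.IOException;
import java.io.InputStream;

final class ResponseContentSketch {

    /** Hypothetical stand-in for a crawl document; not a Norconex type. */
    static final class SimpleDoc {
        byte[] content = new byte[0];
    }

    /**
     * Reads the whole body and stores it on the document.
     * Assumption: a null stream means "no content"; an empty stream
     * still counts as content in this sketch.
     */
    static boolean applyContent(InputStream body, SimpleDoc doc)
            throws IOException {
        if (body == null) {
            return false;
        }
        // try-with-resources closes the stream once it is fully read
        try (InputStream in = body) {
            doc.content = in.readAllBytes(); // requires Java 9 or later
        }
        return true;
    }
}
```

A caller can use the boolean result to decide whether the document has a body worth processing.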
 
applyResponseHeaders
public static void applyResponseHeaders(org.apache.http.HttpResponse response, String prefix, CrawlDoc doc)
Applies the HTTP response headers to a document. This method does its best to derive the following information from the HTTP headers and set it on the document's HttpDocInfo:
- Content type
- Content encoding
- ETag

In addition, all HTTP headers are added to the document metadata, with an optional prefix.
- Parameters:
 response - the HTTP response
 prefix - optional metadata prefix for all HTTP response headers
 doc - document to apply headers on
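
The prefixing of header names can be pictured with a small JDK-only sketch. HeaderMetadataSketch and toMetadata are hypothetical names, the headers are modeled as a plain map rather than an Apache HttpResponse, and the "http." prefix in the usage note is an arbitrary example value, not a Norconex default.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class HeaderMetadataSketch {

    /**
     * Copies every header into a metadata map, prepending the prefix
     * to each header name. A null prefix is treated as "no prefix".
     */
    static Map<String, List<String>> toMetadata(
            Map<String, List<String>> headers, String prefix) {
        String p = (prefix == null) ? "" : prefix;
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            // keep all values of multi-valued headers
            metadata.computeIfAbsent(p + e.getKey(), k -> new ArrayList<>())
                    .addAll(e.getValue());
        }
        return metadata;
    }
}
```

With a prefix of "http.", an ETag header would be stored under the key "http.ETag"; with a null prefix it would stay under "ETag".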
 
applyContentTypeAndCharset
public static void applyContentTypeAndCharset(String value, CrawlDocInfo docInfo)
Applies the Content-Type HTTP response header on the supplied document info. It extracts both the content type and charset from the value, and sets them by invoking DocInfo.setContentType(ContentType) and DocInfo.setContentEncoding(String). This method is automatically invoked by applyResponseHeaders(HttpResponse, String, CrawlDoc) when encountering a content type header.
- Parameters:
 value - value to parse and set
 docInfo - document info
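
To make the extraction step concrete, here is a minimal sketch of splitting a Content-Type value into a media type and a charset. ContentTypeSketch and parse are hypothetical names, and the splitting logic is a simplification (it does not handle a ";" inside a quoted parameter); it is not the parser the library actually uses.

```java
import java.util.Locale;

final class ContentTypeSketch {

    /** Returns {mediaType, charset}; charset is null when absent. */
    static String[] parse(String value) {
        String[] parts = value.split(";");
        String mediaType = parts[0].trim().toLowerCase(Locale.ROOT);
        String charset = null;
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            // case-insensitive check for the "charset=" parameter (8 chars)
            if (param.regionMatches(true, 0, "charset=", 0, 8)) {
                charset = param.substring(8).trim();
                // strip optional surrounding double quotes
                if (charset.length() >= 2
                        && charset.startsWith("\"")
                        && charset.endsWith("\"")) {
                    charset = charset.substring(1, charset.length() - 1);
                }
            }
        }
        return new String[] { mediaType, charset };
    }
}
```

For the value "text/html; charset=UTF-8", this yields the media type "text/html" and the charset "UTF-8", which correspond to the two values the method sets on the document info.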
 
setRequestIfModifiedSince
public static void setRequestIfModifiedSince(org.apache.http.HttpRequest request, CrawlDoc doc)
Sets the If-Modified-Since HTTP request header based on the document's cached last-crawled date (if any).
- Parameters:
 request - HTTP request
 doc - document
 
setRequestIfNoneMatch
public static void setRequestIfNoneMatch(org.apache.http.HttpRequest request, CrawlDoc doc)
Sets the ETag If-None-Match HTTP request header based on the document's cached ETag value (if any).
- Parameters:
 request - HTTP request
 doc - document
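
The If-Modified-Since and If-None-Match headers both produce conditional requests, which the sketch below illustrates with JDK types only. ConditionalHeadersSketch and its two methods are hypothetical, request headers are modeled as a plain map, and the cached values are passed in directly instead of being read from a CrawlDoc.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.Map;

final class ConditionalHeadersSketch {

    // Fixed-length HTTP date format, always expressed in GMT
    private static final DateTimeFormatter HTTP_DATE = DateTimeFormatter
            .ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
            .withZone(ZoneOffset.UTC);

    /** Adds If-Modified-Since only when a cached crawl date exists. */
    static void setIfModifiedSince(
            Map<String, String> requestHeaders, Instant lastCrawled) {
        if (lastCrawled != null) {
            requestHeaders.put(
                    "If-Modified-Since", HTTP_DATE.format(lastCrawled));
        }
    }

    /** Adds If-None-Match only when a cached ETag exists. */
    static void setIfNoneMatch(
            Map<String, String> requestHeaders, String cachedETag) {
        if (cachedETag != null && !cachedETag.isEmpty()) {
            requestHeaders.put("If-None-Match", cachedETag);
        }
    }
}
```

When either header is present and the resource is unchanged, a server may answer with 304 Not Modified and no body, which lets a crawler skip downloading the document again.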
 
createUriRequest
public static org.apache.http.client.methods.HttpRequestBase createUriRequest(String url, String method)
Creates an HTTP request.
- Parameters:
 url - the request target URL
 method - HTTP method (defaults to GET if null)
- Returns:
 Apache HTTP request
 
 
createUriRequest
public static org.apache.http.client.methods.HttpRequestBase createUriRequest(String url, HttpMethod method)
Creates an HTTP request.
- Parameters:
 url - the request target URL
 method - HTTP method (defaults to GET if null)
- Returns:
 Apache HTTP request
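
The "defaults to GET if null" rule shared by both overloads can be sketched as follows. This example deliberately uses the JDK's own java.net.http.HttpRequest (Java 11 or later) rather than Apache's HttpRequestBase, so it only mirrors the defaulting behavior; UriRequestSketch and createRequest are hypothetical names, and upper-casing the method name is an assumption of this sketch.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Locale;

final class UriRequestSketch {

    /** Builds a body-less request, falling back to GET for a null method. */
    static HttpRequest createRequest(String url, String method) {
        // Assumption: method names are normalized to upper case
        String m = (method == null) ? "GET" : method.toUpperCase(Locale.ROOT);
        return HttpRequest.newBuilder(URI.create(url))
                .method(m, HttpRequest.BodyPublishers.noBody())
                .build();
    }
}
```

Calling createRequest("https://example.com/", null) builds a GET request, while passing "head" builds a HEAD request for the same URL.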
 
 
authenticateUsingForm
public static void authenticateUsingForm(org.apache.http.client.HttpClient httpClient, HttpAuthConfig authConfig) throws IOException, URISyntaxException
- Throws:
 IOException
 URISyntaxException
 