Class ApacheHttpUtil
- java.lang.Object
  - com.norconex.collector.http.fetch.util.ApacheHttpUtil
public final class ApacheHttpUtil extends Object
Utility methods for fetcher implementations using Apache HttpClient.
- Since:
  3.0.0
- Author:
  Pascal Essiembre
Method Summary
Modifier and type, method, and description:
- static void applyContentTypeAndCharset(String value, CrawlDocInfo docInfo)
  Applies the Content-Type HTTP response header on the supplied document info.
- static boolean applyResponseContent(org.apache.http.HttpResponse response, CrawlDoc doc)
  Applies the HTTP response content to a document if such content exists.
- static void applyResponseHeaders(org.apache.http.HttpResponse response, String prefix, CrawlDoc doc)
  Applies the HTTP response headers to a document.
- static void authenticateUsingForm(org.apache.http.client.HttpClient httpClient, HttpAuthConfig authConfig)
- static org.apache.http.client.methods.HttpRequestBase createUriRequest(String url, HttpMethod method)
  Creates an HTTP request.
- static org.apache.http.client.methods.HttpRequestBase createUriRequest(String url, String method)
  Creates an HTTP request.
- static void setRequestIfModifiedSince(org.apache.http.HttpRequest request, CrawlDoc doc)
  Sets the If-Modified-Since HTTP request header based on the document's cached last-crawled date (if any).
- static void setRequestIfNoneMatch(org.apache.http.HttpRequest request, CrawlDoc doc)
  Sets the ETag If-None-Match HTTP request header based on the document's cached ETag value (if any).
Method Detail
-
applyResponseContent
public static boolean applyResponseContent(org.apache.http.HttpResponse response, CrawlDoc doc) throws IOException
Applies the HTTP response content to a document if such content exists. The stream is fully downloaded and associated with a document.
- Parameters:
  response - the HTTP response
  doc - document to apply content on
- Returns:
  true if there was content to apply
- Throws:
  IOException - could not read existing content
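
To illustrate what "fully downloaded" means in practice, the following JDK-only sketch drains a stream into memory and reports whether there was a stream at all. The names `ResponseContentSketch` and `readFully` are hypothetical and not part of `ApacheHttpUtil`. Treating a missing stream as "no content" is an assumption, and the real method works on `HttpResponse` and `CrawlDoc` and may buffer content differently.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

final class ResponseContentSketch {

    private ResponseContentSketch() {}

    /**
     * Hypothetical helper: reads the whole stream into memory and closes it.
     * Returns an empty Optional when no stream is supplied (assumed here to
     * correspond to the "no content" case).
     */
    static Optional<byte[]> readFully(InputStream body) throws IOException {
        if (body == null) {
            return Optional.empty();
        }
        try (InputStream in = body) {
            return Optional.of(in.readAllBytes()); // requires Java 9+
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream body = new ByteArrayInputStream(
                "hello".getBytes(StandardCharsets.UTF_8));
        Optional<byte[]> content = readFully(body);
        System.out.println("had content: " + content.isPresent());
    }
}
```

Because the stream is read to the end before returning, the caller can release the underlying connection right away, at the cost of holding the body in memory.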
-
applyResponseHeaders
public static void applyResponseHeaders(org.apache.http.HttpResponse response, String prefix, CrawlDoc doc)
Applies the HTTP response headers to a document. This method will do its best to derive relevant information from the HTTP headers that can be set on the document HttpDocInfo:
- Content type
- Content encoding
- ETag
In addition, all HTTP headers will be added to the document metadata, with an optional prefix.
- Parameters:
  response - the HTTP response
  prefix - optional metadata prefix for all HTTP response headers
  doc - document to apply headers on
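
The optional prefix can be pictured with a small JDK-only sketch that copies header values into a metadata map under prefixed keys. Everything here is illustrative: `HeaderPrefixSketch`, `toMetadata`, and the `"http."` prefix are hypothetical, and treating a `null` prefix as "no prefix" is an assumption about the real method.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class HeaderPrefixSketch {

    private HeaderPrefixSketch() {}

    /**
     * Hypothetical helper: copies every header into a new metadata map,
     * prepending the prefix to each header name. A null prefix is treated
     * as an empty one (an assumption, not documented behavior).
     */
    static Map<String, List<String>> toMetadata(
            Map<String, List<String>> headers, String prefix) {
        String p = prefix == null ? "" : prefix;
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        headers.forEach((name, values) ->
                metadata.computeIfAbsent(p + name, k -> new ArrayList<>())
                        .addAll(values));
        return metadata;
    }

    public static void main(String[] args) {
        Map<String, List<String>> headers = new LinkedHashMap<>();
        headers.put("Content-Type", List.of("text/html; charset=UTF-8"));
        headers.put("ETag", List.of("\"33a64df5\""));
        System.out.println(toMetadata(headers, "http."));
    }
}
```

A prefix keeps raw HTTP headers from colliding with metadata fields of the same name that come from other sources.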
-
applyContentTypeAndCharset
public static void applyContentTypeAndCharset(String value, CrawlDocInfo docInfo)
Applies the Content-Type HTTP response header on the supplied document info. It does so by extracting both the content type and charset from the value, and setting them by invoking DocInfo.setContentType(ContentType) and DocInfo.setContentEncoding(String). This method is automatically invoked by applyResponseHeaders(HttpResponse, String, CrawlDoc) when encountering a content type header.
- Parameters:
  value - value to parse and set
  docInfo - document info
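
The extraction step described above can be sketched with the JDK alone. The snippet splits a Content-Type value into its media type and `charset` parameter. `ContentTypeSketch` and `parse` are hypothetical names, and the parsing rules (lowercasing the media type, stripping quotes) are assumptions rather than the library's actual behavior.

```java
import java.util.Arrays;
import java.util.Locale;

final class ContentTypeSketch {

    private ContentTypeSketch() {}

    /**
     * Hypothetical helper: returns a two-element array holding the media
     * type and the charset. Either element is null when it is absent.
     */
    static String[] parse(String value) {
        if (value == null || value.trim().isEmpty()) {
            return new String[] {null, null};
        }
        String[] parts = value.split(";");
        String mediaType = parts[0].trim().toLowerCase(Locale.ROOT);
        String charset = null;
        // Remaining parts are "name=value" parameters; only charset is kept.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && "charset".equalsIgnoreCase(param.substring(0, eq).trim())) {
                String raw = param.substring(eq + 1).trim().replace("\"", "");
                charset = raw.isEmpty() ? null : raw;
            }
        }
        return new String[] {mediaType.isEmpty() ? null : mediaType, charset};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("text/html; charset=UTF-8")));
    }
}
```

In the real method the two parts end up on the document info through the setContentType and setContentEncoding calls named above.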
-
setRequestIfModifiedSince
public static void setRequestIfModifiedSince(org.apache.http.HttpRequest request, CrawlDoc doc)
Sets the If-Modified-Since HTTP request header based on the document's cached last-crawled date (if any).
- Parameters:
  request - HTTP request
  doc - document
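
As a rough picture of the header this produces, the JDK-only sketch below formats a date in the fixed HTTP date format and adds it to a header map only when a date is available. `IfModifiedSinceSketch` and `apply` are hypothetical names. The plain `Map` stands in for the request, and how the real method reads the cached date from `CrawlDoc` is not shown here.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class IfModifiedSinceSketch {

    // HTTP dates are expressed in GMT with English day and month names.
    private static final DateTimeFormatter HTTP_DATE = DateTimeFormatter
            .ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.ENGLISH);

    private IfModifiedSinceSketch() {}

    /**
     * Hypothetical helper: adds If-Modified-Since when a last-crawled date
     * is known, and leaves the headers untouched otherwise.
     */
    static void apply(Map<String, String> requestHeaders, ZonedDateTime lastCrawled) {
        if (lastCrawled != null) {
            ZonedDateTime utc = lastCrawled.withZoneSameInstant(ZoneOffset.UTC);
            requestHeaders.put("If-Modified-Since", HTTP_DATE.format(utc));
        }
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        apply(headers, ZonedDateTime.of(1994, 11, 6, 8, 49, 37, 0, ZoneOffset.UTC));
        System.out.println(headers.get("If-Modified-Since"));
    }
}
```

A server that honors the header can answer 304 Not Modified instead of resending an unchanged document.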
-
setRequestIfNoneMatch
public static void setRequestIfNoneMatch(org.apache.http.HttpRequest request, CrawlDoc doc)
Sets the ETag If-None-Match HTTP request header based on the document's cached ETag value (if any).
- Parameters:
  request - HTTP request
  doc - document
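
The "if any" condition can be sketched in a few lines of plain Java: the header is added only when a cached ETag is present, and the value is passed through unchanged. `IfNoneMatchSketch` and `apply` are hypothetical names, the `Map` stands in for the request, and skipping blank values is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class IfNoneMatchSketch {

    private IfNoneMatchSketch() {}

    /**
     * Hypothetical helper: adds If-None-Match when a cached ETag exists.
     * The ETag is sent back verbatim, quotes included.
     */
    static void apply(Map<String, String> requestHeaders, String cachedETag) {
        if (cachedETag != null && !cachedETag.trim().isEmpty()) {
            requestHeaders.put("If-None-Match", cachedETag);
        }
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        apply(headers, "\"33a64df5\"");
        System.out.println(headers);
    }
}
```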
-
createUriRequest
public static org.apache.http.client.methods.HttpRequestBase createUriRequest(String url, String method)
Creates an HTTP request.
- Parameters:
  url - the request target URL
  method - HTTP method (defaults to GET if null)
- Returns:
  Apache HTTP request
-
createUriRequest
public static org.apache.http.client.methods.HttpRequestBase createUriRequest(String url, HttpMethod method)
Creates an HTTP request.
- Parameters:
  url - the request target URL
  method - HTTP method (defaults to GET if null)
- Returns:
  Apache HTTP request
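
Both createUriRequest overloads document the same fallback: a null method means GET. A minimal JDK-only sketch of that rule follows. `MethodDefaultSketch` and `resolveMethod` are hypothetical names, and the trimming and uppercasing of non-null values is an assumption that the documentation above does not state.

```java
import java.util.Locale;

final class MethodDefaultSketch {

    private MethodDefaultSketch() {}

    /**
     * Hypothetical helper: resolves the HTTP method name to use.
     * Null falls back to GET, as documented. Normalizing other values to
     * upper case is an assumption made for this sketch only.
     */
    static String resolveMethod(String method) {
        if (method == null) {
            return "GET";
        }
        return method.trim().toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(resolveMethod(null));
        System.out.println(resolveMethod("head"));
    }
}
```

When calling the real overloads with a literal null, a cast such as `(String) null` is needed so the compiler can choose between the String and HttpMethod variants.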
-
authenticateUsingForm
public static void authenticateUsingForm(org.apache.http.client.HttpClient httpClient, HttpAuthConfig authConfig) throws IOException, URISyntaxException
- Throws:
IOException
URISyntaxException