Class GenericHttpFetcherConfig
- java.lang.Object
-
- com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
- All Implemented Interfaces:
IXMLConfigurable
public class GenericHttpFetcherConfig extends Object implements IXMLConfigurable
Generic HTTP Fetcher configuration.
- Since:
- 3.0.0 (adapted from GenericHttpClientFactory and GenericDocumentFetcher from version 2.x)
- Author:
- Pascal Essiembre
-
-
Field Summary
Fields
Modifier and Type Field Description
static int
DEFAULT_MAX_CONNECTIONS
static int
DEFAULT_MAX_CONNECTIONS_PER_ROUTE
static int
DEFAULT_MAX_IDLE_TIME
static int
DEFAULT_MAX_REDIRECT
static List<Integer>
DEFAULT_NOT_FOUND_STATUS_CODES
static int
DEFAULT_TIMEOUT
static List<Integer>
DEFAULT_VALID_STATUS_CODES
-
Constructor Summary
Constructors
Constructor Description
GenericHttpFetcherConfig()
-
Method Summary
Modifier and Type Method Description
boolean
equals(Object obj)
HttpAuthConfig
getAuthConfig()
Charset
getConnectionCharset()
Gets the connection character set.
int
getConnectionRequestTimeout()
Gets the timeout when requesting a connection, in milliseconds.
int
getConnectionTimeout()
Gets the connection timeout until a connection is established, in milliseconds.
String
getCookieSpec()
String
getHeadersPrefix()
List<HttpMethod>
getHttpMethods()
Gets the list of HTTP methods to be accepted by this fetcher.
String
getLocalAddress()
Gets the local address (IP or hostname).
int
getMaxConnectionIdleTime()
Gets the period of time in milliseconds after which to evict idle connections from the connection pool.
int
getMaxConnectionInactiveTime()
Gets the period of time in milliseconds a connection must be inactive to be checked in case it became stalled.
int
getMaxConnections()
Gets the maximum number of connections that can be created.
int
getMaxConnectionsPerRoute()
Gets the maximum number of connections that can be used per route.
int
getMaxRedirects()
Gets the maximum number of redirects to be followed.
List<Integer>
getNotFoundStatusCodes()
Gets HTTP status codes to be considered as "Not found" state.
ProxySettings
getProxySettings()
IRedirectURLProvider
getRedirectURLProvider()
Gets the redirect URL provider.
String
getRequestHeader(String name)
Gets the HTTP request header value matching the given name, previously set with setRequestHeader(String, String).
List<String>
getRequestHeaderNames()
Gets all HTTP request header names for headers previously set with setRequestHeader(String, String).
int
getSocketTimeout()
Gets the maximum period of inactivity between two consecutive data packets, in milliseconds.
List<String>
getSSLProtocols()
Gets the supported SSL/TLS protocols.
String
getUserAgent()
List<Integer>
getValidStatusCodes()
int
hashCode()
boolean
isDisableETag()
Gets whether adding the "ETag" If-None-Match HTTP request header is disabled.
boolean
isDisableHSTS()
Gets whether the forcing of non-secure URLs to secure ones is disabled, according to the URL domain Strict-Transport-Security policy (obtained from HTTP response header).
boolean
isDisableIfModifiedSince()
Gets whether adding the If-Modified-Since HTTP request header is disabled.
boolean
isDisableSNI()
Gets whether Server Name Indication (SNI) is disabled.
boolean
isExpectContinueEnabled()
Whether 'Expect: 100-continue' handshake is enabled.
boolean
isForceCharsetDetection()
Gets whether character encoding is detected instead of relying on HTTP response header.
boolean
isForceContentTypeDetection()
Gets whether content type is detected instead of relying on HTTP response header.
boolean
isTrustAllSSLCertificates()
Whether to trust all SSL certificates (affects only "https" connections).
void
loadFromXML(XML xml)
String
removeRequestHeader(String name)
Removes the request header matching the given name.
void
saveToXML(XML xml)
void
setAuthConfig(HttpAuthConfig authConfig)
void
setConnectionCharset(Charset connectionCharset)
Sets the connection character set.
void
setConnectionRequestTimeout(int connectionRequestTimeout)
Sets the timeout when requesting a connection, in milliseconds.
void
setConnectionTimeout(int connectionTimeout)
Sets the connection timeout until a connection is established, in milliseconds.
void
setCookieSpec(String cookieSpec)
void
setDisableETag(boolean disableETag)
Sets whether adding the "ETag" If-None-Match HTTP request header is disabled.
void
setDisableHSTS(boolean disableHSTS)
Sets whether the forcing of non-secure URLs to secure ones is disabled, according to the URL domain Strict-Transport-Security policy (obtained from HTTP response header).
void
setDisableIfModifiedSince(boolean disableIfModifiedSince)
Sets whether adding the If-Modified-Since HTTP request header is disabled.
void
setDisableSNI(boolean disableSNI)
Sets whether Server Name Indication (SNI) is disabled.
void
setExpectContinueEnabled(boolean expectContinueEnabled)
Sets whether 'Expect: 100-continue' handshake is enabled.
void
setForceCharsetDetection(boolean forceCharsetDetection)
Sets whether character encoding is detected instead of relying on HTTP response header.
void
setForceContentTypeDetection(boolean forceContentTypeDetection)
Sets whether content type is detected instead of relying on HTTP response header.
void
setHeadersPrefix(String headersPrefix)
void
setHttpMethods(List<HttpMethod> httpMethods)
Sets the list of HTTP methods to be accepted by this fetcher.
void
setLocalAddress(String localAddress)
Sets the local address, which may be useful when working with multiple network interfaces.
void
setMaxConnectionIdleTime(int maxConnectionIdleTime)
Sets the period of time in milliseconds after which to evict idle connections from the connection pool.
void
setMaxConnectionInactiveTime(int maxConnectionInactiveTime)
Sets the period of time in milliseconds a connection must be inactive to be checked in case it became stalled.
void
setMaxConnections(int maxConnections)
Sets the maximum number of connections that can be created.
void
setMaxConnectionsPerRoute(int maxConnectionsPerRoute)
Sets the maximum number of connections that can be used per route.
void
setMaxRedirects(int maxRedirects)
Sets the maximum number of redirects to be followed.
void
setNotFoundStatusCodes(int... notFoundStatusCodes)
Sets HTTP status codes to be considered as "Not found" state.
void
setNotFoundStatusCodes(List<Integer> notFoundStatusCodes)
Sets HTTP status codes to be considered as "Not found" state.
void
setProxySettings(ProxySettings proxy)
void
setRedirectURLProvider(IRedirectURLProvider redirectURLProvider)
Sets the redirect URL provider.
void
setRequestHeader(String name, String value)
Sets a default HTTP request header every HTTP connection should have.
void
setRequestHeaders(Map<String,String> headers)
Sets the default HTTP request headers every HTTP connection should have.
void
setSocketTimeout(int socketTimeout)
Sets the maximum period of inactivity between two consecutive data packets, in milliseconds.
void
setSSLProtocols(List<String> sslProtocols)
Sets the supported SSL/TLS protocols, such as SSLv3, TLSv1, TLSv1.1, and TLSv1.2.
void
setTrustAllSSLCertificates(boolean trustAllSSLCertificates)
Sets whether to trust all SSL certificates.
void
setUserAgent(String userAgent)
void
setValidStatusCodes(int... validStatusCodes)
Sets valid HTTP response status codes.
void
setValidStatusCodes(List<Integer> validStatusCodes)
Sets valid HTTP response status codes.
String
toString()
-
-
-
Field Detail
-
DEFAULT_TIMEOUT
public static final int DEFAULT_TIMEOUT
- See Also:
- Constant Field Values
-
DEFAULT_MAX_REDIRECT
public static final int DEFAULT_MAX_REDIRECT
- See Also:
- Constant Field Values
-
DEFAULT_MAX_CONNECTIONS
public static final int DEFAULT_MAX_CONNECTIONS
- See Also:
- Constant Field Values
-
DEFAULT_MAX_CONNECTIONS_PER_ROUTE
public static final int DEFAULT_MAX_CONNECTIONS_PER_ROUTE
- See Also:
- Constant Field Values
-
DEFAULT_MAX_IDLE_TIME
public static final int DEFAULT_MAX_IDLE_TIME
- See Also:
- Constant Field Values
-
-
Method Detail
-
getRedirectURLProvider
public IRedirectURLProvider getRedirectURLProvider()
Gets the redirect URL provider.
- Returns:
- the redirect URL provider
-
setRedirectURLProvider
public void setRedirectURLProvider(IRedirectURLProvider redirectURLProvider)
Sets the redirect URL provider.
- Parameters:
redirectURLProvider - redirect URL provider
-
setValidStatusCodes
public void setValidStatusCodes(List<Integer> validStatusCodes)
Sets valid HTTP response status codes.
- Parameters:
validStatusCodes - valid status codes
-
setValidStatusCodes
public void setValidStatusCodes(int... validStatusCodes)
Sets valid HTTP response status codes.
- Parameters:
validStatusCodes - valid status codes
-
getNotFoundStatusCodes
public List<Integer> getNotFoundStatusCodes()
Gets HTTP status codes to be considered as "Not found" state. Default is 404.
- Returns:
- "Not found" codes
-
setNotFoundStatusCodes
public final void setNotFoundStatusCodes(int... notFoundStatusCodes)
Sets HTTP status codes to be considered as "Not found" state.
- Parameters:
notFoundStatusCodes - "Not found" codes
-
setNotFoundStatusCodes
public final void setNotFoundStatusCodes(List<Integer> notFoundStatusCodes)
Sets HTTP status codes to be considered as "Not found" state.
- Parameters:
notFoundStatusCodes - "Not found" codes
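As a rough illustration of how the "valid" and "Not found" status code lists might be consulted together, here is a minimal sketch using only the JDK. The class StatusCodeSketch, its classify method, the Outcome enum, and the choice of 200 as the only valid code are assumptions made for illustration; only the 404 default is stated in this page, and the fetcher's real decision logic may differ.

```java
import java.util.List;

// Hypothetical helper, not part of GenericHttpFetcherConfig.
final class StatusCodeSketch {

    enum Outcome { VALID, NOT_FOUND, BAD_STATUS }

    // Checks the "valid" list first, then the "not found" list;
    // anything else is treated as a bad status (an assumption).
    static Outcome classify(int statusCode,
            List<Integer> validCodes, List<Integer> notFoundCodes) {
        if (validCodes.contains(statusCode)) {
            return Outcome.VALID;
        }
        if (notFoundCodes.contains(statusCode)) {
            return Outcome.NOT_FOUND;
        }
        return Outcome.BAD_STATUS;
    }

    public static void main(String[] args) {
        List<Integer> valid = List.of(200);     // assumed value
        List<Integer> notFound = List.of(404);  // documented default
        System.out.println(classify(200, valid, notFound));
        System.out.println(classify(404, valid, notFound));
        System.out.println(classify(500, valid, notFound));
    }
}
```

Adding a code such as 410 to the "Not found" list would make the sketch report NOT_FOUND for it instead of BAD_STATUS.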
-
getHeadersPrefix
public String getHeadersPrefix()
-
setHeadersPrefix
public void setHeadersPrefix(String headersPrefix)
-
isForceContentTypeDetection
public boolean isForceContentTypeDetection()
Gets whether content type is detected instead of relying on HTTP response header.
- Returns:
- true if detection is enabled
-
setForceContentTypeDetection
public void setForceContentTypeDetection(boolean forceContentTypeDetection)
Sets whether content type is detected instead of relying on HTTP response header.
- Parameters:
forceContentTypeDetection - true to enable detection
-
isForceCharsetDetection
public boolean isForceCharsetDetection()
Gets whether character encoding is detected instead of relying on HTTP response header.
- Returns:
- true if detection is enabled
-
setForceCharsetDetection
public void setForceCharsetDetection(boolean forceCharsetDetection)
Sets whether character encoding is detected instead of relying on HTTP response header.
- Parameters:
forceCharsetDetection - true to enable detection
-
getUserAgent
public String getUserAgent()
-
setUserAgent
public void setUserAgent(String userAgent)
-
setRequestHeader
public void setRequestHeader(String name, String value)
Sets a default HTTP request header every HTTP connection should have. Those are in addition to any default request headers Apache HttpClient may already provide.
- Parameters:
name - HTTP request header name
value - HTTP request header value
-
setRequestHeaders
public void setRequestHeaders(Map<String,String> headers)
Sets the default HTTP request headers every HTTP connection should have. Those are in addition to any default request headers Apache HttpClient may already provide.
- Parameters:
headers - map of header names and values
-
getRequestHeader
public String getRequestHeader(String name)
Gets the HTTP request header value matching the given name, previously set with setRequestHeader(String, String).
- Parameters:
name - HTTP request header name
- Returns:
- HTTP request header value or null if no match is found
-
getRequestHeaderNames
public List<String> getRequestHeaderNames()
Gets all HTTP request header names for headers previously set with setRequestHeader(String, String). If no request headers are set, it returns an empty list.
- Returns:
- HTTP request header names
-
removeRequestHeader
public String removeRequestHeader(String name)
Removes the request header matching the given name.
- Parameters:
name - name of HTTP request header to remove
- Returns:
- the previous value associated with the name, or null if there was no request header for the name.
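The contract of the request header accessors (null when a header is missing, an empty collection when none are set, the previous value returned on removal) can be sketched with a plain map. RequestHeadersSketch is a hypothetical stand-in that borrows the method names from this page; backing the headers with a LinkedHashMap and matching names case-sensitively are assumptions, not a description of the actual implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in; it only mirrors the accessor contract described above.
final class RequestHeadersSketch {

    // Insertion-ordered storage (an assumption).
    private final Map<String, String> headers = new LinkedHashMap<>();

    void setRequestHeader(String name, String value) {
        headers.put(name, value);
    }

    // Returns null when no header was set under that name.
    String getRequestHeader(String name) {
        return headers.get(name);
    }

    // Returns an empty list when no headers are set.
    List<String> getRequestHeaderNames() {
        return new ArrayList<>(headers.keySet());
    }

    // Returns the previous value, or null if there was none.
    String removeRequestHeader(String name) {
        return headers.remove(name);
    }

    public static void main(String[] args) {
        RequestHeadersSketch cfg = new RequestHeadersSketch();
        cfg.setRequestHeader("Accept-Language", "en");
        System.out.println(cfg.getRequestHeaderNames());
        System.out.println(cfg.removeRequestHeader("Accept-Language"));
        System.out.println(cfg.getRequestHeader("Accept-Language"));
    }
}
```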
-
getCookieSpec
public String getCookieSpec()
- Returns:
- the cookieSpec to use as defined in CookieSpecs
-
setCookieSpec
public void setCookieSpec(String cookieSpec)
- Parameters:
cookieSpec - the cookieSpec to use as defined in CookieSpecs
-
getProxySettings
public ProxySettings getProxySettings()
-
setProxySettings
public void setProxySettings(ProxySettings proxy)
-
getConnectionTimeout
public int getConnectionTimeout()
Gets the connection timeout until a connection is established, in milliseconds.
- Returns:
- connection timeout
-
setConnectionTimeout
public void setConnectionTimeout(int connectionTimeout)
Sets the connection timeout until a connection is established, in milliseconds. Default is DEFAULT_TIMEOUT.
- Parameters:
connectionTimeout - connection timeout
-
getSocketTimeout
public int getSocketTimeout()
Gets the maximum period of inactivity between two consecutive data packets, in milliseconds.
- Returns:
- socket timeout
-
setSocketTimeout
public void setSocketTimeout(int socketTimeout)
Sets the maximum period of inactivity between two consecutive data packets, in milliseconds. Default is DEFAULT_TIMEOUT.
- Parameters:
socketTimeout - socket timeout
-
getConnectionRequestTimeout
public int getConnectionRequestTimeout()
Gets the timeout when requesting a connection, in milliseconds.
- Returns:
- connection request timeout
-
setConnectionRequestTimeout
public void setConnectionRequestTimeout(int connectionRequestTimeout)
Sets the timeout when requesting a connection, in milliseconds. Default is DEFAULT_TIMEOUT.
- Parameters:
connectionRequestTimeout - connection request timeout
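To make the difference between the connection timeout and the socket timeout more concrete, the sketch below applies two comparable settings to a plain JDK HttpURLConnection. This is an analogy only: the fetcher documented here builds on Apache HttpClient, and the class TimeoutSketch, the open method, and the mapping of "socket timeout" to the JDK read timeout are assumptions. HttpURLConnection has no counterpart to the connection request timeout, so it is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper using the JDK client as an analogy only.
final class TimeoutSketch {

    // Prepares (but does not open) a connection with both timeouts applied.
    static HttpURLConnection open(
            String url, int connectionTimeoutMs, int socketTimeoutMs) {
        try {
            HttpURLConnection conn = (HttpURLConnection)
                    URI.create(url).toURL().openConnection();
            // Maximum time to wait for the connection to be established.
            conn.setConnectTimeout(connectionTimeoutMs);
            // Maximum time to wait for data while reading the response.
            conn.setReadTimeout(socketTimeoutMs);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn = open("http://example.com/", 5000, 10000);
        System.out.println(conn.getConnectTimeout());
        System.out.println(conn.getReadTimeout());
    }
}
```

No network traffic occurs in this sketch, since openConnection() only creates the connection object.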
-
getConnectionCharset
public Charset getConnectionCharset()
Gets the connection character set.
- Returns:
- connection character set
-
setConnectionCharset
public void setConnectionCharset(Charset connectionCharset)
Sets the connection character set. The HTTP protocol specification mandates the use of ASCII for HTTP message headers. Sites do not always respect this and it may be necessary to force a non-standard character set.
- Parameters:
connectionCharset - connection character set
-
isExpectContinueEnabled
public boolean isExpectContinueEnabled()
Gets whether 'Expect: 100-continue' handshake is enabled.
- Returns:
- true if enabled
-
setExpectContinueEnabled
public void setExpectContinueEnabled(boolean expectContinueEnabled)
Sets whether 'Expect: 100-continue' handshake is enabled. See RequestConfig.isExpectContinueEnabled().
- Parameters:
expectContinueEnabled - true if enabled
-
getMaxRedirects
public int getMaxRedirects()
Gets the maximum number of redirects to be followed.
- Returns:
- maximum number of redirects to be followed
-
setMaxRedirects
public void setMaxRedirects(int maxRedirects)
Sets the maximum number of redirects to be followed. This can help prevent infinite loops. A value of zero effectively disables redirects. Default is DEFAULT_MAX_REDIRECT.
- Parameters:
maxRedirects - maximum number of redirects to be followed
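The effect of a redirect limit, including how it stops an infinite loop and how zero disables redirects, can be sketched without any network access. RedirectSketch and its resolve method are hypothetical, and a lookup table of source and target URLs stands in for real HTTP redirect responses.

```java
import java.util.Map;

// Hypothetical helper, not part of the fetcher.
final class RedirectSketch {

    // Follows entries in the redirect table until a URL with no redirect
    // is reached. Returns that URL, or null once the limit is exceeded.
    static String resolve(
            String url, Map<String, String> redirects, int maxRedirects) {
        String current = url;
        int followed = 0;
        while (redirects.containsKey(current)) {
            if (followed >= maxRedirects) {
                return null; // limit reached, give up
            }
            current = redirects.get(current);
            followed++;
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, String> chain = Map.of("/a", "/b", "/b", "/c");
        Map<String, String> loop = Map.of("/x", "/y", "/y", "/x");
        System.out.println(resolve("/a", chain, 5)); // two hops, within limit
        System.out.println(resolve("/a", chain, 0)); // redirects disabled
        System.out.println(resolve("/x", loop, 5));  // loop cut off by limit
    }
}
```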
-
getLocalAddress
public String getLocalAddress()
Gets the local address (IP or hostname).
- Returns:
- local address
-
setLocalAddress
public void setLocalAddress(String localAddress)
Sets the local address, which may be useful when working with multiple network interfaces.
- Parameters:
localAddress - local address
-
getMaxConnections
public int getMaxConnections()
Gets the maximum number of connections that can be created.
- Returns:
- number of connections
-
setMaxConnections
public void setMaxConnections(int maxConnections)
Sets the maximum number of connections that can be created. Typically, you would have at least as many connections as threads. Default is DEFAULT_MAX_CONNECTIONS.
- Parameters:
maxConnections - maximum number of connections
-
getMaxConnectionsPerRoute
public int getMaxConnectionsPerRoute()
Gets the maximum number of connections that can be used per route.
- Returns:
- number of connections per route
-
setMaxConnectionsPerRoute
public void setMaxConnectionsPerRoute(int maxConnectionsPerRoute)
Sets the maximum number of connections that can be used per route. Default is DEFAULT_MAX_CONNECTIONS_PER_ROUTE.
- Parameters:
maxConnectionsPerRoute - maximum number of connections per route
-
getMaxConnectionIdleTime
public int getMaxConnectionIdleTime()
Gets the period of time in milliseconds after which to evict idle connections from the connection pool.
- Returns:
- amount of time after which to evict idle connections
-
setMaxConnectionIdleTime
public void setMaxConnectionIdleTime(int maxConnectionIdleTime)
Sets the period of time in milliseconds after which to evict idle connections from the connection pool. Default is DEFAULT_MAX_IDLE_TIME.
- Parameters:
maxConnectionIdleTime - amount of time after which to evict idle connections
-
getMaxConnectionInactiveTime
public int getMaxConnectionInactiveTime()
Gets the period of time in milliseconds a connection must be inactive to be checked in case it became stalled.
- Returns:
- period of time in milliseconds
-
setMaxConnectionInactiveTime
public void setMaxConnectionInactiveTime(int maxConnectionInactiveTime)
Sets the period of time in milliseconds a connection must be inactive to be checked in case it became stalled. Default is 0 (not proactively checked).
- Parameters:
maxConnectionInactiveTime - period of time in milliseconds
-
isTrustAllSSLCertificates
public boolean isTrustAllSSLCertificates()
Gets whether to trust all SSL certificates (affects only "https" connections).
- Returns:
- true if trusting all SSL certificates
- Since:
- 1.3.0
-
setTrustAllSSLCertificates
public void setTrustAllSSLCertificates(boolean trustAllSSLCertificates)
Sets whether to trust all SSL certificates. This is typically a bad idea (it facilitates man-in-the-middle attacks). Try to install an SSL certificate locally to ensure a proper certificate exchange instead.
- Parameters:
trustAllSSLCertificates - true if trusting all SSL certificates
- Since:
- 1.3.0
-
isDisableSNI
public boolean isDisableSNI()
Gets whether Server Name Indication (SNI) is disabled.
- Returns:
- true if disabled
-
setDisableSNI
public void setDisableSNI(boolean disableSNI)
Sets whether Server Name Indication (SNI) is disabled.
- Parameters:
disableSNI - true if disabled
-
getSSLProtocols
public List<String> getSSLProtocols()
Gets the supported SSL/TLS protocols. Default is null, which means it will use those provided/configured by your Java platform.
- Returns:
- SSL/TLS protocols
-
setSSLProtocols
public void setSSLProtocols(List<String> sslProtocols)
Sets the supported SSL/TLS protocols, such as SSLv3, TLSv1, TLSv1.1, and TLSv1.2. Note that specifying a protocol not supported by your underlying Java platform will not work.
- Parameters:
sslProtocols - SSL/TLS protocols supported
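Since a protocol unknown to the Java platform will not work, it can help to check a candidate list against what the running JVM reports before configuring it. The following is a minimal sketch using the standard javax.net.ssl API. SslProtocolSketch and supportedOnly are hypothetical names, and the result depends on the JDK version and its security settings (a protocol can be listed as supported yet disabled by policy).

```java
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;

// Hypothetical helper, not part of the fetcher.
final class SslProtocolSketch {

    // Keeps only the requested protocols that the default SSLContext
    // of this JVM reports as supported.
    static List<String> supportedOnly(List<String> requested) {
        try {
            List<String> supported = Arrays.asList(SSLContext.getDefault()
                    .getSupportedSSLParameters().getProtocols());
            List<String> usable = new ArrayList<>(requested);
            usable.retainAll(supported);
            return usable;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext", e);
        }
    }

    public static void main(String[] args) {
        // Output varies by JVM; unknown names are always dropped.
        System.out.println(
                supportedOnly(List.of("TLSv1.2", "TLSv1.3", "NotAProtocol")));
    }
}
```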
-
isDisableIfModifiedSince
public boolean isDisableIfModifiedSince()
Gets whether adding the If-Modified-Since HTTP request header is disabled. Servers supporting this header will only return the requested document if it was last modified since the supplied date.
- Returns:
- true if disabled
-
setDisableIfModifiedSince
public void setDisableIfModifiedSince(boolean disableIfModifiedSince)
Sets whether adding the If-Modified-Since HTTP request header is disabled. Servers supporting this header will only return the requested document if it was last modified since the supplied date.
- Parameters:
disableIfModifiedSince - true if disabled
-
isDisableETag
public boolean isDisableETag()
Gets whether adding the "ETag" If-None-Match HTTP request header is disabled. Servers supporting this header will only return the requested document if the ETag value has changed, indicating a more recent version is available.
- Returns:
- true if disabled
-
setDisableETag
public void setDisableETag(boolean disableETag)
Sets whether adding the "ETag" If-None-Match HTTP request header is disabled. Servers supporting this header will only return the requested document if the ETag value has changed, indicating a more recent version is available.
- Parameters:
disableETag - true if disabled
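The two "disable" flags for conditional requests can be pictured as gates on whether If-Modified-Since and If-None-Match get added to a request. The sketch below builds those headers from a previously recorded last-modified date and ETag. ConditionalRequestSketch, conditionalHeaders, and the idea that both values come from an earlier crawl are assumptions for illustration, not the fetcher's actual code.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of the fetcher.
final class ConditionalRequestSketch {

    // HTTP date format, always expressed in GMT.
    private static final DateTimeFormatter HTTP_DATE = DateTimeFormatter
            .ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.ENGLISH);

    // Builds the conditional headers, skipping each one when its
    // flag is set or when no previous value is known.
    static Map<String, String> conditionalHeaders(
            ZonedDateTime lastModified, String etag,
            boolean disableIfModifiedSince, boolean disableETag) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (!disableIfModifiedSince && lastModified != null) {
            headers.put("If-Modified-Since", HTTP_DATE.format(
                    lastModified.withZoneSameInstant(ZoneOffset.UTC)));
        }
        if (!disableETag && etag != null) {
            headers.put("If-None-Match", etag);
        }
        return headers;
    }

    public static void main(String[] args) {
        ZonedDateTime seen =
                ZonedDateTime.of(2020, 1, 5, 10, 0, 0, 0, ZoneOffset.UTC);
        System.out.println(conditionalHeaders(seen, "\"abc\"", false, false));
        System.out.println(conditionalHeaders(seen, "\"abc\"", true, true));
    }
}
```

With both flags set to true the map is empty, so the server would always be asked for the full document.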
-
isDisableHSTS
public boolean isDisableHSTS()
Gets whether the forcing of non-secure URLs to secure ones is disabled, according to the URL domain Strict-Transport-Security policy (obtained from HTTP response header).
- Returns:
- true if disabled
-
setDisableHSTS
public void setDisableHSTS(boolean disableHSTS)
Sets whether the forcing of non-secure URLs to secure ones is disabled, according to the URL domain Strict-Transport-Security policy (obtained from HTTP response header).
- Parameters:
disableHSTS - true if disabled
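The HSTS behavior described here amounts to rewriting an http URL to https when its domain is known to have a Strict-Transport-Security policy, unless the feature is disabled. A minimal sketch of that idea follows. HstsSketch, its resolve method, and the set of known HSTS hosts are hypothetical; policy details such as max-age, includeSubDomains, and port handling are deliberately ignored.

```java
import java.net.URI;
import java.util.Set;

// Hypothetical helper, not part of the fetcher.
final class HstsSketch {

    // Upgrades an http URL to https when its host is in the set of
    // hosts known to have an HSTS policy, unless HSTS is disabled.
    static String resolve(
            String url, Set<String> hstsHosts, boolean disableHSTS) {
        if (disableHSTS) {
            return url;
        }
        URI uri = URI.create(url);
        String host = uri.getHost();
        boolean plainHttp = "http".equalsIgnoreCase(uri.getScheme());
        if (plainHttp && host != null && hstsHosts.contains(host)) {
            // Replace only the scheme, keep the rest of the URL as-is.
            return "https" + url.substring("http".length());
        }
        return url;
    }

    public static void main(String[] args) {
        Set<String> hosts = Set.of("example.com");
        System.out.println(resolve("http://example.com/a", hosts, false));
        System.out.println(resolve("http://example.com/a", hosts, true));
        System.out.println(resolve("http://other.org/a", hosts, false));
    }
}
```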
-
getAuthConfig
public HttpAuthConfig getAuthConfig()
-
setAuthConfig
public void setAuthConfig(HttpAuthConfig authConfig)
-
getHttpMethods
public List<HttpMethod> getHttpMethods()
Gets the list of HTTP methods to be accepted by this fetcher. Defaults are HttpMethod.GET and HttpMethod.HEAD.
- Returns:
- HTTP methods
-
setHttpMethods
public void setHttpMethods(List<HttpMethod> httpMethods)
Sets the list of HTTP methods to be accepted by this fetcher. Defaults are HttpMethod.GET and HttpMethod.HEAD.
- Parameters:
httpMethods - HTTP methods
-
loadFromXML
public void loadFromXML(XML xml)
- Specified by:
loadFromXML in interface IXMLConfigurable
-
saveToXML
public void saveToXML(XML xml)
- Specified by:
saveToXML in interface IXMLConfigurable
-
-