Class GenericHttpFetcherConfig
- java.lang.Object
-
- com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
- All Implemented Interfaces:
- IXMLConfigurable
public class GenericHttpFetcherConfig extends Object implements IXMLConfigurable
Generic HTTP Fetcher configuration.
- Since:
- 3.0.0 (adapted from GenericHttpClientFactory and GenericDocumentFetcher from version 2.x)
- Author:
- Pascal Essiembre
-
-
Field Summary
Fields
Modifier and Type, Field:
static int DEFAULT_MAX_CONNECTIONS
static int DEFAULT_MAX_CONNECTIONS_PER_ROUTE
static int DEFAULT_MAX_IDLE_TIME
static int DEFAULT_MAX_REDIRECT
static List<Integer> DEFAULT_NOT_FOUND_STATUS_CODES
static int DEFAULT_TIMEOUT
static List<Integer> DEFAULT_VALID_STATUS_CODES
-
Constructor Summary
Constructors
GenericHttpFetcherConfig()
-
Method Summary
Modifier and Type, Method - Description:
boolean equals(Object obj)
HttpAuthConfig getAuthConfig()
Charset getConnectionCharset() - Gets the connection character set.
int getConnectionRequestTimeout() - Gets the timeout when requesting a connection, in milliseconds.
int getConnectionTimeout() - Gets the connection timeout until a connection is established, in milliseconds.
String getCookieSpec()
String getHeadersPrefix()
List<HttpMethod> getHttpMethods() - Gets the list of HTTP methods to be accepted by this fetcher.
String getLocalAddress() - Gets the local address (IP or hostname).
int getMaxConnectionIdleTime() - Gets the period of time in milliseconds after which to evict idle connections from the connection pool.
int getMaxConnectionInactiveTime() - Gets the period of time in milliseconds a connection must be inactive to be checked in case it became stalled.
int getMaxConnections() - Gets the maximum number of connections that can be created.
int getMaxConnectionsPerRoute() - Gets the maximum number of connections that can be used per route.
int getMaxRedirects() - Gets the maximum number of redirects to be followed.
List<Integer> getNotFoundStatusCodes() - Gets HTTP status codes to be considered as "Not found" state.
ProxySettings getProxySettings()
IRedirectURLProvider getRedirectURLProvider() - Gets the redirect URL provider.
String getRequestHeader(String name) - Gets the HTTP request header value matching the given name, previously set with setRequestHeader(String, String).
List<String> getRequestHeaderNames() - Gets all HTTP request header names for headers previously set with setRequestHeader(String, String).
int getSocketTimeout() - Gets the maximum period of inactivity between two consecutive data packets, in milliseconds.
List<String> getSSLProtocols() - Gets the supported SSL/TLS protocols.
String getUserAgent()
List<Integer> getValidStatusCodes()
int hashCode()
boolean isDisableETag() - Gets whether adding the "ETag" If-None-Match HTTP request header is disabled.
boolean isDisableHSTS() - Gets whether the forcing of non secure URLs to secure ones is disabled, according to the URL domain Strict-Transport-Security policy (obtained from HTTP response header).
boolean isDisableIfModifiedSince() - Gets whether adding the If-Modified-Since HTTP request header is disabled.
boolean isDisableSNI() - Gets whether Server Name Indication (SNI) is disabled.
boolean isExpectContinueEnabled() - Whether 'Expect: 100-continue' handshake is enabled.
boolean isForceCharsetDetection() - Gets whether character encoding is detected instead of relying on HTTP response header.
boolean isForceContentTypeDetection() - Gets whether content type is detected instead of relying on HTTP response header.
boolean isTrustAllSSLCertificates() - Whether to trust all SSL certificates (affects only "https" connections).
void loadFromXML(XML xml)
String removeRequestHeader(String name) - Removes the request header matching the given name.
void saveToXML(XML xml)
void setAuthConfig(HttpAuthConfig authConfig)
void setConnectionCharset(Charset connectionCharset) - Sets the connection character set.
void setConnectionRequestTimeout(int connectionRequestTimeout) - Sets the timeout when requesting a connection, in milliseconds.
void setConnectionTimeout(int connectionTimeout) - Sets the connection timeout until a connection is established, in milliseconds.
void setCookieSpec(String cookieSpec)
void setDisableETag(boolean disableETag) - Sets whether adding the "ETag" If-None-Match HTTP request header is disabled.
void setDisableHSTS(boolean disableHSTS) - Sets whether the forcing of non secure URLs to secure ones is disabled, according to the URL domain Strict-Transport-Security policy (obtained from HTTP response header).
void setDisableIfModifiedSince(boolean disableIfModifiedSince) - Sets whether adding the If-Modified-Since HTTP request header is disabled.
void setDisableSNI(boolean disableSNI) - Sets whether Server Name Indication (SNI) is disabled.
void setExpectContinueEnabled(boolean expectContinueEnabled) - Sets whether 'Expect: 100-continue' handshake is enabled.
void setForceCharsetDetection(boolean forceCharsetDetection) - Sets whether character encoding is detected instead of relying on HTTP response header.
void setForceContentTypeDetection(boolean forceContentTypeDetection) - Sets whether content type is detected instead of relying on HTTP response header.
void setHeadersPrefix(String headersPrefix)
void setHttpMethods(List<HttpMethod> httpMethods) - Sets the list of HTTP methods to be accepted by this fetcher.
void setLocalAddress(String localAddress) - Sets the local address, which may be useful when working with multiple network interfaces.
void setMaxConnectionIdleTime(int maxConnectionIdleTime) - Sets the period of time in milliseconds after which to evict idle connections from the connection pool.
void setMaxConnectionInactiveTime(int maxConnectionInactiveTime) - Sets the period of time in milliseconds a connection must be inactive to be checked in case it became stalled.
void setMaxConnections(int maxConnections) - Sets the maximum number of connections that can be created.
void setMaxConnectionsPerRoute(int maxConnectionsPerRoute) - Sets the maximum number of connections that can be used per route.
void setMaxRedirects(int maxRedirects) - Sets the maximum number of redirects to be followed.
void setNotFoundStatusCodes(int... notFoundStatusCodes) - Sets HTTP status codes to be considered as "Not found" state.
void setNotFoundStatusCodes(List<Integer> notFoundStatusCodes) - Sets HTTP status codes to be considered as "Not found" state.
void setProxySettings(ProxySettings proxy)
void setRedirectURLProvider(IRedirectURLProvider redirectURLProvider) - Sets the redirect URL provider.
void setRequestHeader(String name, String value) - Sets a default HTTP request header every HTTP connection should have.
void setRequestHeaders(Map<String,String> headers) - Sets default HTTP request headers every HTTP connection should have.
void setSocketTimeout(int socketTimeout) - Sets the maximum period of inactivity between two consecutive data packets, in milliseconds.
void setSSLProtocols(List<String> sslProtocols) - Sets the supported SSL/TLS protocols, such as SSLv3, TLSv1, TLSv1.1, and TLSv1.2.
void setTrustAllSSLCertificates(boolean trustAllSSLCertificates) - Sets whether to trust all SSL certificates.
void setUserAgent(String userAgent)
void setValidStatusCodes(int... validStatusCodes) - Sets valid HTTP response status codes.
void setValidStatusCodes(List<Integer> validStatusCodes) - Sets valid HTTP response status codes.
String toString()
-
-
-
Field Detail
-
DEFAULT_TIMEOUT
public static final int DEFAULT_TIMEOUT
- See Also:
- Constant Field Values
-
DEFAULT_MAX_REDIRECT
public static final int DEFAULT_MAX_REDIRECT
- See Also:
- Constant Field Values
-
DEFAULT_MAX_CONNECTIONS
public static final int DEFAULT_MAX_CONNECTIONS
- See Also:
- Constant Field Values
-
DEFAULT_MAX_CONNECTIONS_PER_ROUTE
public static final int DEFAULT_MAX_CONNECTIONS_PER_ROUTE
- See Also:
- Constant Field Values
-
DEFAULT_MAX_IDLE_TIME
public static final int DEFAULT_MAX_IDLE_TIME
- See Also:
- Constant Field Values
-
-
Method Detail
-
getRedirectURLProvider
public IRedirectURLProvider getRedirectURLProvider()
Gets the redirect URL provider.
- Returns:
- the redirect URL provider
-
setRedirectURLProvider
public void setRedirectURLProvider(IRedirectURLProvider redirectURLProvider)
Sets the redirect URL provider.
- Parameters:
- redirectURLProvider - redirect URL provider
-
setValidStatusCodes
public void setValidStatusCodes(List<Integer> validStatusCodes)
Sets valid HTTP response status codes.
- Parameters:
- validStatusCodes - valid status codes
-
setValidStatusCodes
public void setValidStatusCodes(int... validStatusCodes)
Sets valid HTTP response status codes.
- Parameters:
- validStatusCodes - valid status codes
-
getNotFoundStatusCodes
public List<Integer> getNotFoundStatusCodes()
Gets HTTP status codes to be considered as "Not found" state. Default is 404.
- Returns:
- "Not found" codes
-
setNotFoundStatusCodes
public final void setNotFoundStatusCodes(int... notFoundStatusCodes)
Sets HTTP status codes to be considered as "Not found" state.
- Parameters:
- notFoundStatusCodes - "Not found" codes
-
setNotFoundStatusCodes
public final void setNotFoundStatusCodes(List<Integer> notFoundStatusCodes)
Sets HTTP status codes to be considered as "Not found" state.
- Parameters:
- notFoundStatusCodes - "Not found" codes
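The valid and "Not found" code lists can be read as a three-way classification of each response. The following is a minimal sketch of that reading, not the fetcher's actual logic. StatusCodeClassifier is a hypothetical class, the codes passed to it are up to the caller, and checking the valid list first is an assumption, since this page does not say how a code present in both lists is treated.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of the library.
final class StatusCodeClassifier {
    enum Outcome { VALID, NOT_FOUND, BAD_STATUS }

    private final List<Integer> validCodes;
    private final List<Integer> notFoundCodes;

    StatusCodeClassifier(List<Integer> validCodes, List<Integer> notFoundCodes) {
        // Defensive copies so later changes to the caller's lists have no effect.
        this.validCodes = new ArrayList<>(validCodes);
        this.notFoundCodes = new ArrayList<>(notFoundCodes);
    }

    Outcome classify(int statusCode) {
        // Assumption: the valid list takes precedence over the "Not found" list.
        if (validCodes.contains(statusCode)) {
            return Outcome.VALID;
        }
        if (notFoundCodes.contains(statusCode)) {
            return Outcome.NOT_FOUND;
        }
        return Outcome.BAD_STATUS;
    }
}
```

For instance, a classifier built with valid code 200 and "Not found" codes 404 and 410 would treat a 500 response as neither valid nor "Not found".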
-
getHeadersPrefix
public String getHeadersPrefix()
-
setHeadersPrefix
public void setHeadersPrefix(String headersPrefix)
-
isForceContentTypeDetection
public boolean isForceContentTypeDetection()
Gets whether content type is detected instead of relying on HTTP response header.
- Returns:
- true if detection is enabled
-
setForceContentTypeDetection
public void setForceContentTypeDetection(boolean forceContentTypeDetection)
Sets whether content type is detected instead of relying on HTTP response header.
- Parameters:
- forceContentTypeDetection - true to enable detection
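To illustrate the difference between trusting the Content-Type response header and detecting the type from the content itself, here is a small sketch. It uses the JDK's basic URLConnection.guessContentTypeFromStream as a stand-in detector. The detector the fetcher really uses is not described on this page, and ContentTypeSniffSketch is a hypothetical class.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URLConnection;

// Hypothetical helper: not part of the library.
final class ContentTypeSniffSketch {
    // Returns the header value unless detection is forced and the JDK
    // sniffer recognizes the leading bytes of the body.
    static String effectiveContentType(
            String headerValue, byte[] body, boolean forceDetection) {
        if (!forceDetection) {
            return headerValue;
        }
        try {
            String detected = URLConnection.guessContentTypeFromStream(
                    new ByteArrayInputStream(body));
            // Fall back to the header when the sniffer cannot tell.
            return detected != null ? detected : headerValue;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With detection forced, an HTML body served with a misleading text/plain header can be recognized from its leading bytes instead.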
-
isForceCharsetDetection
public boolean isForceCharsetDetection()
Gets whether character encoding is detected instead of relying on HTTP response header.
- Returns:
- true if detection is enabled
-
setForceCharsetDetection
public void setForceCharsetDetection(boolean forceCharsetDetection)
Sets whether character encoding is detected instead of relying on HTTP response header.
- Parameters:
- forceCharsetDetection - true to enable detection
-
getUserAgent
public String getUserAgent()
-
setUserAgent
public void setUserAgent(String userAgent)
-
setRequestHeader
public void setRequestHeader(String name, String value)
Sets a default HTTP request header every HTTP connection should have. These are in addition to any default request headers Apache HttpClient may already provide.
- Parameters:
- name - HTTP request header name
- value - HTTP request header value
-
setRequestHeaders
public void setRequestHeaders(Map<String,String> headers)
Sets default HTTP request headers every HTTP connection should have. These are in addition to any default request headers Apache HttpClient may already provide.
- Parameters:
- headers - map of header names and values
-
getRequestHeader
public String getRequestHeader(String name)
Gets the HTTP request header value matching the given name, previously set with setRequestHeader(String, String).
- Parameters:
- name - HTTP request header name
- Returns:
- HTTP request header value or null if no match is found
-
getRequestHeaderNames
public List<String> getRequestHeaderNames()
Gets all HTTP request header names for headers previously set with setRequestHeader(String, String). If no request headers are set, it returns an empty list.
- Returns:
- HTTP request header names
-
removeRequestHeader
public String removeRequestHeader(String name)
Removes the request header matching the given name.
- Parameters:
- name - name of HTTP request header to remove
- Returns:
- the previous value associated with the name, or null if there was no request header for the name
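The request header methods above describe a small set/get/remove contract. The sketch below mirrors that contract with a hypothetical RequestHeaderStore class so the return values can be seen in isolation. It is not the library's implementation, and treating header names as case-sensitive is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in mirroring the documented request-header contract.
final class RequestHeaderStore {
    // Assumption: names are matched exactly (case-sensitive).
    private final Map<String, String> headers = new LinkedHashMap<>();

    void setRequestHeader(String name, String value) {
        headers.put(name, value);
    }

    // Returns the value, or null if no header was set under that name.
    String getRequestHeader(String name) {
        return headers.get(name);
    }

    // Returns an empty list when no headers are set.
    List<String> getRequestHeaderNames() {
        return new ArrayList<>(headers.keySet());
    }

    // Returns the previous value, or null if there was none.
    String removeRequestHeader(String name) {
        return headers.remove(name);
    }
}
```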
-
getCookieSpec
public String getCookieSpec()
- Returns:
- the cookieSpec to use as defined in CookieSpecs
-
setCookieSpec
public void setCookieSpec(String cookieSpec)
- Parameters:
- cookieSpec - the cookieSpec to use as defined in CookieSpecs
-
getProxySettings
public ProxySettings getProxySettings()
-
setProxySettings
public void setProxySettings(ProxySettings proxy)
-
getConnectionTimeout
public int getConnectionTimeout()
Gets the connection timeout until a connection is established, in milliseconds.
- Returns:
- connection timeout
-
setConnectionTimeout
public void setConnectionTimeout(int connectionTimeout)
Sets the connection timeout until a connection is established, in milliseconds. Default is DEFAULT_TIMEOUT.
- Parameters:
- connectionTimeout - connection timeout
-
getSocketTimeout
public int getSocketTimeout()
Gets the maximum period of inactivity between two consecutive data packets, in milliseconds.
- Returns:
- socket timeout
-
setSocketTimeout
public void setSocketTimeout(int socketTimeout)
Sets the maximum period of inactivity between two consecutive data packets, in milliseconds. Default is DEFAULT_TIMEOUT.
- Parameters:
- socketTimeout - socket timeout
-
getConnectionRequestTimeout
public int getConnectionRequestTimeout()
Gets the timeout when requesting a connection, in milliseconds.
- Returns:
- connection request timeout
-
setConnectionRequestTimeout
public void setConnectionRequestTimeout(int connectionRequestTimeout)
Sets the timeout when requesting a connection, in milliseconds. Default is DEFAULT_TIMEOUT.
- Parameters:
- connectionRequestTimeout - connection request timeout
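The connection timeout and the socket timeout above bound different phases of a request. As a rough analogy only, the JDK's own HttpURLConnection exposes a similar pair (connect and read timeouts). The sketch below sets both on a connection object without opening it. TimeoutSketch is a hypothetical class, the fetcher itself builds on Apache HttpClient rather than HttpURLConnection, and the connection request timeout has no counterpart in this JDK class.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper: a JDK analogy, not the fetcher's own client.
final class TimeoutSketch {
    // Creates a connection object (no network I/O happens here) and
    // applies both timeouts, in milliseconds.
    static HttpURLConnection prepare(String url, int connectMillis, int readMillis) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            // Time allowed to establish the connection.
            conn.setConnectTimeout(connectMillis);
            // Time allowed between data packets once connected.
            conn.setReadTimeout(readMillis);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```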
-
getConnectionCharset
public Charset getConnectionCharset()
Gets the connection character set.
- Returns:
- connection character set
-
setConnectionCharset
public void setConnectionCharset(Charset connectionCharset)
Sets the connection character set. The HTTP protocol specification mandates the use of ASCII for HTTP message headers. Sites do not always respect this and it may be necessary to force a non-standard character set.
- Parameters:
- connectionCharset - connection character set
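The effect of the connection character set can be seen by decoding the same raw header bytes with two different charsets. This is a minimal sketch using only the JDK. HeaderCharsetSketch and the sample value are hypothetical, and the sketch does not show how the fetcher applies the setting internally.

```java
import java.nio.charset.Charset;

// Hypothetical helper: not part of the library.
final class HeaderCharsetSketch {
    // Decodes the raw bytes of a header value with the given charset.
    // Bytes that are invalid in that charset become replacement characters.
    static String decode(byte[] rawHeaderValue, Charset connectionCharset) {
        return new String(rawHeaderValue, connectionCharset);
    }
}
```

A header value containing non-ASCII characters sent as UTF-8 decodes correctly with StandardCharsets.UTF_8 but loses those characters when decoded as StandardCharsets.US_ASCII.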
-
isExpectContinueEnabled
public boolean isExpectContinueEnabled()
Whether 'Expect: 100-continue' handshake is enabled.
- Returns:
- true if enabled
-
setExpectContinueEnabled
public void setExpectContinueEnabled(boolean expectContinueEnabled)
Sets whether 'Expect: 100-continue' handshake is enabled. See RequestConfig.isExpectContinueEnabled().
- Parameters:
- expectContinueEnabled - true if enabled
-
getMaxRedirects
public int getMaxRedirects()
Gets the maximum number of redirects to be followed.
- Returns:
- maximum number of redirects to be followed
-
setMaxRedirects
public void setMaxRedirects(int maxRedirects)
Sets the maximum number of redirects to be followed. This can help prevent infinite loops. A value of zero effectively disables redirects. Default is DEFAULT_MAX_REDIRECT.
- Parameters:
- maxRedirects - maximum number of redirects to be followed
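The reason a redirect cap prevents infinite loops can be shown with a toy redirect table. This is a simplified sketch with a hypothetical RedirectSketch class. It returns the last URL reached when the cap is hit, whereas how the real fetcher reports an exceeded limit is not covered by this page.

```java
import java.util.Map;

// Hypothetical helper: not part of the library.
final class RedirectSketch {
    // Follows entries of a redirect table (source URL -> target URL) and
    // stops after maxRedirects hops, returning the last URL reached.
    static String follow(Map<String, String> redirects, String start, int maxRedirects) {
        String current = start;
        int hops = 0;
        while (hops < maxRedirects && redirects.containsKey(current)) {
            current = redirects.get(current);
            hops++;
        }
        return current;
    }
}
```

With a table where two URLs redirect to each other, the loop still terminates after maxRedirects hops, and with a cap of zero the start URL is returned without following anything.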
-
getLocalAddress
public String getLocalAddress()
Gets the local address (IP or hostname).
- Returns:
- local address
-
setLocalAddress
public void setLocalAddress(String localAddress)
Sets the local address, which may be useful when working with multiple network interfaces.
- Parameters:
- localAddress - local address
-
getMaxConnections
public int getMaxConnections()
Gets the maximum number of connections that can be created.
- Returns:
- number of connections
-
setMaxConnections
public void setMaxConnections(int maxConnections)
Sets the maximum number of connections that can be created. Typically, you would have at least as many connections as threads. Default is DEFAULT_MAX_CONNECTIONS.
- Parameters:
- maxConnections - maximum number of connections
-
getMaxConnectionsPerRoute
public int getMaxConnectionsPerRoute()
Gets the maximum number of connections that can be used per route.
- Returns:
- number of connections per route
-
setMaxConnectionsPerRoute
public void setMaxConnectionsPerRoute(int maxConnectionsPerRoute)
Sets the maximum number of connections that can be used per route. Default is DEFAULT_MAX_CONNECTIONS_PER_ROUTE.
- Parameters:
- maxConnectionsPerRoute - maximum number of connections per route
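The overall and per-route connection limits above interact: a lease must satisfy both. The following sketch models that interaction with plain counters. PoolLimitSketch is a hypothetical class and identifying a route by a simple string is an assumption. The real connection pool is provided by the underlying HTTP client.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: not part of the library.
final class PoolLimitSketch {
    private final int maxTotal;
    private final int maxPerRoute;
    private final Map<String, Integer> leasedPerRoute = new HashMap<>();
    private int leasedTotal;

    PoolLimitSketch(int maxTotal, int maxPerRoute) {
        this.maxTotal = maxTotal;
        this.maxPerRoute = maxPerRoute;
    }

    // Leases a connection for the route if both limits allow it.
    boolean tryLease(String route) {
        int onRoute = leasedPerRoute.getOrDefault(route, 0);
        if (leasedTotal >= maxTotal || onRoute >= maxPerRoute) {
            return false;
        }
        leasedPerRoute.put(route, onRoute + 1);
        leasedTotal++;
        return true;
    }

    // Returns a previously leased connection for the route to the pool.
    void release(String route) {
        int onRoute = leasedPerRoute.getOrDefault(route, 0);
        if (onRoute > 0) {
            leasedPerRoute.put(route, onRoute - 1);
            leasedTotal--;
        }
    }
}
```

With a total limit of 3 and a per-route limit of 2, a third lease on the same route is refused even though the pool as a whole still has room.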
-
getMaxConnectionIdleTime
public int getMaxConnectionIdleTime()
Gets the period of time in milliseconds after which to evict idle connections from the connection pool.
- Returns:
- amount of time after which to evict idle connections
-
setMaxConnectionIdleTime
public void setMaxConnectionIdleTime(int maxConnectionIdleTime)
Sets the period of time in milliseconds after which to evict idle connections from the connection pool. Default is DEFAULT_MAX_IDLE_TIME.
- Parameters:
- maxConnectionIdleTime - amount of time after which to evict idle connections
-
getMaxConnectionInactiveTime
public int getMaxConnectionInactiveTime()
Gets the period of time in milliseconds a connection must be inactive to be checked in case it became stalled.
- Returns:
- period of time in milliseconds
-
setMaxConnectionInactiveTime
public void setMaxConnectionInactiveTime(int maxConnectionInactiveTime)
Sets the period of time in milliseconds a connection must be inactive to be checked in case it became stalled. Default is 0 (not proactively checked).
- Parameters:
- maxConnectionInactiveTime - period of time in milliseconds
-
isTrustAllSSLCertificates
public boolean isTrustAllSSLCertificates()
Whether to trust all SSL certificates (affects only "https" connections).
- Returns:
- true if trusting all SSL certificates
- Since:
- 1.3.0
-
setTrustAllSSLCertificates
public void setTrustAllSSLCertificates(boolean trustAllSSLCertificates)
Sets whether to trust all SSL certificates. This is typically a bad idea (it makes man-in-the-middle attacks easier). Try to install an SSL certificate locally to ensure a proper certificate exchange instead.
- Parameters:
- trustAllSSLCertificates - true if trusting all SSL certificates
- Since:
- 1.3.0
-
isDisableSNI
public boolean isDisableSNI()
Gets whether Server Name Indication (SNI) is disabled.
- Returns:
- true if disabled
-
setDisableSNI
public void setDisableSNI(boolean disableSNI)
Sets whether Server Name Indication (SNI) is disabled.
- Parameters:
- disableSNI - true if disabled
-
getSSLProtocols
public List<String> getSSLProtocols()
Gets the supported SSL/TLS protocols. Default is null, which means it will use those provided/configured by your Java platform.
- Returns:
- SSL/TLS protocols
-
setSSLProtocols
public void setSSLProtocols(List<String> sslProtocols)
Sets the supported SSL/TLS protocols, such as SSLv3, TLSv1, TLSv1.1, and TLSv1.2. Note that specifying a protocol not supported by your underlying Java platform will not work.
- Parameters:
- sslProtocols - SSL/TLS protocols supported
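Since a protocol unsupported by the Java platform will not work, it can help to check a candidate list against the platform before configuring it. The sketch below does so with the standard javax.net.ssl API. SslProtocolCheck is a hypothetical helper, and it only reports what the default SSLContext supports, which may differ from what the platform enables by default.

```java
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;

// Hypothetical helper: not part of the library.
final class SslProtocolCheck {
    // Returns the requested protocols that the default SSLContext
    // reports as supported, preserving the requested order.
    static List<String> supportedSubset(List<String> requested) {
        try {
            List<String> platform = Arrays.asList(
                    SSLContext.getDefault()
                            .getSupportedSSLParameters()
                            .getProtocols());
            List<String> usable = new ArrayList<>(requested);
            usable.retainAll(platform);
            return usable;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }
}
```

The returned list could then be passed to setSSLProtocols(List<String>), leaving out any name the platform does not recognize.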
-
isDisableIfModifiedSince
public boolean isDisableIfModifiedSince()
Gets whether adding the If-Modified-Since HTTP request header is disabled. Servers supporting this header will only return the requested document if it was last modified since the supplied date.
- Returns:
- true if disabled
-
setDisableIfModifiedSince
public void setDisableIfModifiedSince(boolean disableIfModifiedSince)
Sets whether adding the If-Modified-Since HTTP request header is disabled. Servers supporting this header will only return the requested document if it was last modified since the supplied date.
- Parameters:
- disableIfModifiedSince - true if disabled
-
isDisableETag
public boolean isDisableETag()
Gets whether adding the "ETag" If-None-Match HTTP request header is disabled. Servers supporting this header will only return the requested document if the ETag value has changed, indicating a more recent version is available.
- Returns:
- true if disabled
-
setDisableETag
public void setDisableETag(boolean disableETag)
Sets whether adding the "ETag" If-None-Match HTTP request header is disabled. Servers supporting this header will only return the requested document if the ETag value has changed, indicating a more recent version is available.
- Parameters:
- disableETag - true if disabled
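The server-side behavior that If-Modified-Since and If-None-Match rely on can be sketched as two small decisions. This is a simplified model of a supporting server, with a hypothetical ConditionalRequestSketch class. It ignores weak validators and lists of ETags, and it is not code from the fetcher.

```java
import java.time.Instant;

// Hypothetical helper: a simplified model of a supporting server.
final class ConditionalRequestSketch {
    // 304 (Not Modified) when the client's ETag still matches, else 200.
    static int statusForETag(String currentETag, String ifNoneMatch) {
        return currentETag.equals(ifNoneMatch) ? 304 : 200;
    }

    // 200 when the document changed after the supplied date (or no date
    // was supplied), else 304 (Not Modified).
    static int statusForModifiedSince(Instant lastModified, Instant ifModifiedSince) {
        if (ifModifiedSince == null) {
            return 200;
        }
        return lastModified.isAfter(ifModifiedSince) ? 200 : 304;
    }
}
```

Disabling either header means the corresponding condition is never sent, so a server modeled this way always answers with the full document.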
-
isDisableHSTS
public boolean isDisableHSTS()
Gets whether the forcing of non secure URLs to secure ones is disabled, according to the URL domain Strict-Transport-Security policy (obtained from HTTP response header).
- Returns:
- true if disabled
-
setDisableHSTS
public void setDisableHSTS(boolean disableHSTS)
Sets whether the forcing of non secure URLs to secure ones is disabled, according to the URL domain Strict-Transport-Security policy (obtained from HTTP response header).
- Parameters:
- disableHSTS - true if disabled
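What "forcing non secure URLs to secure ones" means can be sketched as a rewrite driven by the Strict-Transport-Security header. The HstsSketch class below is hypothetical and deliberately simplified: it only looks at max-age, ignores includeSubDomains and explicit ports, and does not reflect how the fetcher stores or applies a domain's policy.

```java
import java.util.Locale;

// Hypothetical helper: not part of the library.
final class HstsSketch {
    // Rewrites an http:// URL to https:// when the header carries a
    // positive max-age. Any other input is returned unchanged.
    static String toSecureIfRequired(String url, String stsHeader) {
        if (stsHeader == null || !url.startsWith("http://")) {
            return url;
        }
        for (String directive : stsHeader.split(";")) {
            String d = directive.trim().toLowerCase(Locale.ROOT);
            if (d.startsWith("max-age=")) {
                String value = d.substring("max-age=".length())
                        .replace("\"", "").trim();
                try {
                    if (Long.parseLong(value) > 0) {
                        return "https://" + url.substring("http://".length());
                    }
                } catch (NumberFormatException e) {
                    return url; // malformed policy: leave the URL untouched
                }
            }
        }
        return url;
    }
}
```

With HSTS support disabled, the equivalent of this rewrite is skipped and the non secure URL is used as given.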
-
getAuthConfig
public HttpAuthConfig getAuthConfig()
-
setAuthConfig
public void setAuthConfig(HttpAuthConfig authConfig)
-
getHttpMethods
public List<HttpMethod> getHttpMethods()
Gets the list of HTTP methods to be accepted by this fetcher. Defaults are HttpMethod.GET and HttpMethod.HEAD.
- Returns:
- HTTP methods
-
setHttpMethods
public void setHttpMethods(List<HttpMethod> httpMethods)
Sets the list of HTTP methods to be accepted by this fetcher. Defaults are HttpMethod.GET and HttpMethod.HEAD.
- Parameters:
- httpMethods - HTTP methods
-
loadFromXML
public void loadFromXML(XML xml)
- Specified by:
- loadFromXML in interface IXMLConfigurable
-
saveToXML
public void saveToXML(XML xml)
- Specified by:
- saveToXML in interface IXMLConfigurable
-
-