Class GenericHttpFetcherConfig
java.lang.Object
com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
- All Implemented Interfaces:
IXMLConfigurable
Generic HTTP Fetcher configuration.
- Since:
- 3.0.0 (adapted from GenericHttpClientFactory and GenericDocumentFetcher from version 2.x)
- Author:
- Pascal Essiembre
-
Field Summary
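Fields
- static final int DEFAULT_MAX_CONNECTIONS
- static final int DEFAULT_MAX_CONNECTIONS_PER_ROUTE
- static final int DEFAULT_MAX_IDLE_TIME
- static final int DEFAULT_MAX_REDIRECT
- DEFAULT_NOT_FOUND_STATUS_CODES
- static final int DEFAULT_TIMEOUT
- DEFAULT_VALID_STATUS_CODES
-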
Constructor Summary
Constructors
- GenericHttpFetcherConfig()
-
Method Summary
Modifier and Type, Method, Description
- boolean equals
- getAuthConfig
- getConnectionCharset: Gets the connection character set.
- int getConnectionRequestTimeout(): Gets the timeout when requesting a connection, in milliseconds.
- int getConnectionTimeout(): Gets the connection timeout until a connection is established, in milliseconds.
- getCookieSpec
- getHeadersPrefix
- getHttpMethods: Gets the list of HTTP methods to be accepted by this fetcher.
- getLocalAddress: Gets the local address (IP or hostname).
- int getMaxConnectionIdleTime(): Gets the period of time in milliseconds after which to evict idle connections from the connection pool.
- int getMaxConnectionInactiveTime(): Gets the period of time in milliseconds a connection must be inactive to be checked in case it became stalled.
- int getMaxConnections(): Gets the maximum number of connections that can be created.
- int getMaxConnectionsPerRoute(): Gets the maximum number of connections that can be used per route.
- int getMaxRedirects(): Gets the maximum number of redirects to be followed.
- getNotFoundStatusCodes: Gets HTTP status codes to be considered as "Not found" state.
- getProxySettings
- getRedirectURLProvider: Gets the redirect URL provider.
- getRequestHeader(String name): Gets the HTTP request header value matching the given name, previously set with setRequestHeader(String, String).
- getRequestHeaderNames: Gets all HTTP request header names for headers previously set with setRequestHeader(String, String).
- int getSocketTimeout(): Gets the maximum period of inactivity between two consecutive data packets, in milliseconds.
- getSSLProtocols: Gets the supported SSL/TLS protocols.
- getUserAgent
- getValidStatusCodes
- int hashCode()
- boolean isDisableETag(): Gets whether adding the "ETag" If-None-Match HTTP request header is disabled.
- boolean isDisableHSTS(): Gets whether the forcing of non-secure URLs to secure ones is disabled, according to the URL domain Strict-Transport-Security policy (obtained from HTTP response header).
- boolean isDisableIfModifiedSince(): Gets whether adding the If-Modified-Since HTTP request header is disabled.
- boolean isDisableSNI(): Gets whether Server Name Indication (SNI) is disabled.
- boolean isExpectContinueEnabled(): Whether 'Expect: 100-continue' handshake is enabled.
- boolean isForceCharsetDetection(): Gets whether character encoding is detected instead of relying on HTTP response header.
- boolean isForceContentTypeDetection(): Gets whether content type is detected instead of relying on HTTP response header.
- boolean isTrustAllSSLCertificates(): Whether to trust all SSL certificates (affects only "https" connections).
- void loadFromXML(XML xml)
- removeRequestHeader(String name): Removes the request header matching the given name.
- void saveToXML
- void setAuthConfig(HttpAuthConfig authConfig)
- void setConnectionCharset(Charset connectionCharset): Sets the connection character set.
- void setConnectionRequestTimeout(int connectionRequestTimeout): Sets the timeout when requesting a connection, in milliseconds.
- void setConnectionTimeout(int connectionTimeout): Sets the connection timeout until a connection is established, in milliseconds.
- void setCookieSpec(String cookieSpec)
- void setDisableETag(boolean disableETag): Sets whether adding the "ETag" If-None-Match HTTP request header is disabled.
- void setDisableHSTS(boolean disableHSTS): Sets whether the forcing of non-secure URLs to secure ones is disabled, according to the URL domain Strict-Transport-Security policy (obtained from HTTP response header).
- void setDisableIfModifiedSince(boolean disableIfModifiedSince): Sets whether adding the If-Modified-Since HTTP request header is disabled.
- void setDisableSNI(boolean disableSNI): Sets whether Server Name Indication (SNI) is disabled.
- void setExpectContinueEnabled(boolean expectContinueEnabled): Sets whether 'Expect: 100-continue' handshake is enabled.
- void setForceCharsetDetection(boolean forceCharsetDetection): Sets whether character encoding is detected instead of relying on HTTP response header.
- void setForceContentTypeDetection(boolean forceContentTypeDetection): Sets whether content type is detected instead of relying on HTTP response header.
- void setHeadersPrefix(String headersPrefix)
- void setHttpMethods(List<HttpMethod> httpMethods): Sets the list of HTTP methods to be accepted by this fetcher.
- void setLocalAddress(String localAddress): Sets the local address, which may be useful when working with multiple network interfaces.
- void setMaxConnectionIdleTime(int maxConnectionIdleTime): Sets the period of time in milliseconds after which to evict idle connections from the connection pool.
- void setMaxConnectionInactiveTime(int maxConnectionInactiveTime): Sets the period of time in milliseconds a connection must be inactive to be checked in case it became stalled.
- void setMaxConnections(int maxConnections): Sets the maximum number of connections that can be created.
- void setMaxConnectionsPerRoute(int maxConnectionsPerRoute): Sets the maximum number of connections that can be used per route.
- void setMaxRedirects(int maxRedirects): Sets the maximum number of redirects to be followed.
- final void setNotFoundStatusCodes(int... notFoundStatusCodes): Sets HTTP status codes to be considered as "Not found" state.
- final void setNotFoundStatusCodes(List<Integer> notFoundStatusCodes): Sets HTTP status codes to be considered as "Not found" state.
- void setProxySettings(ProxySettings proxy)
- void setRedirectURLProvider(IRedirectURLProvider redirectURLProvider): Sets the redirect URL provider.
- void setRequestHeader(String name, String value): Sets a default HTTP request header every HTTP connection should have.
- void setRequestHeaders(Map<String, String> headers): Sets the default HTTP request headers every HTTP connection should have.
- void setSocketTimeout(int socketTimeout): Sets the maximum period of inactivity between two consecutive data packets, in milliseconds.
- void setSSLProtocols(List<String> sslProtocols): Sets the supported SSL/TLS protocols, such as SSLv3, TLSv1, TLSv1.1, and TLSv1.2.
- void setTrustAllSSLCertificates(boolean trustAllSSLCertificates): Sets whether to trust all SSL certificates.
- void setUserAgent(String userAgent)
- void setValidStatusCodes(int... validStatusCodes): Sets valid HTTP response status codes.
- void setValidStatusCodes(List<Integer> validStatusCodes): Sets valid HTTP response status codes.
- toString()
-
Field Details
-
DEFAULT_TIMEOUT
public static final int DEFAULT_TIMEOUT
-
DEFAULT_MAX_REDIRECT
public static final int DEFAULT_MAX_REDIRECT
-
DEFAULT_MAX_CONNECTIONS
public static final int DEFAULT_MAX_CONNECTIONS
-
DEFAULT_MAX_CONNECTIONS_PER_ROUTE
public static final int DEFAULT_MAX_CONNECTIONS_PER_ROUTE
-
DEFAULT_MAX_IDLE_TIME
public static final int DEFAULT_MAX_IDLE_TIME
-
DEFAULT_VALID_STATUS_CODES
-
DEFAULT_NOT_FOUND_STATUS_CODES
-
-
Constructor Details
-
GenericHttpFetcherConfig
public GenericHttpFetcherConfig()
-
-
Method Details
-
getRedirectURLProvider
Gets the redirect URL provider.
- Returns:
- the redirect URL provider
-
setRedirectURLProvider
Sets the redirect URL provider.
- Parameters:
redirectURLProvider - redirect URL provider
-
getValidStatusCodes
-
setValidStatusCodes
Sets valid HTTP response status codes.
- Parameters:
validStatusCodes - valid status codes
-
setValidStatusCodes
public void setValidStatusCodes(int... validStatusCodes)
Sets valid HTTP response status codes.
- Parameters:
validStatusCodes - valid status codes
-
getNotFoundStatusCodes
Gets HTTP status codes to be considered as "Not found" state. Default is 404.
- Returns:
- "Not found" codes
-
setNotFoundStatusCodes
public final void setNotFoundStatusCodes(int... notFoundStatusCodes)
Sets HTTP status codes to be considered as "Not found" state.
- Parameters:
notFoundStatusCodes - "Not found" codes
-
setNotFoundStatusCodes
Sets HTTP status codes to be considered as "Not found" state.
- Parameters:
notFoundStatusCodes - "Not found" codes
-
getHeadersPrefix
-
setHeadersPrefix
-
isForceContentTypeDetection
public boolean isForceContentTypeDetection()
Gets whether content type is detected instead of relying on the HTTP response header.
- Returns:
- true if detection is enabled
-
setForceContentTypeDetection
public void setForceContentTypeDetection(boolean forceContentTypeDetection)
Sets whether content type is detected instead of relying on the HTTP response header.
- Parameters:
forceContentTypeDetection - true to enable detection
-
isForceCharsetDetection
public boolean isForceCharsetDetection()
Gets whether character encoding is detected instead of relying on the HTTP response header.
- Returns:
- true if detection is enabled
-
setForceCharsetDetection
public void setForceCharsetDetection(boolean forceCharsetDetection)
Sets whether character encoding is detected instead of relying on the HTTP response header.
- Parameters:
forceCharsetDetection - true to enable detection
-
getUserAgent
-
setUserAgent
-
setRequestHeader
Sets a default HTTP request header every HTTP connection should have. It is added on top of any default request headers Apache HttpClient may already provide.
- Parameters:
name - HTTP request header name
value - HTTP request header value
-
setRequestHeaders
Sets the default HTTP request headers every HTTP connection should have. They are added on top of any default request headers Apache HttpClient may already provide.
- Parameters:
headers - map of header names and values
-
getRequestHeader
Gets the HTTP request header value matching the given name, previously set with setRequestHeader(String, String).
- Parameters:
name - HTTP request header name
- Returns:
- HTTP request header value, or null if no match is found
-
getRequestHeaderNames
Gets all HTTP request header names for headers previously set with setRequestHeader(String, String). If no request headers are set, it returns an empty array.
- Returns:
- HTTP request header names
-
removeRequestHeader
Removes the request header matching the given name.
- Parameters:
name - name of HTTP request header to remove
- Returns:
- the previous value associated with the name, or null if there was no request header for the name
-
getCookieSpec
- Returns:
- the cookieSpec to use as defined in CookieSpecs
-
setCookieSpec
- Parameters:
cookieSpec - the cookieSpec to use as defined in CookieSpecs
-
getProxySettings
-
setProxySettings
-
getConnectionTimeout
public int getConnectionTimeout()
Gets the connection timeout until a connection is established, in milliseconds.
- Returns:
- connection timeout
-
setConnectionTimeout
public void setConnectionTimeout(int connectionTimeout)
Sets the connection timeout until a connection is established, in milliseconds. Default is DEFAULT_TIMEOUT.
- Parameters:
connectionTimeout - connection timeout
-
getSocketTimeout
public int getSocketTimeout()
Gets the maximum period of inactivity between two consecutive data packets, in milliseconds.
- Returns:
- socket timeout
-
setSocketTimeout
public void setSocketTimeout(int socketTimeout)
Sets the maximum period of inactivity between two consecutive data packets, in milliseconds. Default is DEFAULT_TIMEOUT.
- Parameters:
socketTimeout - socket timeout
-
getConnectionRequestTimeout
public int getConnectionRequestTimeout()
Gets the timeout when requesting a connection, in milliseconds.
- Returns:
- connection request timeout
-
setConnectionRequestTimeout
public void setConnectionRequestTimeout(int connectionRequestTimeout)
Sets the timeout when requesting a connection, in milliseconds. Default is DEFAULT_TIMEOUT.
- Parameters:
connectionRequestTimeout - connection request timeout
-
getConnectionCharset
Gets the connection character set.
- Returns:
- connection character set
-
setConnectionCharset
Sets the connection character set. The HTTP protocol specification mandates the use of ASCII for HTTP message headers. Sites do not always respect this, and it may be necessary to force a non-standard character set.
- Parameters:
connectionCharset - connection character set
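To illustrate why a non-ASCII connection character set can matter, the sketch below decodes the same raw header bytes with US-ASCII and with ISO-8859-1. It relies on the JDK only; the class name HeaderCharsetDemo and the sample header are illustrative assumptions and are not part of this API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the fetcher API.
final class HeaderCharsetDemo {

    // Decodes raw header bytes with the given charset.
    static String decode(byte[] rawHeader, Charset charset) {
        return new String(rawHeader, charset);
    }

    // Sample header containing "résumé", encoded as ISO-8859-1 bytes.
    static byte[] sampleHeader() {
        return "Content-Disposition: filename=r\u00e9sum\u00e9.pdf"
                .getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] raw = sampleHeader();
        // US-ASCII cannot represent byte 0xE9, so it is replaced by U+FFFD.
        System.out.println(decode(raw, StandardCharsets.US_ASCII));
        // ISO-8859-1 maps the same byte back to the accented letter.
        System.out.println(decode(raw, StandardCharsets.ISO_8859_1));
    }
}
```

A Charset such as StandardCharsets.ISO_8859_1 is the kind of value one would pass to setConnectionCharset(Charset) when a site sends headers in that encoding.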
-
isExpectContinueEnabled
public boolean isExpectContinueEnabled()
Whether 'Expect: 100-continue' handshake is enabled.
- Returns:
- true if enabled
-
setExpectContinueEnabled
public void setExpectContinueEnabled(boolean expectContinueEnabled)
Sets whether 'Expect: 100-continue' handshake is enabled. See RequestConfig.isExpectContinueEnabled().
- Parameters:
expectContinueEnabled - true if enabled
-
getMaxRedirects
public int getMaxRedirects()
Gets the maximum number of redirects to be followed.
- Returns:
- maximum number of redirects to be followed
-
setMaxRedirects
public void setMaxRedirects(int maxRedirects)
Sets the maximum number of redirects to be followed. This can help prevent infinite loops. A value of zero effectively disables redirects. Default is DEFAULT_MAX_REDIRECT.
- Parameters:
maxRedirects - maximum number of redirects to be followed
-
getLocalAddress
Gets the local address (IP or hostname).
- Returns:
- local address
-
setLocalAddress
Sets the local address, which may be useful when working with multiple network interfaces.
- Parameters:
localAddress - local address
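When several network interfaces are present, it can help to list the candidate addresses before choosing one. The following is a minimal JDK-only sketch; the class name LocalAddressLister is a hypothetical helper and not part of this API.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper, not part of the fetcher API.
final class LocalAddressLister {

    // Returns the textual IP addresses bound to the local network interfaces.
    static List<String> localAddresses() {
        List<String> result = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return result; // no interfaces reported by the platform
            }
            for (NetworkInterface nic : Collections.list(nics)) {
                for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                    result.add(addr.getHostAddress());
                }
            }
        } catch (SocketException e) {
            // Interfaces could not be queried; return what was collected so far.
        }
        return result;
    }

    public static void main(String[] args) {
        localAddresses().forEach(System.out::println);
    }
}
```

One of the printed addresses could then be given to setLocalAddress(String).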
-
getMaxConnections
public int getMaxConnections()
Gets the maximum number of connections that can be created.
- Returns:
- number of connections
-
setMaxConnections
public void setMaxConnections(int maxConnections)
Sets the maximum number of connections that can be created. Typically, you would allow at least as many connections as there are threads. Default is DEFAULT_MAX_CONNECTIONS.
- Parameters:
maxConnections - maximum number of connections
-
getMaxConnectionsPerRoute
public int getMaxConnectionsPerRoute()
Gets the maximum number of connections that can be used per route.
- Returns:
- number of connections per route
-
setMaxConnectionsPerRoute
public void setMaxConnectionsPerRoute(int maxConnectionsPerRoute)
Sets the maximum number of connections that can be used per route. Default is DEFAULT_MAX_CONNECTIONS_PER_ROUTE.
- Parameters:
maxConnectionsPerRoute - maximum number of connections per route
-
getMaxConnectionIdleTime
public int getMaxConnectionIdleTime()
Gets the period of time in milliseconds after which to evict idle connections from the connection pool.
- Returns:
- amount of time after which to evict idle connections
-
setMaxConnectionIdleTime
public void setMaxConnectionIdleTime(int maxConnectionIdleTime)
Sets the period of time in milliseconds after which to evict idle connections from the connection pool. Default is DEFAULT_MAX_IDLE_TIME.
- Parameters:
maxConnectionIdleTime - amount of time after which to evict idle connections
-
getMaxConnectionInactiveTime
public int getMaxConnectionInactiveTime()
Gets the period of time in milliseconds a connection must be inactive to be checked in case it became stalled.
- Returns:
- period of time in milliseconds
-
setMaxConnectionInactiveTime
public void setMaxConnectionInactiveTime(int maxConnectionInactiveTime)
Sets the period of time in milliseconds a connection must be inactive to be checked in case it became stalled. Default is 0 (not proactively checked).
- Parameters:
maxConnectionInactiveTime - period of time in milliseconds
-
isTrustAllSSLCertificates
public boolean isTrustAllSSLCertificates()
Whether to trust all SSL certificates (affects only "https" connections).
- Returns:
- true if trusting all SSL certificates
- Since:
- 1.3.0
-
setTrustAllSSLCertificates
public void setTrustAllSSLCertificates(boolean trustAllSSLCertificates)
Sets whether to trust all SSL certificates. This is typically a bad idea (it makes man-in-the-middle attacks easier). Try to install an SSL certificate locally to ensure a proper certificate exchange instead.
- Parameters:
trustAllSSLCertificates - true if trusting all SSL certificates
- Since:
- 1.3.0
-
isDisableSNI
public boolean isDisableSNI()
Gets whether Server Name Indication (SNI) is disabled.
- Returns:
- true if disabled
-
setDisableSNI
public void setDisableSNI(boolean disableSNI)
Sets whether Server Name Indication (SNI) is disabled.
- Parameters:
disableSNI - true if disabled
-
getSSLProtocols
Gets the supported SSL/TLS protocols. Default is null, which means it will use those provided/configured by your Java platform.
- Returns:
- SSL/TLS protocols
-
setSSLProtocols
Sets the supported SSL/TLS protocols, such as SSLv3, TLSv1, TLSv1.1, and TLSv1.2. Note that specifying a protocol not supported by your underlying Java platform will not work.
- Parameters:
sslProtocols - SSL/TLS protocols supported
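Because a protocol the Java platform does not support will not work, it can be useful to check the platform first. The sketch below queries the JVM's default SSLContext and filters a requested list accordingly. The class SslProtocolCheck and its method names are hypothetical helpers, not part of this API.

```java
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;

// Hypothetical helper, not part of the fetcher API.
final class SslProtocolCheck {

    // Protocol names the default SSLContext of this JVM is able to support.
    static List<String> platformProtocols() {
        try {
            return Arrays.asList(
                    SSLContext.getDefault().getSupportedSSLParameters().getProtocols());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }

    // Keeps only the requested protocols that appear in the supported list.
    static List<String> keepSupported(List<String> requested, List<String> supported) {
        List<String> result = new ArrayList<>(requested);
        result.retainAll(supported);
        return result;
    }

    public static void main(String[] args) {
        List<String> wanted = Arrays.asList("TLSv1.2", "TLSv1.3");
        System.out.println(keepSupported(wanted, platformProtocols()));
    }
}
```

The filtered list is the kind of value that could be passed to setSSLProtocols(List<String>).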
-
isDisableIfModifiedSince
public boolean isDisableIfModifiedSince()
Gets whether adding the If-Modified-Since HTTP request header is disabled. Servers supporting this header will only return the requested document if it was modified after the supplied date.
- Returns:
- true if disabled
-
setDisableIfModifiedSince
public void setDisableIfModifiedSince(boolean disableIfModifiedSince)
Sets whether adding the If-Modified-Since HTTP request header is disabled. Servers supporting this header will only return the requested document if it was modified after the supplied date.
- Parameters:
disableIfModifiedSince - true if disabled
-
isDisableETag
public boolean isDisableETag()
Gets whether adding the "ETag" If-None-Match HTTP request header is disabled. Servers supporting this header will only return the requested document if the ETag value has changed, indicating a more recent version is available.
- Returns:
- true if disabled
-
setDisableETag
public void setDisableETag(boolean disableETag)
Sets whether adding the "ETag" If-None-Match HTTP request header is disabled. Servers supporting this header will only return the requested document if the ETag value has changed, indicating a more recent version is available.
- Parameters:
disableETag - true if disabled
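The two conditional request headers described above can be pictured with the following sketch, which builds them from a previously stored last-modified date and ETag unless the matching flag disables them. This illustrates the HTTP mechanism only and is not the fetcher's actual implementation. The class ConditionalHeadersSketch and its method are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration, not part of the fetcher API.
final class ConditionalHeadersSketch {

    // HTTP-date in its fixed-length form, always expressed in GMT.
    private static final DateTimeFormatter HTTP_DATE = DateTimeFormatter
            .ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.ENGLISH)
            .withZone(ZoneOffset.UTC);

    // Builds the conditional headers, skipping any that are disabled or unknown.
    static Map<String, String> conditionalHeaders(Instant lastModified, String etag,
            boolean disableIfModifiedSince, boolean disableETag) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (!disableIfModifiedSince && lastModified != null) {
            headers.put("If-Modified-Since", HTTP_DATE.format(lastModified));
        }
        if (!disableETag && etag != null) {
            headers.put("If-None-Match", etag);
        }
        return headers;
    }

    public static void main(String[] args) {
        System.out.println(conditionalHeaders(Instant.EPOCH, "\"abc123\"", false, false));
    }
}
```

A server that honors these headers typically answers 304 (Not Modified) when the document is unchanged, so the body does not need to be downloaded again.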
-
isDisableHSTS
public boolean isDisableHSTS()
Gets whether the forcing of non-secure URLs to secure ones is disabled, according to the URL domain Strict-Transport-Security policy (obtained from HTTP response header).
- Returns:
- true if disabled
-
setDisableHSTS
public void setDisableHSTS(boolean disableHSTS)
Sets whether the forcing of non-secure URLs to secure ones is disabled, according to the URL domain Strict-Transport-Security policy (obtained from HTTP response header).
- Parameters:
disableHSTS - true if disabled
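As a rough picture of what honoring a Strict-Transport-Security policy involves, the sketch below upgrades an http URL to https when the response header carries a positive max-age, unless the behavior is disabled. It is a simplified illustration under stated assumptions: it ignores includeSubDomains, policy expiry, and explicit ports, and it is not the fetcher's implementation. The class HstsSketch is hypothetical.

```java
import java.util.Locale;

// Hypothetical illustration, not part of the fetcher API.
final class HstsSketch {

    // True when the header value carries a max-age directive greater than zero.
    static boolean policyActive(String stsHeader) {
        if (stsHeader == null) {
            return false;
        }
        for (String directive : stsHeader.split(";")) {
            String d = directive.trim().toLowerCase(Locale.ROOT);
            if (d.startsWith("max-age=")) {
                String value = d.substring("max-age=".length()).replace("\"", "").trim();
                try {
                    return Long.parseLong(value) > 0;
                } catch (NumberFormatException e) {
                    return false; // malformed directive: treat as no policy
                }
            }
        }
        return false;
    }

    // Rewrites http:// to https:// when an active policy exists and HSTS is not disabled.
    static String upgrade(String url, String stsHeader, boolean disableHSTS) {
        boolean isHttp = url.regionMatches(true, 0, "http://", 0, 7);
        if (disableHSTS || !isHttp || !policyActive(stsHeader)) {
            return url;
        }
        return "https://" + url.substring(7);
    }

    public static void main(String[] args) {
        System.out.println(upgrade("http://example.com/a",
                "max-age=31536000; includeSubDomains", false));
    }
}
```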
-
getAuthConfig
-
setAuthConfig
-
getHttpMethods
Gets the list of HTTP methods to be accepted by this fetcher. Defaults are HttpMethod.GET and HttpMethod.HEAD.
- Returns:
- HTTP methods
-
setHttpMethods
Sets the list of HTTP methods to be accepted by this fetcher. Defaults are HttpMethod.GET and HttpMethod.HEAD.
- Parameters:
httpMethods - HTTP methods
-
loadFromXML
- Specified by:
loadFromXML in interface IXMLConfigurable
-
saveToXML
- Specified by:
saveToXML in interface IXMLConfigurable
-
equals
-
hashCode
public int hashCode()
-
toString
-