Class GenericHttpFetcherConfig

java.lang.Object
com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
All Implemented Interfaces:
IXMLConfigurable

public class GenericHttpFetcherConfig extends Object implements IXMLConfigurable
Generic HTTP Fetcher configuration.
Since:
3.0.0 (adapted from GenericHttpClientFactory and GenericDocumentFetcher from version 2.x)
Author:
Pascal Essiembre
  • Field Details

    • DEFAULT_TIMEOUT

      public static final int DEFAULT_TIMEOUT
    • DEFAULT_MAX_REDIRECT

      public static final int DEFAULT_MAX_REDIRECT
    • DEFAULT_MAX_CONNECTIONS

      public static final int DEFAULT_MAX_CONNECTIONS
    • DEFAULT_MAX_CONNECTIONS_PER_ROUTE

      public static final int DEFAULT_MAX_CONNECTIONS_PER_ROUTE
    • DEFAULT_MAX_IDLE_TIME

      public static final int DEFAULT_MAX_IDLE_TIME
    • DEFAULT_VALID_STATUS_CODES

      public static final List<Integer> DEFAULT_VALID_STATUS_CODES
    • DEFAULT_NOT_FOUND_STATUS_CODES

      public static final List<Integer> DEFAULT_NOT_FOUND_STATUS_CODES
  • Constructor Details

    • GenericHttpFetcherConfig

      public GenericHttpFetcherConfig()
  • Method Details

    • getRedirectURLProvider

      public IRedirectURLProvider getRedirectURLProvider()
      Gets the redirect URL provider.
      Returns:
      the redirect URL provider
    • setRedirectURLProvider

      public void setRedirectURLProvider(IRedirectURLProvider redirectURLProvider)
      Sets the redirect URL provider.
      Parameters:
      redirectURLProvider - redirect URL provider
    • getValidStatusCodes

      public List<Integer> getValidStatusCodes()
    • setValidStatusCodes

      public void setValidStatusCodes(List<Integer> validStatusCodes)
      Sets valid HTTP response status codes.
      Parameters:
      validStatusCodes - valid status codes
    • setValidStatusCodes

      public void setValidStatusCodes(int... validStatusCodes)
      Sets valid HTTP response status codes.
      Parameters:
      validStatusCodes - valid status codes
    • getNotFoundStatusCodes

      public List<Integer> getNotFoundStatusCodes()
      Gets HTTP status codes to be considered as "Not found" state. Default is 404.
      Returns:
      "Not found" codes
    • setNotFoundStatusCodes

      public final void setNotFoundStatusCodes(int... notFoundStatusCodes)
      Sets HTTP status codes to be considered as "Not found" state.
      Parameters:
      notFoundStatusCodes - "Not found" codes
    • setNotFoundStatusCodes

      public final void setNotFoundStatusCodes(List<Integer> notFoundStatusCodes)
      Sets HTTP status codes to be considered as "Not found" state.
      Parameters:
      notFoundStatusCodes - "Not found" codes
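      The valid and "Not found" status-code lists can be read as a three-way classification of a response. The following is a minimal sketch of that idea and not the fetcher's actual logic: the class StatusCodeSketch, its Outcome enum, and the classify method are hypothetical names, and the lists used in main are illustrative values (the only default stated above is 404 for "Not found").

```java
import java.util.List;

public class StatusCodeSketch {

    /** Hypothetical outcomes for illustration; not part of the documented API. */
    enum Outcome { VALID, NOT_FOUND, BAD_STATUS }

    /**
     * Classifies a status code against the two configured lists.
     * Assumption: the valid list is checked first, then the "Not found" list.
     */
    static Outcome classify(
            int statusCode, List<Integer> validCodes, List<Integer> notFoundCodes) {
        if (validCodes.contains(statusCode)) {
            return Outcome.VALID;
        }
        if (notFoundCodes.contains(statusCode)) {
            return Outcome.NOT_FOUND;
        }
        return Outcome.BAD_STATUS;
    }

    public static void main(String[] args) {
        // Illustrative values only, e.g. treating 410 (Gone) like 404.
        List<Integer> valid = List.of(200);
        List<Integer> notFound = List.of(404, 410);
        for (int code : new int[] {200, 404, 410, 500}) {
            System.out.println(code + " -> " + classify(code, valid, notFound));
        }
    }
}
```

      Adding 410 to the "Not found" list, as in main, is one plausible reason to override the default.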
    • getHeadersPrefix

      public String getHeadersPrefix()
    • setHeadersPrefix

      public void setHeadersPrefix(String headersPrefix)
    • isForceContentTypeDetection

      public boolean isForceContentTypeDetection()
      Gets whether content type is detected instead of relying on HTTP response header.
      Returns:
      true to enable detection
    • setForceContentTypeDetection

      public void setForceContentTypeDetection(boolean forceContentTypeDetection)
      Sets whether content type is detected instead of relying on HTTP response header.
      Parameters:
      forceContentTypeDetection - true to enable detection
    • isForceCharsetDetection

      public boolean isForceCharsetDetection()
      Gets whether character encoding is detected instead of relying on HTTP response header.
      Returns:
      true to enable detection
    • setForceCharsetDetection

      public void setForceCharsetDetection(boolean forceCharsetDetection)
      Sets whether character encoding is detected instead of relying on HTTP response header.
      Parameters:
      forceCharsetDetection - true to enable detection
    • getUserAgent

      public String getUserAgent()
    • setUserAgent

      public void setUserAgent(String userAgent)
    • setRequestHeader

      public void setRequestHeader(String name, String value)
      Sets a default HTTP request header that every HTTP connection should have. Such headers are in addition to any default request headers Apache HttpClient may already provide.
      Parameters:
      name - HTTP request header name
      value - HTTP request header value
    • setRequestHeaders

      public void setRequestHeaders(Map<String,String> headers)
      Sets the default HTTP request headers that every HTTP connection should have. These are in addition to any default request headers Apache HttpClient may already provide.
      Parameters:
      headers - map of header names and values
    • getRequestHeader

      public String getRequestHeader(String name)
      Gets the HTTP request header value matching the given name, previously set with setRequestHeader(String, String).
      Parameters:
      name - HTTP request header name
      Returns:
      HTTP request header value or null if no match is found
    • getRequestHeaderNames

      public List<String> getRequestHeaderNames()
      Gets all HTTP request header names for headers previously set with setRequestHeader(String, String). If no request headers are set, it returns an empty list.
      Returns:
      HTTP request header names
    • removeRequestHeader

      public String removeRequestHeader(String name)
      Removes the request header matching the given name.
      Parameters:
      name - name of HTTP request header to remove
      Returns:
      the previous value associated with the name, or null if there was no request header for the name.
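      The contract of the four request-header methods (null when a name is unknown, an empty list when nothing is set, the previous value returned on removal) can be sketched with a plain map. RequestHeadersSketch is a hypothetical stand-in written for illustration, not the real class. It assumes header names are matched exactly as given, which the documentation above does not specify.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in mirroring the documented request-header contract. */
public class RequestHeadersSketch {

    // Assumption: names are matched exactly (case-sensitive) and kept in insertion order.
    private final Map<String, String> headers = new LinkedHashMap<>();

    public void setRequestHeader(String name, String value) {
        headers.put(name, value);
    }

    /** Returns the header value, or null if no header has that name. */
    public String getRequestHeader(String name) {
        return headers.get(name);
    }

    /** Returns a copy of the header names; empty if none are set. */
    public List<String> getRequestHeaderNames() {
        return new ArrayList<>(headers.keySet());
    }

    /** Returns the removed value, or null if there was none. */
    public String removeRequestHeader(String name) {
        return headers.remove(name);
    }

    public static void main(String[] args) {
        RequestHeadersSketch cfg = new RequestHeadersSketch();
        System.out.println(cfg.getRequestHeaderNames());
        cfg.setRequestHeader("Accept-Language", "en");
        System.out.println(cfg.getRequestHeader("Accept-Language"));
        System.out.println(cfg.removeRequestHeader("Accept-Language"));
        System.out.println(cfg.removeRequestHeader("Accept-Language"));
    }
}
```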
    • getCookieSpec

      public String getCookieSpec()
      Returns:
      the cookieSpec to use as defined in CookieSpecs
    • setCookieSpec

      public void setCookieSpec(String cookieSpec)
      Parameters:
      cookieSpec - the cookieSpec to use as defined in CookieSpecs
    • getProxySettings

      public ProxySettings getProxySettings()
    • setProxySettings

      public void setProxySettings(ProxySettings proxy)
    • getConnectionTimeout

      public int getConnectionTimeout()
      Gets the connection timeout until a connection is established, in milliseconds.
      Returns:
      connection timeout
    • setConnectionTimeout

      public void setConnectionTimeout(int connectionTimeout)
      Sets the connection timeout until a connection is established, in milliseconds. Default is DEFAULT_TIMEOUT.
      Parameters:
      connectionTimeout - connection timeout
    • getSocketTimeout

      public int getSocketTimeout()
      Gets the maximum period of inactivity between two consecutive data packets, in milliseconds.
      Returns:
      socket timeout
    • setSocketTimeout

      public void setSocketTimeout(int socketTimeout)
      Sets the maximum period of inactivity between two consecutive data packets, in milliseconds. Default is DEFAULT_TIMEOUT.
      Parameters:
      socketTimeout - socket timeout
    • getConnectionRequestTimeout

      public int getConnectionRequestTimeout()
      Gets the timeout when requesting a connection, in milliseconds.
      Returns:
      connection request timeout
    • setConnectionRequestTimeout

      public void setConnectionRequestTimeout(int connectionRequestTimeout)
      Sets the timeout when requesting a connection, in milliseconds. Default is DEFAULT_TIMEOUT.
      Parameters:
      connectionRequestTimeout - connection request timeout
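      To make the difference between the connection timeout and the socket timeout concrete, here is a rough analogue using only the JDK's HttpURLConnection rather than Apache HttpClient, which this configuration targets. The class TimeoutSketch and its prepare method are hypothetical, and the 30-second values are illustrative rather than the documented defaults. The connection request timeout (waiting for a pooled connection) has no counterpart in this JDK class, so it is not shown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class TimeoutSketch {

    /**
     * Prepares (but does not open) a connection with two separate timeouts.
     * connectTimeoutMs: maximum time to establish the connection.
     * readTimeoutMs: maximum inactivity while waiting for data once connected.
     */
    static HttpURLConnection prepare(String url, int connectTimeoutMs, int readTimeoutMs) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(connectTimeoutMs);
            conn.setReadTimeout(readTimeoutMs);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // No network traffic happens here; the connection is only configured.
        HttpURLConnection conn = prepare("http://example.com/", 30_000, 30_000);
        System.out.println("connect timeout ms: " + conn.getConnectTimeout());
        System.out.println("read timeout ms: " + conn.getReadTimeout());
    }
}
```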
    • getConnectionCharset

      public Charset getConnectionCharset()
      Gets the connection character set.
      Returns:
      connection character set
    • setConnectionCharset

      public void setConnectionCharset(Charset connectionCharset)
      Sets the connection character set. The HTTP protocol specification mandates the use of ASCII for HTTP message headers. Sites do not always respect this and it may be necessary to force a non-standard character set.
      Parameters:
      connectionCharset - connection character set
    • isExpectContinueEnabled

      public boolean isExpectContinueEnabled()
      Gets whether the 'Expect: 100-continue' handshake is enabled.
      Returns:
      true if enabled
    • setExpectContinueEnabled

      public void setExpectContinueEnabled(boolean expectContinueEnabled)
      Sets whether the 'Expect: 100-continue' handshake is enabled. See RequestConfig.isExpectContinueEnabled().
      Parameters:
      expectContinueEnabled - true if enabled
    • getMaxRedirects

      public int getMaxRedirects()
      Gets the maximum number of redirects to be followed.
      Returns:
      maximum number of redirects to be followed
    • setMaxRedirects

      public void setMaxRedirects(int maxRedirects)
      Sets the maximum number of redirects to be followed. This can help prevent infinite loops. A value of zero effectively disables redirects. Default is DEFAULT_MAX_REDIRECT.
      Parameters:
      maxRedirects - maximum number of redirects to be followed
    • getLocalAddress

      public String getLocalAddress()
      Gets the local address (IP or hostname).
      Returns:
      local address
    • setLocalAddress

      public void setLocalAddress(String localAddress)
      Sets the local address, which may be useful when working with multiple network interfaces.
      Parameters:
      localAddress - local address
    • getMaxConnections

      public int getMaxConnections()
      Gets the maximum number of connections that can be created.
      Returns:
      number of connections
    • setMaxConnections

      public void setMaxConnections(int maxConnections)
      Sets the maximum number of connections that can be created. Typically, you would have at least as many connections as threads. Default is DEFAULT_MAX_CONNECTIONS.
      Parameters:
      maxConnections - maximum number of connections
    • getMaxConnectionsPerRoute

      public int getMaxConnectionsPerRoute()
      Gets the maximum number of connections that can be used per route.
      Returns:
      number of connections per route
    • setMaxConnectionsPerRoute

      public void setMaxConnectionsPerRoute(int maxConnectionsPerRoute)
      Sets the maximum number of connections that can be used per route. Default is DEFAULT_MAX_CONNECTIONS_PER_ROUTE.
      Parameters:
      maxConnectionsPerRoute - maximum number of connections per route
    • getMaxConnectionIdleTime

      public int getMaxConnectionIdleTime()
      Gets the period of time in milliseconds after which to evict idle connections from the connection pool.
      Returns:
      amount of time after which to evict idle connections
    • setMaxConnectionIdleTime

      public void setMaxConnectionIdleTime(int maxConnectionIdleTime)
      Sets the period of time in milliseconds after which to evict idle connections from the connection pool. Default is DEFAULT_MAX_IDLE_TIME.
      Parameters:
      maxConnectionIdleTime - amount of time after which to evict idle connections
    • getMaxConnectionInactiveTime

      public int getMaxConnectionInactiveTime()
      Gets the period of time in milliseconds a connection must be inactive to be checked in case it became stalled.
      Returns:
      period of time in milliseconds
    • setMaxConnectionInactiveTime

      public void setMaxConnectionInactiveTime(int maxConnectionInactiveTime)
      Sets the period of time in milliseconds a connection must be inactive to be checked in case it became stalled. Default is 0 (not proactively checked).
      Parameters:
      maxConnectionInactiveTime - period of time in milliseconds
    • isTrustAllSSLCertificates

      public boolean isTrustAllSSLCertificates()
      Gets whether to trust all SSL certificates (affects only "https" connections).
      Returns:
      true if trusting all SSL certificates
      Since:
      1.3.0
    • setTrustAllSSLCertificates

      public void setTrustAllSSLCertificates(boolean trustAllSSLCertificates)
      Sets whether to trust all SSL certificates. This is typically a bad idea (it makes man-in-the-middle attacks easier). Try to install an SSL certificate locally to ensure a proper certificate exchange instead.
      Parameters:
      trustAllSSLCertificates - true if trusting all SSL certificates
      Since:
      1.3.0
    • isDisableSNI

      public boolean isDisableSNI()
      Gets whether Server Name Indication (SNI) is disabled.
      Returns:
      true if disabled
    • setDisableSNI

      public void setDisableSNI(boolean disableSNI)
      Sets whether Server Name Indication (SNI) is disabled.
      Parameters:
      disableSNI - true if disabled
    • getSSLProtocols

      public List<String> getSSLProtocols()
      Gets the supported SSL/TLS protocols. Default is null, which means it will use those provided/configured by your Java platform.
      Returns:
      SSL/TLS protocols
    • setSSLProtocols

      public void setSSLProtocols(List<String> sslProtocols)
      Sets the supported SSL/TLS protocols, such as SSLv3, TLSv1, TLSv1.1, and TLSv1.2. Note that specifying a protocol not supported by your underlying Java platform will not work.
      Parameters:
      sslProtocols - SSL/TLS protocols supported
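      Because a protocol unknown to the Java platform will not work, it can help to check candidate protocol names against the platform before configuring them. The sketch below uses the standard JDK SSLContext API. The class SslProtocolSketch and its method names are hypothetical helpers, not part of this library.

```java
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;

public class SslProtocolSketch {

    /** Lists the SSL/TLS protocol names the default SSLContext supports. */
    static List<String> platformProtocols() {
        try {
            return Arrays.asList(
                    SSLContext.getDefault().getSupportedSSLParameters().getProtocols());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }

    /** Keeps only the requested protocols that the platform supports. */
    static List<String> keepSupported(List<String> requested) {
        List<String> result = new ArrayList<>(requested);
        result.retainAll(platformProtocols());
        return result;
    }

    public static void main(String[] args) {
        System.out.println("Platform supports: " + platformProtocols());
        // "NoSuchProtocol" is a made-up name used to show the filtering.
        System.out.println("Usable: "
                + keepSupported(List.of("TLSv1.2", "TLSv1.3", "NoSuchProtocol")));
    }
}
```

      Note that "supported" is broader than "enabled by default": a JDK may still list an old protocol as supported while its security configuration disables it.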
    • isDisableIfModifiedSince

      public boolean isDisableIfModifiedSince()
      Gets whether adding the If-Modified-Since HTTP request header is disabled. Servers supporting this header will only return the requested document if it was last modified since the supplied date.
      Returns:
      true if disabled
    • setDisableIfModifiedSince

      public void setDisableIfModifiedSince(boolean disableIfModifiedSince)
      Sets whether adding the If-Modified-Since HTTP request header is disabled. Servers supporting this header will only return the requested document if it was last modified since the supplied date.
      Parameters:
      disableIfModifiedSince - true if disabled
    • isDisableETag

      public boolean isDisableETag()
      Gets whether adding the If-None-Match HTTP request header (ETag support) is disabled. Servers supporting this header will only return the requested document if the ETag value has changed, indicating a more recent version is available.
      Returns:
      true if disabled
    • setDisableETag

      public void setDisableETag(boolean disableETag)
      Sets whether adding the If-None-Match HTTP request header (ETag support) is disabled. Servers supporting this header will only return the requested document if the ETag value has changed, indicating a more recent version is available.
      Parameters:
      disableETag - true if disabled
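      For readers unfamiliar with conditional requests, the following sketch shows what the If-Modified-Since and If-None-Match headers look like on a request. It uses the JDK's java.net.http.HttpRequest (Java 11+) purely for illustration; the fetcher itself builds on Apache HttpClient, and ConditionalRequestSketch with its methods is a hypothetical helper. No request is sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class ConditionalRequestSketch {

    // HTTP date format with a two-digit day, always expressed in GMT.
    private static final DateTimeFormatter HTTP_DATE = DateTimeFormatter
            .ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.ENGLISH)
            .withZone(ZoneOffset.UTC);

    static String httpDate(Instant instant) {
        return HTTP_DATE.format(instant);
    }

    /** Builds a GET request carrying whichever validators are known (null to skip). */
    static HttpRequest conditionalGet(String url, Instant lastModified, String etag) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(url)).GET();
        if (lastModified != null) {
            builder.header("If-Modified-Since", httpDate(lastModified));
        }
        if (etag != null) {
            builder.header("If-None-Match", etag);
        }
        return builder.build();
    }

    public static void main(String[] args) {
        HttpRequest request = conditionalGet(
                "http://example.com/page.html",
                Instant.parse("2024-01-15T10:30:00Z"),
                "\"abc123\"");
        System.out.println(request.headers().map());
    }
}
```

      A server that honors these headers typically answers 304 (Not Modified) with no body when the document is unchanged. Disabling either option means the corresponding header is simply never added.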
    • isDisableHSTS

      public boolean isDisableHSTS()
      Gets whether the forcing of non-secure URLs to secure ones is disabled, according to the URL domain Strict-Transport-Security policy (obtained from HTTP response header).
      Returns:
      true if disabled
    • setDisableHSTS

      public void setDisableHSTS(boolean disableHSTS)
      Sets whether the forcing of non-secure URLs to secure ones is disabled, according to the URL domain Strict-Transport-Security policy (obtained from HTTP response header).
      Parameters:
      disableHSTS - true if disabled
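      The effect of HSTS support can be pictured as a URL rewrite: once a host is known to have a Strict-Transport-Security policy, "http" URLs for that host are requested as "https" instead. The sketch below illustrates only that rewrite step. HstsSketch, toSecure, and the set of known hosts are hypothetical, and the sketch ignores policy expiry, includeSubDomains, and explicit ports.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

public class HstsSketch {

    /**
     * Upgrades an "http" URL to "https" when its host is in the given set
     * of hosts assumed to have announced a Strict-Transport-Security policy.
     * Other URLs are returned unchanged.
     */
    static String toSecure(String url, Set<String> hstsHosts) {
        URI uri = URI.create(url);
        String host = uri.getHost();
        if ("http".equalsIgnoreCase(uri.getScheme())
                && host != null
                && hstsHosts.contains(host.toLowerCase(Locale.ROOT))) {
            // Replace only the scheme; the rest of the URL is kept verbatim.
            return "https" + url.substring(uri.getScheme().length());
        }
        return url;
    }

    public static void main(String[] args) {
        Set<String> hstsHosts = Set.of("example.com");
        System.out.println(toSecure("http://example.com/a?b=1", hstsHosts));
        System.out.println(toSecure("http://other.example.org/", hstsHosts));
    }
}
```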
    • getAuthConfig

      public HttpAuthConfig getAuthConfig()
    • setAuthConfig

      public void setAuthConfig(HttpAuthConfig authConfig)
    • getHttpMethods

      public List<HttpMethod> getHttpMethods()
      Gets the list of HTTP methods to be accepted by this fetcher. Defaults are HttpMethod.GET and HttpMethod.HEAD.
      Returns:
      HTTP methods
    • setHttpMethods

      public void setHttpMethods(List<HttpMethod> httpMethods)
      Sets the list of HTTP methods to be accepted by this fetcher. Defaults are HttpMethod.GET and HttpMethod.HEAD.
      Parameters:
      httpMethods - HTTP methods
    • loadFromXML

      public void loadFromXML(XML xml)
      Specified by:
      loadFromXML in interface IXMLConfigurable
    • saveToXML

      public void saveToXML(XML xml)
      Specified by:
      saveToXML in interface IXMLConfigurable
    • equals

      public boolean equals(Object obj)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • toString

      public String toString()
      Overrides:
      toString in class Object