public class GenericHttpClientFactory extends Object implements IHttpClientFactory, IXMLConfigurable
Default implementation of IHttpClientFactory.
As of 2.4.0, proxyPassword and authPassword can take a password that has been encrypted using EncryptionUtil. For the crawler to decrypt the password, you need to specify the encryption key that was used to encrypt it. The key can be stored in one of a few supported locations, and a combination of [auth|proxy]PasswordKey and [auth|proxy]PasswordKeySource must be specified to locate it. The supported sources are:
[...]PasswordKeySource | [...]PasswordKey |
---|---|
key | The actual encryption key. |
file | Path to a file containing the encryption key. |
environment | Name of an environment variable containing the key. |
property | Name of a JVM system property containing the key. |
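The four key sources listed above can be pictured as a simple lookup. The following is a minimal sketch using only the JDK; `KeySourceSketch` and `resolveKey` are hypothetical names and are not part of the Norconex API, and trimming the file content is an assumption made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper, for illustration only (not part of the Norconex API). */
final class KeySourceSketch {

    /**
     * Resolves an encryption key from a [...]PasswordKey value and a
     * [...]PasswordKeySource name ("key", "file", "environment" or "property").
     */
    static String resolveKey(String value, String source) {
        switch (source) {
            case "key":
                // The value is the key itself.
                return value;
            case "file":
                // The value is a path to a file holding the key.
                // Trimming surrounding whitespace is an assumption.
                try {
                    byte[] bytes = Files.readAllBytes(Paths.get(value));
                    return new String(bytes, StandardCharsets.UTF_8).trim();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            case "environment":
                // The value is the name of an environment variable.
                return System.getenv(value);
            case "property":
                // The value is the name of a JVM system property.
                return System.getProperty(value);
            default:
                throw new IllegalArgumentException(
                        "Unsupported key source: " + source);
        }
    }

    public static void main(String[] args) {
        System.setProperty("crawler.key", "my-secret-key");
        System.out.println(resolveKey("crawler.key", "property"));
        System.out.println(resolveKey("my-secret-key", "key"));
    }
}
```

Keeping the key in a file, environment variable, or system property means the key itself does not have to appear in the XML configuration next to the encrypted password.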
As of 2.7.0, XML configuration entries expecting millisecond durations can be provided in human-readable format (English only), as per DurationParser (e.g., "5 minutes and 30 seconds" or "5m30s").
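As a point of comparison, the example duration above corresponds to a single millisecond value. The sketch below works out that value with the JDK's `java.time.Duration`; it does not use DurationParser, whose API is not shown on this page, and `DurationSketch` is a hypothetical name.

```java
import java.time.Duration;

/** Hypothetical helper, for illustration only. */
final class DurationSketch {

    /** Converts a minutes/seconds pair to milliseconds. */
    static long toMillis(long minutes, long seconds) {
        return Duration.ofMinutes(minutes).plusSeconds(seconds).toMillis();
    }

    public static void main(String[] args) {
        // "5 minutes and 30 seconds" expressed as milliseconds
        System.out.println(toMillis(5, 30)); // prints 330000
    }
}
```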
<httpClientFactory class="com.norconex.collector.http.client.impl.GenericHttpClientFactory">
  <cookiesDisabled>[false|true]</cookiesDisabled>
  <connectionTimeout>(milliseconds)</connectionTimeout>
  <socketTimeout>(milliseconds)</socketTimeout>
  <connectionRequestTimeout>(milliseconds)</connectionRequestTimeout>
  <connectionCharset>...</connectionCharset>
  <expectContinueEnabled>[false|true]</expectContinueEnabled>
  <maxRedirects>...</maxRedirects>
  <localAddress>...</localAddress>
  <maxConnections>...</maxConnections>
  <maxConnectionsPerRoute>...</maxConnectionsPerRoute>
  <maxConnectionIdleTime>(milliseconds)</maxConnectionIdleTime>
  <maxConnectionInactiveTime>(milliseconds)</maxConnectionInactiveTime>
  <!-- Be warned: trusting all certificates is usually a bad idea. -->
  <trustAllSSLCertificates>[false|true]</trustAllSSLCertificates>
  <!-- Since 2.6.2, you can specify SSL/TLS protocols to use -->
  <sslProtocols>(comma-separated list)</sslProtocols>
  <proxyHost>...</proxyHost>
  <proxyPort>...</proxyPort>
  <proxyRealm>...</proxyRealm>
  <proxyScheme>...</proxyScheme>
  <proxyUsername>...</proxyUsername>
  <proxyPassword>...</proxyPassword>
  <!-- Use the following if password is encrypted. -->
  <proxyPasswordKey>(the encryption key or a reference to it)</proxyPasswordKey>
  <proxyPasswordKeySource>[key|file|environment|property]</proxyPasswordKeySource>
  <!-- HTTP request headers passed on every HTTP request -->
  <headers>
    <header name="(header name)">(header value)</header>
    <!-- You can repeat this header tag as needed. -->
  </headers>
  <authMethod>[form|basic|digest|ntlm|spnego|kerberos]</authMethod>
  <!-- These apply to any authentication mechanism -->
  <authUsername>...</authUsername>
  <authPassword>...</authPassword>
  <!-- Use the following if password is encrypted. -->
  <authPasswordKey>(the encryption key or a reference to it)</authPasswordKey>
  <authPasswordKeySource>[key|file|environment|property]</authPasswordKeySource>
  <!-- These apply to FORM authentication -->
  <authUsernameField>...</authUsernameField>
  <authPasswordField>...</authPasswordField>
  <authURL>...</authURL>
  <authFormCharset>...</authFormCharset>
  <!-- Extra form parameters required to authenticate (since 2.8.0) -->
  <authFormParams>
    <param name="(param name)">(param value)</param>
    <!-- You can repeat this param tag as needed. -->
  </authFormParams>
  <!-- These apply to both BASIC and DIGEST authentication -->
  <authHostname>...</authHostname>
  <authPort>...</authPort>
  <authRealm>...</authRealm>
  <!-- This applies to BASIC authentication -->
  <authPreemptive>[false|true]</authPreemptive>
  <!-- These apply to NTLM authentication -->
  <authHostname>...</authHostname>
  <authPort>...</authPort>
  <authWorkstation>...</authWorkstation>
  <authDomain>...</authDomain>
</httpClientFactory>
The following will authenticate the crawler to a website before crawling. The website uses an HTML form with username and password fields called "loginUser" and "loginPwd".
<httpClientFactory class="com.norconex.collector.http.client.impl.GenericHttpClientFactory">
  <authUsername>joeUser</authUsername>
  <authPassword>joePassword</authPassword>
  <authUsernameField>loginUser</authUsernameField>
  <authPasswordField>loginPwd</authPasswordField>
  <authURL>http://www.example.com/login</authURL>
</httpClientFactory>
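To make the form settings above more concrete, here is a rough sketch of what a form login submission body generally looks like: the configured field names ("loginUser", "loginPwd") are paired with the credentials and URL-encoded using a form character set. This uses only the JDK and is an assumption about the general shape of such a request, not a copy of what GenericHttpClientFactory does internally; `FormLoginSketch`, `encodeForm`, and the password value are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, for illustration only (not part of the Norconex API). */
final class FormLoginSketch {

    /**
     * Builds an application/x-www-form-urlencoded body from form fields,
     * encoding names and values with the given character set.
     */
    static String encodeForm(Map<String, String> fields, String charset) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            try {
                body.append(URLEncoder.encode(field.getKey(), charset))
                    .append('=')
                    .append(URLEncoder.encode(field.getValue(), charset));
            } catch (UnsupportedEncodingException e) {
                throw new IllegalArgumentException(
                        "Unsupported charset: " + charset, e);
            }
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Field names come from authUsernameField / authPasswordField;
        // values come from authUsername / authPassword.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("loginUser", "joeUser");
        fields.put("loginPwd", "p@ss word"); // illustrative password value
        System.out.println(encodeForm(fields, "UTF-8"));
    }
}
```

Extra entries added to the map would play the role of the additional form parameters that `authFormParams` allows since 2.8.0.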
Modifier and Type | Field and Description |
---|---|
static String | AUTH_METHOD_BASIC: BASIC authentication method. |
static String | AUTH_METHOD_DIGEST: DIGEST authentication method. |
static String | AUTH_METHOD_FORM: Form-based authentication method. |
static String | AUTH_METHOD_KERBEROS: Experimental: Kerberos authentication method. |
static String | AUTH_METHOD_NTLM: NTLM authentication method. |
static String | AUTH_METHOD_SPNEGO: Experimental: SPNEGO authentication method. |
static int | DEFAULT_MAX_CONNECTIONS |
static int | DEFAULT_MAX_CONNECTIONS_PER_ROUTE |
static int | DEFAULT_MAX_IDLE_TIME |
static int | DEFAULT_MAX_REDIRECT |
static int | DEFAULT_TIMEOUT |
Constructor and Description |
---|
GenericHttpClientFactory() |
Modifier and Type | Method and Description |
---|---|
protected void | authenticateUsingForm(org.apache.http.client.HttpClient httpClient) |
protected void | buildCustomHttpClient(org.apache.http.impl.client.HttpClientBuilder builder): For implementors to subclass. |
protected org.apache.http.config.ConnectionConfig | createConnectionConfig() |
protected org.apache.http.client.CredentialsProvider | createCredentialsProvider() |
protected org.apache.http.client.CookieStore | createDefaultCookieStore(): Creates the default cookie store to be added to each request context. |
protected List<org.apache.http.Header> | createDefaultRequestHeaders(): Creates a list of HTTP headers previously set by setRequestHeader(String, String). |
org.apache.http.client.HttpClient | createHTTPClient(String userAgent): Initializes the HTTP Client used for crawling. |
protected org.apache.http.HttpHost | createProxy() |
protected org.apache.http.client.RedirectStrategy | createRedirectStrategy() |
protected org.apache.http.client.config.RequestConfig | createRequestConfig() |
protected org.apache.http.conn.SchemePortResolver | createSchemePortResolver() |
protected SSLContext | createSSLContext() |
protected org.apache.http.conn.socket.LayeredConnectionSocketFactory | createSSLSocketFactory(SSLContext sslContext) |
boolean | equals(Object obj) |
String | getAuthDomain(): Gets the NTLM authentication domain. |
String | getAuthFormCharset(): Gets the authentication form character set. |
String | getAuthFormParam(String name): Gets an authentication form parameter (equivalent to "input" or other fields in HTML forms). |
String[] | getAuthFormParamNames(): Gets all authentication form parameter names. |
String | getAuthHostname(): Gets the host name for the current authentication scope. |
String | getAuthMethod(): Gets the authentication method. |
String | getAuthPassword(): Gets the authentication password. |
String | getAuthPasswordField(): Gets the name of the HTML field where the password is set. |
EncryptionKey | getAuthPasswordKey(): Gets the authentication password encryption key. |
int | getAuthPort(): Gets the port for the current authentication scope. |
String | getAuthRealm(): Gets the realm name for the current authentication scope. |
String | getAuthURL(): Gets the URL for "form" authentication. |
String | getAuthUsername(): Gets the username. |
String | getAuthUsernameField(): Gets the name of the HTML field where the username is set. |
String | getAuthWorkstation(): Gets the NTLM authentication workstation name. |
String | getConnectionCharset(): Gets the connection character set. |
int | getConnectionRequestTimeout(): Gets the timeout when requesting a connection, in milliseconds. |
int | getConnectionTimeout(): Gets the connection timeout until a connection is established, in milliseconds. |
String | getCookieSpec() |
String | getLocalAddress(): Gets the local address (IP or hostname). |
int | getMaxConnectionIdleTime(): Gets the period of time in milliseconds after which to evict idle connections from the connection pool. |
int | getMaxConnectionInactiveTime(): Gets the period of time in milliseconds a connection must be inactive to be checked in case it became stalled. |
int | getMaxConnections(): Gets the maximum number of connections that can be created. |
int | getMaxConnectionsPerRoute(): Gets the maximum number of connections that can be used per route. |
int | getMaxRedirects(): Gets the maximum number of redirects to be followed. |
String | getProxyHost(): Gets the proxy host. |
String | getProxyPassword(): Gets the proxy password. |
EncryptionKey | getProxyPasswordKey(): Gets the proxy password encryption key. |
int | getProxyPort(): Gets the proxy port. |
String | getProxyRealm(): Gets the proxy realm. |
String | getProxyScheme(): Gets the proxy scheme. |
String | getProxyUsername(): Gets the proxy username. |
String | getRequestHeader(String name): Gets the HTTP request header value matching the given name, previously set with setRequestHeader(String, String). |
String[] | getRequestHeaderNames(): Gets all HTTP request header names for headers previously set with setRequestHeader(String, String). |
String[] | getRequestHeaders(): Deprecated. Since 2.8.0, use getRequestHeaderNames(). |
int | getSocketTimeout(): Gets the maximum period of inactivity between two consecutive data packets, in milliseconds. |
String[] | getSSLProtocols(): Gets the supported SSL/TLS protocols. |
int | hashCode() |
boolean | isAuthPreemptive(): Gets whether to perform preemptive authentication (valid for "basic" authentication method). |
boolean | isCookiesDisabled(): Whether cookie support is disabled. |
boolean | isExpectContinueEnabled(): Whether 'Expect: 100-continue' handshake is enabled. |
boolean | isStaleConnectionCheckDisabled(): Deprecated. Since 2.1.0. As of 2.2.0, use getMaxConnectionInactiveTime() instead. |
boolean | isTrustAllSSLCertificates(): Whether to trust all SSL certificates (affects only "https" connections). |
void | loadFromXML(Reader in) |
String | removeAuthFormParameter(String name): Removes the authentication form parameter matching the given name. |
String | removeRequestHeader(String name): Removes the request header matching the given name. |
void | saveToXML(Writer out) |
void | setAuthDomain(String authDomain): Sets the NTLM authentication domain. |
void | setAuthFormCharset(String authFormCharset): Sets the authentication form character set for the form field values. |
void | setAuthFormParam(String name, String value): Sets an authentication form parameter (equivalent to "input" or other fields in HTML forms). |
void | setAuthHostname(String authHostname): Sets the host name for the current authentication scope. |
void | setAuthMethod(String authMethod): Sets the authentication method. |
void | setAuthPassword(String authPassword): Sets the authentication password. |
void | setAuthPasswordField(String authPasswordField): Sets the name of the HTML field where the password is set. |
void | setAuthPasswordKey(EncryptionKey authPasswordKey): Sets the authentication password encryption key. |
void | setAuthPort(int authPort): Sets the port for the current authentication scope. |
void | setAuthPreemptive(boolean authPreemptive): Sets whether to perform preemptive authentication (valid for "basic" authentication method). |
void | setAuthRealm(String authRealm): Sets the realm name for the current authentication scope. |
void | setAuthURL(String authURL): Sets the URL for "form" authentication. |
void | setAuthUsername(String authUsername): Sets the username. |
void | setAuthUsernameField(String authUsernameField): Sets the name of the HTML field where the username is set. |
void | setAuthWorkstation(String authWorkstation): Sets the NTLM authentication workstation name. |
void | setConnectionCharset(String connectionCharset): Sets the connection character set. |
void | setConnectionRequestTimeout(int connectionRequestTimeout): Sets the timeout when requesting a connection, in milliseconds. |
void | setConnectionTimeout(int connectionTimeout): Sets the connection timeout until a connection is established, in milliseconds. |
void | setCookiesDisabled(boolean cookiesDisabled): Sets whether cookie support is disabled. |
void | setCookieSpec(String cookieSpec) |
void | setExpectContinueEnabled(boolean expectContinueEnabled): Sets whether 'Expect: 100-continue' handshake is enabled. |
void | setLocalAddress(String localAddress): Sets the local address, which may be useful when working with multiple network interfaces. |
void | setMaxConnectionIdleTime(int maxConnectionIdleTime): Sets the period of time in milliseconds after which to evict idle connections from the connection pool. |
void | setMaxConnectionInactiveTime(int maxConnectionInactiveTime): Sets the period of time in milliseconds a connection must be inactive to be checked in case it became stalled. |
void | setMaxConnections(int maxConnections): Sets the maximum number of connections that can be created. |
void | setMaxConnectionsPerRoute(int maxConnectionsPerRoute): Sets the maximum number of connections that can be used per route. |
void | setMaxRedirects(int maxRedirects): Sets the maximum number of redirects to be followed. |
void | setProxyHost(String proxyHost): Sets the proxy host. |
void | setProxyPassword(String proxyPassword): Sets the proxy password. |
void | setProxyPasswordKey(EncryptionKey proxyPasswordKey): Sets the proxy password encryption key. |
void | setProxyPort(int proxyPort): Sets the proxy port. |
void | setProxyRealm(String proxyRealm): Sets the proxy realm. |
void | setProxyScheme(String proxyScheme): Sets the proxy scheme. |
void | setProxyUsername(String proxyUsername): Sets the proxy username. |
void | setRequestHeader(String name, String value): Sets a default HTTP request header every HTTP connection should have. |
void | setSocketTimeout(int socketTimeout): Sets the maximum period of inactivity between two consecutive data packets, in milliseconds. |
void | setSSLProtocols(String... sslProtocols): Sets the supported SSL/TLS protocols, such as SSLv3, TLSv1, TLSv1.1, and TLSv1.2. |
void | setStaleConnectionCheckDisabled(boolean staleConnectionCheckDisabled): Deprecated. Since 2.1.0. As of 2.2.0, use setMaxConnectionInactiveTime(int) instead. |
void | setTrustAllSSLCertificates(boolean trustAllSSLCertificates): Sets whether to trust all SSL certificates. |
String | toString() |
public static final String AUTH_METHOD_FORM
public static final String AUTH_METHOD_BASIC
public static final String AUTH_METHOD_DIGEST
public static final String AUTH_METHOD_NTLM
public static final String AUTH_METHOD_SPNEGO
public static final String AUTH_METHOD_KERBEROS
public static final int DEFAULT_TIMEOUT
public static final int DEFAULT_MAX_REDIRECT
public static final int DEFAULT_MAX_CONNECTIONS
public static final int DEFAULT_MAX_CONNECTIONS_PER_ROUTE
public static final int DEFAULT_MAX_IDLE_TIME
public org.apache.http.client.HttpClient createHTTPClient(String userAgent)
Description copied from interface: IHttpClientFactory
Initializes the HTTP Client used for crawling.
Specified by: createHTTPClient in interface IHttpClientFactory
Parameters: userAgent - the HTTP request "User-Agent" header value
protected void buildCustomHttpClient(org.apache.http.impl.client.HttpClientBuilder builder)
Parameters: builder - http client builder
protected void authenticateUsingForm(org.apache.http.client.HttpClient httpClient)
protected org.apache.http.client.CookieStore createDefaultCookieStore()
protected List<org.apache.http.Header> createDefaultRequestHeaders()
Creates a list of HTTP headers previously set by setRequestHeader(String, String). Since 2.8.0, this method will also add a "Basic" authentication header if setAuthPreemptive(boolean) is true and credentials were supplied.
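For readers unfamiliar with preemptive "Basic" authentication, the header in question carries the Base64-encoded `username:password` pair, as defined by RFC 7617. The sketch below builds such a header value with the JDK only. It illustrates the standard header format rather than this class's internal code, and `BasicAuthHeaderSketch` and `basicAuthValue` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper, for illustration only (not part of the Norconex API). */
final class BasicAuthHeaderSketch {

    /** Builds the value of an "Authorization" header for Basic authentication. */
    static String basicAuthValue(String username, String password) {
        String credentials = username + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Credentials taken from the RFC 7617 example.
        System.out.println("Authorization: "
                + basicAuthValue("Aladdin", "open sesame"));
        // prints "Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="
    }
}
```

Because the credentials are only encoded, not encrypted, sending them preemptively is generally best reserved for HTTPS connections.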
protected org.apache.http.client.RedirectStrategy createRedirectStrategy()
protected org.apache.http.conn.SchemePortResolver createSchemePortResolver()
protected org.apache.http.client.config.RequestConfig createRequestConfig()
protected org.apache.http.HttpHost createProxy()
protected org.apache.http.client.CredentialsProvider createCredentialsProvider()
protected org.apache.http.config.ConnectionConfig createConnectionConfig()
protected org.apache.http.conn.socket.LayeredConnectionSocketFactory createSSLSocketFactory(SSLContext sslContext)
protected SSLContext createSSLContext()
public void loadFromXML(Reader in)
Specified by: loadFromXML in interface IXMLConfigurable
public void saveToXML(Writer out) throws IOException
Specified by: saveToXML in interface IXMLConfigurable
Throws: IOException
public void setRequestHeader(String name, String value)
Parameters:
name - HTTP request header name
value - HTTP request header value
public String getRequestHeader(String name)
Gets the HTTP request header value matching the given name, previously set with setRequestHeader(String, String).
Parameters: name - HTTP request header name
Returns: the header value, or null if no match is found
@Deprecated public String[] getRequestHeaders()
Deprecated. Since 2.8.0, use getRequestHeaderNames()
Gets all HTTP request header names for headers previously set with setRequestHeader(String, String). If no request headers are set, it returns an empty array.
public String[] getRequestHeaderNames()
Gets all HTTP request header names for headers previously set with setRequestHeader(String, String). If no request headers are set, it returns an empty array.
public String removeRequestHeader(String name)
Parameters: name - name of HTTP request header to remove
Returns: the removed header value, or null if there was no request header for the name
public String getAuthMethod()
public void setAuthMethod(String authMethod)
Parameters: authMethod - authentication method
public String getAuthUsernameField()
public void setAuthUsernameField(String authUsernameField)
Parameters: authUsernameField - name of the HTML field
public String getAuthUsername()
public void setAuthUsername(String authUsername)
Parameters: authUsername - username
public String getAuthPasswordField()
public void setAuthPasswordField(String authPasswordField)
Parameters: authPasswordField - name of the HTML field
public String getAuthPassword()
public void setAuthPassword(String authPassword)
Parameters: authPassword - password
public EncryptionKey getAuthPasswordKey()
Returns: the password encryption key, or null if the password is not encrypted
See Also: EncryptionUtil
public void setAuthPasswordKey(EncryptionKey authPasswordKey)
Parameters: authPasswordKey - password key
See Also: EncryptionUtil
public boolean isCookiesDisabled()
Returns: true if disabled
public void setCookiesDisabled(boolean cookiesDisabled)
Parameters: cookiesDisabled - true if disabled
public String getCookieSpec()
Returns: the cookieSpec to use as defined in CookieSpecs
public void setCookieSpec(String cookieSpec)
Parameters: cookieSpec - the cookieSpec to use as defined in CookieSpecs
public String getAuthURL()
public void setAuthURL(String authURL)
Parameters: authURL - "form" authentication URL
public String getAuthHostname()
Gets the host name for the current authentication scope. A null value means any host name for the scope. Used for BASIC and DIGEST authentication.
public void setAuthHostname(String authHostname)
Parameters: authHostname - hostname for the scope
public int getAuthPort()
public void setAuthPort(int authPort)
Parameters: authPort - port for the scope
public String getAuthRealm()
Gets the realm name for the current authentication scope. A null value indicates "any realm" for the scope. Used for BASIC and DIGEST authentication.
public void setAuthRealm(String authRealm)
Parameters: authRealm - realm name for the scope
public String getAuthFormCharset()
public void setAuthFormCharset(String authFormCharset)
Parameters: authFormCharset - authentication form character set
public boolean isTrustAllSSLCertificates()
Returns: true if trusting all SSL certificates
public void setTrustAllSSLCertificates(boolean trustAllSSLCertificates)
Parameters: trustAllSSLCertificates - true if trusting all SSL certificates
public String getProxyHost()
public void setProxyHost(String proxyHost)
Parameters: proxyHost - proxy host
public int getProxyPort()
public void setProxyPort(int proxyPort)
Parameters: proxyPort - proxy port
public String getProxyScheme()
public void setProxyScheme(String proxyScheme)
Parameters: proxyScheme - proxy scheme
public String getProxyUsername()
public void setProxyUsername(String proxyUsername)
Parameters: proxyUsername - proxy username
public String getProxyPassword()
public void setProxyPassword(String proxyPassword)
Parameters: proxyPassword - proxy password
public EncryptionKey getProxyPasswordKey()
Returns: the proxy password encryption key, or null if the password is not encrypted
See Also: EncryptionUtil
public void setProxyPasswordKey(EncryptionKey proxyPasswordKey)
Parameters: proxyPasswordKey - password key
See Also: EncryptionUtil
public String getProxyRealm()
public void setProxyRealm(String proxyRealm)
Parameters: proxyRealm - proxy realm
public int getConnectionTimeout()
public void setConnectionTimeout(int connectionTimeout)
Sets the connection timeout until a connection is established, in milliseconds. Default is DEFAULT_TIMEOUT.
Parameters: connectionTimeout - connection timeout
public int getSocketTimeout()
public void setSocketTimeout(int socketTimeout)
Sets the maximum period of inactivity between two consecutive data packets, in milliseconds. Default is DEFAULT_TIMEOUT.
Parameters: socketTimeout - socket timeout
public int getConnectionRequestTimeout()
public void setConnectionRequestTimeout(int connectionRequestTimeout)
Sets the timeout when requesting a connection, in milliseconds. Default is DEFAULT_TIMEOUT.
Parameters: connectionRequestTimeout - connection request timeout
public String getConnectionCharset()
public void setConnectionCharset(String connectionCharset)
Parameters: connectionCharset - connection character set
public boolean isExpectContinueEnabled()
Returns: true if enabled
public void setExpectContinueEnabled(boolean expectContinueEnabled)
Parameters: expectContinueEnabled - true if enabled
See Also: RequestConfig.isExpectContinueEnabled()
public int getMaxRedirects()
public void setMaxRedirects(int maxRedirects)
Sets the maximum number of redirects to be followed. Default is DEFAULT_MAX_REDIRECT.
Parameters: maxRedirects - maximum number of redirects to be followed
public String getLocalAddress()
public void setLocalAddress(String localAddress)
Parameters: localAddress - local address
@Deprecated public boolean isStaleConnectionCheckDisabled()
Deprecated. As of 2.2.0, use getMaxConnectionInactiveTime() instead.
Returns: true if stale connection check is disabled
@Deprecated public void setStaleConnectionCheckDisabled(boolean staleConnectionCheckDisabled)
Deprecated. As of 2.2.0, use setMaxConnectionInactiveTime(int) instead.
Parameters: staleConnectionCheckDisabled - true if stale connection check is disabled
public String getAuthWorkstation()
public void setAuthWorkstation(String authWorkstation)
Parameters: authWorkstation - workstation name
public String getAuthDomain()
public void setAuthDomain(String authDomain)
Parameters: authDomain - authentication domain
public int getMaxConnections()
public void setMaxConnections(int maxConnections)
Sets the maximum number of connections that can be created. Default is DEFAULT_MAX_CONNECTIONS.
Parameters: maxConnections - maximum number of connections
public int getMaxConnectionsPerRoute()
public void setMaxConnectionsPerRoute(int maxConnectionsPerRoute)
Sets the maximum number of connections that can be used per route. Default is DEFAULT_MAX_CONNECTIONS_PER_ROUTE.
Parameters: maxConnectionsPerRoute - maximum number of connections per route
public int getMaxConnectionIdleTime()
public void setMaxConnectionIdleTime(int maxConnectionIdleTime)
Sets the period of time in milliseconds after which to evict idle connections from the connection pool. Default is DEFAULT_MAX_IDLE_TIME.
Parameters: maxConnectionIdleTime - amount of time after which to evict idle connections
public int getMaxConnectionInactiveTime()
public void setMaxConnectionInactiveTime(int maxConnectionInactiveTime)
Parameters: maxConnectionInactiveTime - period of time in milliseconds
public String[] getSSLProtocols()
Gets the supported SSL/TLS protocols. Default is null, which means it will use those provided/configured by your Java platform.
public void setSSLProtocols(String... sslProtocols)
Parameters: sslProtocols - SSL/TLS protocols supported
public void setAuthFormParam(String name, String value)
Parameters:
name - form parameter name
value - form parameter value
public String getAuthFormParam(String name)
Parameters: name - form parameter name
Returns: the parameter value, or null if no match is found
public String[] getAuthFormParamNames()
public String removeAuthFormParameter(String name)
Parameters: name - name of form parameter to remove
Returns: the removed parameter value, or null if there was no form parameter for the name
public boolean isAuthPreemptive()
Returns: true to perform preemptive authentication
public void setAuthPreemptive(boolean authPreemptive)
Parameters: authPreemptive - true to perform preemptive authentication
Copyright © 2009–2021 Norconex Inc. All rights reserved.