public class GenericRedirectURLProvider extends Object implements IRedirectURLProvider, IXMLConfigurable
Provides redirect URLs by grabbing them from the HTTP response Location header value. The URL is made absolute and an attempt is made to fix possible character encoding issues.
The RFC 2616 specification mentions that the Location header should contain a URI as defined by the RFC 1630 specification. The latter requires that a URI be 7-bit ASCII with any special characters URL encoded. Some redirect URLs do not conform to that, so we apply the following logic in an attempt to fix them:
Content-Type
header?
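The general idea of making a Location value absolute and repairing non-ASCII characters can be sketched as follows. This is a minimal illustration and not the library's actual implementation: the class `RedirectUrlSketch` and its methods are hypothetical, and the sketch assumes that the HTTP client decoded the raw header bytes as ISO-8859-1 and that the repaired characters should be percent-encoded as UTF-8.

```java
import java.net.URI;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class RedirectUrlSketch {

    // True when every character fits in 7-bit ASCII.
    static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 128);
    }

    // Percent-encodes each non-ASCII code point using the given charset;
    // ASCII characters are copied unchanged.
    static String encodeNonAscii(String s, Charset charset) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp < 128) {
                sb.appendCodePoint(cp);
            } else {
                byte[] bytes = new String(Character.toChars(cp)).getBytes(charset);
                for (byte b : bytes) {
                    sb.append(String.format("%%%02X", b & 0xFF));
                }
            }
        });
        return sb.toString();
    }

    // Resolves a Location header value against the URL that was requested.
    static String toAbsolute(String requestUrl, String location, String fallbackCharset) {
        String fixed = location;
        if (!isAscii(location)) {
            // Assumption: the header bytes were decoded as ISO-8859-1,
            // so we recover them and decode again with the fallback charset.
            byte[] raw = location.getBytes(StandardCharsets.ISO_8859_1);
            String redecoded = new String(raw, Charset.forName(fallbackCharset));
            fixed = encodeNonAscii(redecoded, StandardCharsets.UTF_8);
        }
        return URI.create(requestUrl).resolve(fixed).toString();
    }

    public static void main(String[] args) {
        System.out.println(toAbsolute("http://example.com/a/b.html", "/c/d.html", "UTF-8"));
        System.out.println(toAbsolute("http://example.com/a/b.html", "/caf\u00C3\u00A9", "UTF-8"));
    }
}
```

The sketch does not handle other illegal URI characters such as spaces; `URI.create` rejects those with an `IllegalArgumentException`.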
<redirectURLProvider class="com.norconex.collector.http.redirect.impl.GenericRedirectURLProvider" fallbackCharset="(character encoding)" />
The following sets the default character encoding to be "ISO-8859-1" when it could not be detected.
<redirectURLProvider fallbackCharset="ISO-8859-1" />
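To illustrate how the `fallbackCharset` attribute in the configuration above could be read from a `Reader`, here is a minimal sketch using only the JDK XML parser. It is not the library's `loadFromXML` implementation: the class `FallbackCharsetConfigSketch`, the method `readFallbackCharset`, and the `defaultCharset` parameter are hypothetical names introduced for this example.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
class FallbackCharsetConfigSketch {

    // Returns the fallbackCharset attribute of the root element, or the
    // supplied default when the attribute is absent or blank.
    static String readFallbackCharset(Reader in, String defaultCharset) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(in))
                    .getDocumentElement();
            // getAttribute returns an empty string for a missing attribute.
            String value = root.getAttribute("fallbackCharset").trim();
            return value.isEmpty() ? defaultCharset : value;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read configuration", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<redirectURLProvider fallbackCharset=\"ISO-8859-1\" />";
        System.out.println(readFallbackCharset(new StringReader(xml), "UTF-8"));
    }
}
```

The `"UTF-8"` default passed in `main` is an arbitrary choice for the example; the value of DEFAULT_FALLBACK_CHARSET is not stated on this page.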
Modifier and Type | Field and Description |
---|---|
static String | DEFAULT_FALLBACK_CHARSET |
Constructor and Description |
---|
GenericRedirectURLProvider() |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object obj) |
String | getFallbackCharset() |
int | hashCode() |
void | loadFromXML(Reader in) |
String | provideRedirectURL(org.apache.http.HttpRequest request, org.apache.http.HttpResponse response, org.apache.http.protocol.HttpContext context) Provides the redirect URL that the crawler must follow. |
void | saveToXML(Writer out) |
void | setFallbackCharset(String fallbackCharset) |
String | toString() |
public static final String DEFAULT_FALLBACK_CHARSET
public String getFallbackCharset()
public void setFallbackCharset(String fallbackCharset)
public String provideRedirectURL(org.apache.http.HttpRequest request, org.apache.http.HttpResponse response, org.apache.http.protocol.HttpContext context)
IRedirectURLProvider
null
. Returning null effectively prevents a redirect from happening, but it is an inefficient way to disable redirects. The recommended approach to disable redirects is to set the maximum to zero with GenericHttpClientFactory.setMaxRedirects(int).
provideRedirectURL
in interface IRedirectURLProvider
request - the HTTP request that led to the redirect
response - original URL HTTP response
context - execution state of an HTTP process
public void loadFromXML(Reader in)
loadFromXML
in interface IXMLConfigurable
public void saveToXML(Writer out) throws IOException
saveToXML
in interface IXMLConfigurable
IOException
Copyright © 2009–2021 Norconex Inc. All rights reserved.