public class GenericMetadataFetcher extends Object implements IHttpMetadataFetcher, IXMLConfigurable
Basic implementation of IHttpMetadataFetcher.
<metadataFetcher
class="com.norconex.collector.http.fetch.impl.GenericMetadataFetcher"
skipOnBadStatus="[false|true]" >
<validStatusCodes>(defaults to 200)</validStatusCodes>
<notFoundStatusCodes>(defaults to 404)</notFoundStatusCodes>
<headersPrefix>(string to prefix headers)</headersPrefix>
</metadataFetcher>
The "validStatusCodes" and "notFoundStatusCodes" elements each expect a comma-separated list of HTTP response codes. If a code appears in both elements, the valid list takes precedence.
The "notFoundStatusCodes" element was added in 2.6.0.
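The precedence rule can be illustrated with a small standalone sketch. It does not use the Norconex classes and is not the library's implementation: the class `StatusCodeSketch`, its methods, and the `Outcome` enum are hypothetical names introduced here only to show how a comma-separated list might be parsed and how checking the valid list first makes it win over the "not found" list.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of the Norconex API.
public class StatusCodeSketch {

    public enum Outcome { VALID, NOT_FOUND, BAD_STATUS }

    // Parses a comma-separated list such as "200, 304" into an int array.
    public static int[] parseCodes(String csv) {
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    private static boolean contains(int[] codes, int code) {
        return Arrays.stream(codes).anyMatch(c -> c == code);
    }

    // The valid list is checked first, so it wins when a code is in both lists.
    public static Outcome classify(int status, int[] valid, int[] notFound) {
        if (contains(valid, status)) {
            return Outcome.VALID;
        }
        if (contains(notFound, status)) {
            return Outcome.NOT_FOUND;
        }
        return Outcome.BAD_STATUS;
    }

    public static void main(String[] args) {
        int[] valid = parseCodes("200, 404");     // 404 deliberately in both lists
        int[] notFound = parseCodes("404, 410");
        System.out.println(classify(404, valid, notFound)); // VALID
        System.out.println(classify(410, valid, notFound)); // NOT_FOUND
        System.out.println(classify(500, valid, notFound)); // BAD_STATUS
    }
}
```

In this sketch, any code that appears in neither list is treated as a bad status. How the real fetcher reacts to such a code is governed by its own settings (for example "skipOnBadStatus") and is not modeled here.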
The following configures a crawler to use this fetcher with the default settings.
<metadataFetcher
class="com.norconex.collector.http.fetch.impl.GenericMetadataFetcher" />
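The "headersPrefix" element in the configuration template takes a string to prefix headers with. The standalone sketch below shows one plausible reading of that option, assuming the prefix is prepended to each header name as the header is stored as metadata. The class `HeaderPrefixSketch` and the method `storeHeaders` are hypothetical and do not come from the Norconex API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of the Norconex API.
public class HeaderPrefixSketch {

    // Copies headers into a new metadata map, prepending the prefix
    // (when one is set) to every header name. Values are left untouched.
    public static Map<String, String> storeHeaders(
            Map<String, String> headers, String prefix) {
        Map<String, String> metadata = new LinkedHashMap<>();
        boolean hasPrefix = prefix != null && !prefix.isEmpty();
        for (Map.Entry<String, String> header : headers.entrySet()) {
            String name = hasPrefix ? prefix + header.getKey() : header.getKey();
            metadata.put(name, header.getValue());
        }
        return metadata;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Content-Type", "text/html");
        headers.put("Content-Length", "1024");
        // "http." is an arbitrary example prefix chosen for this sketch.
        System.out.println(storeHeaders(headers, "http."));
        System.out.println(storeHeaders(headers, null));
    }
}
```

A prefix of this kind keeps fetched HTTP headers from colliding with metadata fields that other components set under the same names.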
| Constructor and Description |
|---|
| GenericMetadataFetcher() |
| GenericMetadataFetcher(int[] validStatusCodes) |
| Modifier and Type | Method and Description |
|---|---|
| protected org.apache.http.client.methods.HttpRequestBase | createUriRequest(String url)<br>Creates the HTTP request to be executed. |
| boolean | equals(Object other) |
| HttpFetchResponse | fetchHTTPHeaders(org.apache.http.client.HttpClient httpClient, String url, Properties metadata)<br>Fetches the HTTP headers for a URL and stores them in the provided Properties. |
| String | getHeadersPrefix() |
| int[] | getNotFoundStatusCodes()<br>Gets HTTP status codes to be considered as "Not found" state. |
| int[] | getValidStatusCodes() |
| int | hashCode() |
| void | loadFromXML(Reader in) |
| void | saveToXML(Writer out) |
| void | setHeadersPrefix(String headersPrefix) |
| void | setNotFoundStatusCodes(int... notFoundStatusCodes)<br>Sets HTTP status codes to be considered as "Not found" state. |
| void | setValidStatusCodes(int... validStatusCodes) |
| String | toString() |
public GenericMetadataFetcher()

public GenericMetadataFetcher(int[] validStatusCodes)

public int[] getValidStatusCodes()

public void setValidStatusCodes(int... validStatusCodes)

public int[] getNotFoundStatusCodes()

public final void setNotFoundStatusCodes(int... notFoundStatusCodes)
Parameters:
notFoundStatusCodes - "Not found" codes

public String getHeadersPrefix()

public void setHeadersPrefix(String headersPrefix)

public HttpFetchResponse fetchHTTPHeaders(org.apache.http.client.HttpClient httpClient, String url, Properties metadata)
Fetches the HTTP headers for a URL and stores them in the provided Properties.
Specified by:
fetchHTTPHeaders in interface IHttpMetadataFetcher
Parameters:
httpClient - the HTTP Client
url - the url from which to fetch the headers
metadata - recipient for storing HTTP headers as metadata

protected org.apache.http.client.methods.HttpRequestBase createUriRequest(String url)
Creates the HTTP request to be executed. By default, this is an HttpHead request around the provided URL. This method can be overridden to return another type of request.
Parameters:
url - the URL to create the request for

public void loadFromXML(Reader in)
Specified by:
loadFromXML in interface IXMLConfigurable

public void saveToXML(Writer out) throws IOException
Specified by:
saveToXML in interface IXMLConfigurable
Throws:
IOException

Copyright © 2009–2021 Norconex Inc. All rights reserved.