public class GenericMetadataFetcher extends Object implements IHttpMetadataFetcher, IXMLConfigurable
Basic implementation of IHttpMetadataFetcher.
<metadataFetcher class="com.norconex.collector.http.fetch.impl.GenericMetadataFetcher"
    skipOnBadStatus="[false|true]" >
  <validStatusCodes>(defaults to 200)</validStatusCodes>
  <notFoundStatusCodes>(defaults to 404)</notFoundStatusCodes>
  <headersPrefix>(string to prefix headers)</headersPrefix>
</metadataFetcher>
The "validStatusCodes" and "notFoundStatusCodes" elements each expect a comma-separated list of HTTP response codes. If a code appears in both elements, the valid list takes precedence.
The "notFoundStatusCodes" element was added in 2.6.0.
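The precedence rule for the two status code lists can be illustrated with a small standalone sketch. It is not the library's actual code: the class name StatusCodeSketch, the Outcome enum, and the helper methods are hypothetical and exist only to show how a comma-separated list might be parsed and how a code listed in both elements would resolve to "valid".

```java
import java.util.Arrays;

public class StatusCodeSketch {

    /** Hypothetical outcome labels, not part of the Norconex API. */
    enum Outcome { VALID, NOT_FOUND, BAD_STATUS }

    /** Parses a comma-separated list such as "200, 304" into an int array. */
    static int[] parseCodes(String csv) {
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    /** Returns true if the given code is present in the array. */
    static boolean contains(int[] codes, int code) {
        return Arrays.stream(codes).anyMatch(c -> c == code);
    }

    /** Checks the valid list first, so it wins when a code is in both lists. */
    static Outcome classify(int status, int[] valid, int[] notFound) {
        if (contains(valid, status)) {
            return Outcome.VALID;
        }
        if (contains(notFound, status)) {
            return Outcome.NOT_FOUND;
        }
        return Outcome.BAD_STATUS;
    }

    public static void main(String[] args) {
        // 404 is deliberately listed in both elements.
        int[] valid = parseCodes("200, 404");
        int[] notFound = parseCodes("404, 410");

        System.out.println(classify(404, valid, notFound)); // VALID
        System.out.println(classify(410, valid, notFound)); // NOT_FOUND
        System.out.println(classify(500, valid, notFound)); // BAD_STATUS
    }
}
```

Checking the valid list before the "Not found" list is the simplest way to get the documented precedence; how the fetcher treats codes in neither list is an assumption here (labelled BAD_STATUS).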
The following configures a crawler to use this fetcher with the default settings.
<metadataFetcher class="com.norconex.collector.http.fetch.impl.GenericMetadataFetcher" />
Constructor and Description |
---|
GenericMetadataFetcher() |
GenericMetadataFetcher(int[] validStatusCodes) |
Modifier and Type | Method and Description |
---|---|
protected org.apache.http.client.methods.HttpRequestBase | createUriRequest(String url) Creates the HTTP request to be executed. |
boolean | equals(Object other) |
HttpFetchResponse | fetchHTTPHeaders(org.apache.http.client.HttpClient httpClient, String url, Properties metadata) Fetches the HTTP headers for a URL and stores them in the provided Properties. |
String | getHeadersPrefix() |
int[] | getNotFoundStatusCodes() Gets HTTP status codes to be considered as "Not found" state. |
int[] | getValidStatusCodes() |
int | hashCode() |
void | loadFromXML(Reader in) |
void | saveToXML(Writer out) |
void | setHeadersPrefix(String headersPrefix) |
void | setNotFoundStatusCodes(int... notFoundStatusCodes) Sets HTTP status codes to be considered as "Not found" state. |
void | setValidStatusCodes(int... validStatusCodes) |
String | toString() |
public GenericMetadataFetcher()
public GenericMetadataFetcher(int[] validStatusCodes)
public int[] getValidStatusCodes()
public void setValidStatusCodes(int... validStatusCodes)
public int[] getNotFoundStatusCodes()
public final void setNotFoundStatusCodes(int... notFoundStatusCodes)
Parameters:
notFoundStatusCodes - "Not found" codes
public String getHeadersPrefix()
public void setHeadersPrefix(String headersPrefix)
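As a rough illustration of what a headers prefix does, the sketch below copies response headers into a metadata store and prepends the prefix to each header name. It is an assumption about the behavior, based only on the "string to prefix headers" description: the class HeaderPrefixSketch and the method storeHeaders are hypothetical, and java.util.Properties is used as a stand-in for the Properties type that appears in the method signatures.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class HeaderPrefixSketch {

    /**
     * Copies each header into the metadata, prepending the prefix to the
     * header name when a non-empty prefix is set. Hypothetical helper,
     * not the library's implementation.
     */
    static void storeHeaders(
            Map<String, String> headers, String headersPrefix, Properties metadata) {
        for (Map.Entry<String, String> entry : headers.entrySet()) {
            String name = entry.getKey();
            if (headersPrefix != null && !headersPrefix.isEmpty()) {
                name = headersPrefix + name;
            }
            metadata.setProperty(name, entry.getValue());
        }
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Content-Type", "text/html");
        headers.put("Content-Length", "1024");

        Properties metadata = new Properties();
        storeHeaders(headers, "http-", metadata);

        System.out.println(metadata.getProperty("http-Content-Type"));   // text/html
        System.out.println(metadata.getProperty("http-Content-Length")); // 1024
    }
}
```

A prefix of this kind is useful mainly to keep fetched HTTP headers from colliding with metadata fields that come from other sources.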
public HttpFetchResponse fetchHTTPHeaders(org.apache.http.client.HttpClient httpClient, String url, Properties metadata)
Description copied from interface: IHttpMetadataFetcher
Fetches the HTTP headers for a URL and stores them in the provided Properties.
Specified by: fetchHTTPHeaders in interface IHttpMetadataFetcher
Parameters:
httpClient - the HTTP client
url - the URL from which to fetch the headers
metadata - recipient for storing HTTP headers as metadata
protected org.apache.http.client.methods.HttpRequestBase createUriRequest(String url)
Creates the HTTP request to be executed. The default implementation creates an HttpHead request around the provided URL. This method can be overridden to return another type of request.
Parameters:
url - the URL to create the request for
public void loadFromXML(Reader in)
Specified by: loadFromXML in interface IXMLConfigurable
public void saveToXML(Writer out) throws IOException
Specified by: saveToXML in interface IXMLConfigurable
Throws: IOException
Copyright © 2009–2021 Norconex Inc. All rights reserved.