Class AbstractHttpFetcher
- java.lang.Object
  - com.norconex.collector.http.fetch.AbstractHttpFetcher
- All Implemented Interfaces:
IHttpFetcher, IEventListener<Event>, IXMLConfigurable, EventListener, Consumer<Event>
- Direct Known Subclasses:
GenericHttpFetcher, PhantomJSDocumentFetcher, WebDriverHttpFetcher
public abstract class AbstractHttpFetcher extends Object implements IHttpFetcher, IXMLConfigurable, IEventListener<Event>
Base class implementing the accept(Doc, HttpMethod) method. It uses reference filters to determine whether this fetcher accepts a URL, and delegates the HTTP method check to its own abstract accept(HttpMethod) method. It also provides methods that subclasses can override to react to crawler startup and shutdown events.
XML configuration usage:
Subclasses inherit this IXMLConfigurable configuration:
<referenceFilters>
  <!-- multiple "filter" tags allowed -->
  <filter class="(any reference filter class)">
    (Restrict usage of this fetcher to matching reference filters.
    Refer to the documentation for the IReferenceFilter implementation
    you are using here for usage details.)
  </filter>
</referenceFilters>
Usage example:
This filter example excludes references matching https://example.com/pdfs/.* so that this HTTP fetcher is not applied to them.
XML usage example:
<referenceFilters>
  <filter class="ReferenceFilter" onMatch="exclude">
    <valueMatcher method="regex">
      https://example\.com/pdfs/.*
    </valueMatcher>
  </filter>
</referenceFilters>
- Since:
- 3.0.0
- Author:
- Pascal Essiembre
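The exclude-on-match behavior in the XML usage example can be illustrated with plain java.util.regex. This is a minimal sketch, not the Norconex ReferenceFilter class: ExcludeOnMatchSketch and acceptReference are hypothetical names, and the sketch assumes the regular expression must match the whole reference.

```java
import java.util.regex.Pattern;

// Stand-in for an "exclude on match" reference filter.
// Hypothetical class, not part of the Norconex API.
final class ExcludeOnMatchSketch {
    private final Pattern pattern;

    ExcludeOnMatchSketch(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    // Assumption: the expression must match the entire reference.
    // A matching reference is excluded; anything else is accepted.
    boolean acceptReference(String reference) {
        return !pattern.matcher(reference).matches();
    }

    public static void main(String[] args) {
        ExcludeOnMatchSketch filter =
                new ExcludeOnMatchSketch("https://example\\.com/pdfs/.*");
        String[] references = {
            "https://example.com/pdfs/report.pdf",
            "https://example.com/index.html"
        };
        for (String reference : references) {
            System.out.println(
                    reference + " accepted: " + filter.acceptReference(reference));
        }
    }
}
```

Under these assumptions, a fetcher configured with such a filter would skip references under /pdfs/ and stay eligible for everything else.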
-
Constructor Summary
Constructor | Description
AbstractHttpFetcher() |
-
Method Summary
Modifier and Type | Method | Description
protected abstract boolean | accept(HttpMethod httpMethod) | Whether the supplied HttpMethod is supported by this fetcher.
void | accept(Event event) |
boolean | accept(Doc doc, HttpMethod httpMethod) |
boolean | equals(Object other) |
protected void | fetcherShutdown(HttpCollector collector) | Invoked once per fetcher when the collector ends.
protected void | fetcherStartup(HttpCollector collector) | Invoked once per fetcher instance, when the collector starts.
protected void | fetcherThreadBegin(HttpCrawler crawler) | Invoked each time a crawler begins a new crawler thread if that thread is the current thread.
protected void | fetcherThreadEnd(HttpCrawler crawler) | Invoked each time a crawler ends an existing crawler thread if that thread is the current thread.
List<IReferenceFilter> | getReferenceFilters() | Gets reference filters.
int | hashCode() |
void | loadFromXML(XML xml) |
protected abstract void | loadHttpFetcherFromXML(XML xml) |
protected abstract void | saveHttpFetcherToXML(XML xml) |
void | saveToXML(XML xml) |
void | setReferenceFilters(IReferenceFilter... referenceFilters) | Sets reference filters.
void | setReferenceFilters(List<IReferenceFilter> referenceFilters) | Sets reference filters.
String | toString() |
-
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface com.norconex.collector.http.fetch.IHttpFetcher
fetch, getUserAgent
-
Method Detail
-
getReferenceFilters
public List<IReferenceFilter> getReferenceFilters()
Gets reference filters.
- Returns:
- reference filters
-
setReferenceFilters
public void setReferenceFilters(IReferenceFilter... referenceFilters)
Sets reference filters.
- Parameters:
referenceFilters - reference filters to set
-
setReferenceFilters
public void setReferenceFilters(List<IReferenceFilter> referenceFilters)
Sets reference filters.
- Parameters:
referenceFilters - the reference filters to set
-
accept
public boolean accept(Doc doc, HttpMethod httpMethod)
- Specified by:
accept in interface IHttpFetcher
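The class description states that this method checks the reference filters and then delegates the HTTP method check to accept(HttpMethod). The following is a minimal sketch of that pattern using stand-in types only: SketchHttpMethod, SketchFetcher, and GetOnlyFetcher are hypothetical, a String stands in for Doc, and Predicate<String> stands in for IReferenceFilter. Requiring every filter to accept the reference is an assumption of the sketch.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Stand-in types for illustration only; these are not the Norconex classes.
enum SketchHttpMethod { GET, HEAD, POST }

abstract class SketchFetcher {
    // Each predicate returns true when it accepts a reference.
    private final List<Predicate<String>> referenceFilters;

    SketchFetcher(List<Predicate<String>> referenceFilters) {
        this.referenceFilters = referenceFilters;
    }

    // Assumption: every reference filter must accept the reference.
    // The HTTP method check is then delegated to the subclass.
    public boolean accept(String reference, SketchHttpMethod method) {
        for (Predicate<String> filter : referenceFilters) {
            if (!filter.test(reference)) {
                return false;
            }
        }
        return accept(method);
    }

    // Subclasses declare which HTTP methods they support.
    protected abstract boolean accept(SketchHttpMethod method);
}

final class GetOnlyFetcher extends SketchFetcher {
    GetOnlyFetcher(List<Predicate<String>> referenceFilters) {
        super(referenceFilters);
    }

    @Override
    protected boolean accept(SketchHttpMethod method) {
        return method == SketchHttpMethod.GET;
    }

    public static void main(String[] args) {
        Predicate<String> pdfOnly = ref -> ref.endsWith(".pdf");
        SketchFetcher fetcher =
                new GetOnlyFetcher(Collections.singletonList(pdfOnly));
        System.out.println(
                fetcher.accept("https://example.com/a.pdf", SketchHttpMethod.GET));
        System.out.println(
                fetcher.accept("https://example.com/a.pdf", SketchHttpMethod.POST));
    }
}
```

Keeping the filter check in the base class means a subclass only has to answer the narrower question of which HTTP methods it can handle.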
-
accept
protected abstract boolean accept(HttpMethod httpMethod)
Whether the supplied HttpMethod is supported by this fetcher.
- Parameters:
httpMethod - the HTTP method
- Returns:
true if supported
-
fetcherStartup
protected void fetcherStartup(HttpCollector collector)
Invoked once per fetcher instance, when the collector starts. Default implementation does nothing.
- Parameters:
collector - collector
-
fetcherShutdown
protected void fetcherShutdown(HttpCollector collector)
Invoked once per fetcher instance, when the collector ends. Default implementation does nothing.
- Parameters:
collector - collector
-
fetcherThreadBegin
protected void fetcherThreadBegin(HttpCrawler crawler)
Invoked each time a crawler begins a new crawler thread if that thread is the current thread. Default implementation does nothing.
- Parameters:
crawler - crawler
-
fetcherThreadEnd
protected void fetcherThreadEnd(HttpCrawler crawler)
Invoked each time a crawler ends an existing crawler thread if that thread is the current thread. Default implementation does nothing.
- Parameters:
crawler - crawler
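The four lifecycle hooks above default to doing nothing, so a subclass overrides only the ones it needs. The sketch below shows that shape with stand-in types: LifecycleSketch, RecordingFetcher, onEvent, and the event names are all hypothetical, and plain String identifiers replace HttpCollector and HttpCrawler. How the real class maps events to hooks, including the current-thread check, is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in base class; hooks default to no-ops as described above.
abstract class LifecycleSketch {
    protected void fetcherStartup(String collectorId) { }
    protected void fetcherShutdown(String collectorId) { }
    protected void fetcherThreadBegin(String crawlerId) { }
    protected void fetcherThreadEnd(String crawlerId) { }

    // Hypothetical event names, used only to drive the hooks in this sketch.
    public final void onEvent(String eventName, String sourceId) {
        switch (eventName) {
            case "collector-start":
                fetcherStartup(sourceId);
                break;
            case "collector-end":
                fetcherShutdown(sourceId);
                break;
            case "thread-begin":
                fetcherThreadBegin(sourceId);
                break;
            case "thread-end":
                fetcherThreadEnd(sourceId);
                break;
            default:
                break; // unrelated events are ignored
        }
    }
}

// Overrides only the collector-level hooks; thread hooks stay no-ops.
final class RecordingFetcher extends LifecycleSketch {
    final List<String> log = new ArrayList<>();

    @Override
    protected void fetcherStartup(String collectorId) {
        log.add("startup:" + collectorId);
    }

    @Override
    protected void fetcherShutdown(String collectorId) {
        log.add("shutdown:" + collectorId);
    }

    public static void main(String[] args) {
        RecordingFetcher fetcher = new RecordingFetcher();
        fetcher.onEvent("collector-start", "c1");
        fetcher.onEvent("thread-begin", "crawlerA");
        fetcher.onEvent("collector-end", "c1");
        System.out.println(fetcher.log);
    }
}
```

A typical use of the collector-level hooks would be acquiring a shared resource at startup and releasing it at shutdown, while the thread-level hooks suit per-thread state.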
-
loadFromXML
public final void loadFromXML(XML xml)
- Specified by:
loadFromXML in interface IXMLConfigurable
-
saveToXML
public final void saveToXML(XML xml)
- Specified by:
saveToXML in interface IXMLConfigurable
-
loadHttpFetcherFromXML
protected abstract void loadHttpFetcherFromXML(XML xml)
-
saveHttpFetcherToXML
protected abstract void saveHttpFetcherToXML(XML xml)
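Because loadFromXML and saveToXML are final while loadHttpFetcherFromXML and saveHttpFetcherToXML are abstract, the signatures suggest a template-method arrangement: the base class handles the shared settings and subclasses handle their own. The sketch below illustrates that arrangement under stated assumptions. ConfigurableSketch and UserAgentFetcher are hypothetical, a Map<String, String> stands in for the XML object, and the order in which shared and subclass settings are processed is a guess.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Stand-in base class; a Map replaces the XML object for illustration.
abstract class ConfigurableSketch {
    private final List<String> referenceFilters = new ArrayList<>();

    // final: subclasses cannot bypass loading of the shared settings.
    public final void loadFromConfig(Map<String, String> config) {
        referenceFilters.clear();
        String filters = config.get("referenceFilters");
        if (filters != null && !filters.isEmpty()) {
            for (String filter : filters.split(",")) {
                referenceFilters.add(filter.trim());
            }
        }
        loadFetcherFromConfig(config); // subclass-specific settings
    }

    // final: shared settings are always written alongside subclass settings.
    public final void saveToConfig(Map<String, String> config) {
        config.put("referenceFilters", String.join(",", referenceFilters));
        saveFetcherToConfig(config);
    }

    protected abstract void loadFetcherFromConfig(Map<String, String> config);

    protected abstract void saveFetcherToConfig(Map<String, String> config);

    List<String> getReferenceFilters() {
        return referenceFilters;
    }
}

// Hypothetical subclass with one setting of its own.
final class UserAgentFetcher extends ConfigurableSketch {
    private String userAgent = "";

    @Override
    protected void loadFetcherFromConfig(Map<String, String> config) {
        userAgent = config.getOrDefault("userAgent", "");
    }

    @Override
    protected void saveFetcherToConfig(Map<String, String> config) {
        config.put("userAgent", userAgent);
    }
}
```

With this split, a subclass implements only its own load and save logic and inherits consistent handling of the referenceFilters configuration.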
-