Class AbstractHttpFetcher
- java.lang.Object
-
- com.norconex.collector.http.fetch.AbstractHttpFetcher
-
- All Implemented Interfaces:
IHttpFetcher, IEventListener<Event>, IXMLConfigurable, EventListener, Consumer<Event>
- Direct Known Subclasses:
GenericHttpFetcher, PhantomJSDocumentFetcher, WebDriverHttpFetcher
public abstract class AbstractHttpFetcher extends Object implements IHttpFetcher, IXMLConfigurable, IEventListener<Event>
Base class implementing the accept(Doc, HttpMethod) method. It uses reference filters to determine whether this fetcher accepts a URL for fetching, and delegates the HTTP method check to its own abstract accept(HttpMethod) method. It also offers methods that subclasses can override to react to crawler startup and shutdown events.
XML configuration usage:
Subclasses inherit this IXMLConfigurable configuration:
<referenceFilters>
  <!-- multiple "filter" tags allowed -->
  <filter class="(any reference filter class)">
    (Restrict usage of this fetcher to matching reference filters.
    Refer to the documentation for the IReferenceFilter implementation
    you are using here for usage details.)
  </filter>
</referenceFilters>
XML usage example:
This filter example prevents an HTTP Fetcher from being applied to URLs under "https://example.com/pdfs/": references matching the regular expression are excluded.
<referenceFilters>
  <filter class="ReferenceFilter" onMatch="exclude">
    <valueMatcher method="regex">
      https://example\.com/pdfs/.*
    </valueMatcher>
  </filter>
</referenceFilters>
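To make the effect of onMatch="exclude" concrete, here is a minimal sketch in plain Java. It does not use the Norconex ReferenceFilter class; ExcludeFilterSketch and acceptsReference are hypothetical names, and the sketch assumes the regular expression has to match the whole reference.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for a reference filter configured with
// onMatch="exclude"; this is NOT the Norconex ReferenceFilter class.
class ExcludeFilterSketch {

    // Same regular expression as in the XML usage example.
    static final Pattern PDFS =
            Pattern.compile("https://example\\.com/pdfs/.*");

    // Assumption: the expression must match the whole reference.
    // With "exclude", a matching reference is rejected, so the fetcher
    // is only applied to references that do NOT match.
    static boolean acceptsReference(String url) {
        return !PDFS.matcher(url).matches();
    }

    public static void main(String[] args) {
        for (String url : List.of(
                "https://example.com/pdfs/report.pdf",
                "https://example.com/index.html")) {
            System.out.println(url + " -> " + acceptsReference(url));
        }
    }
}
```

Under these assumptions, only the second URL would be handed to the fetcher.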
- Since:
- 3.0.0
- Author:
- Pascal Essiembre
-
-
Constructor Summary
Constructors
Constructor and Description
AbstractHttpFetcher()
-
Method Summary
Modifier and Type, Method, and Description
protected abstract boolean accept(HttpMethod httpMethod)
Whether the supplied HttpMethod is supported by this fetcher.
void accept(Event event)
boolean accept(Doc doc, HttpMethod httpMethod)
boolean equals(Object other)
protected void fetcherShutdown(HttpCollector collector)
Invoked once per fetcher when the collector ends.
protected void fetcherStartup(HttpCollector collector)
Invoked once per fetcher instance, when the collector starts.
protected void fetcherThreadBegin(HttpCrawler crawler)
Invoked each time a crawler begins a new crawler thread if that thread is the current thread.
protected void fetcherThreadEnd(HttpCrawler crawler)
Invoked each time a crawler ends an existing crawler thread if that thread is the current thread.
List<IReferenceFilter> getReferenceFilters()
Gets reference filters.
int hashCode()
void loadFromXML(XML xml)
protected abstract void loadHttpFetcherFromXML(XML xml)
protected abstract void saveHttpFetcherToXML(XML xml)
void saveToXML(XML xml)
void setReferenceFilters(IReferenceFilter... referenceFilters)
Sets reference filters.
void setReferenceFilters(List<IReferenceFilter> referenceFilters)
Sets reference filters.
String toString()
-
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface com.norconex.collector.http.fetch.IHttpFetcher
fetch, getUserAgent
-
-
-
-
Method Detail
-
getReferenceFilters
public List<IReferenceFilter> getReferenceFilters()
Gets reference filters.
- Returns:
reference filters
-
setReferenceFilters
public void setReferenceFilters(IReferenceFilter... referenceFilters)
Sets reference filters.
- Parameters:
referenceFilters - reference filters to set
-
setReferenceFilters
public void setReferenceFilters(List<IReferenceFilter> referenceFilters)
Sets reference filters.
- Parameters:
referenceFilters - the reference filters to set
-
accept
public boolean accept(Doc doc, HttpMethod httpMethod)
- Specified by:
accept in interface IHttpFetcher
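The two-step acceptance described in the class summary (reference filters plus the abstract HTTP method check) can be sketched with stand-in types. Nothing below is Norconex code: SketchMethod, FetcherSketch, GetOnlyFetcher and AcceptSketchDemo are hypothetical names, a plain String stands in for Doc, and the sketch assumes every configured filter must accept the reference.

```java
import java.util.List;
import java.util.function.Predicate;

// Small demo; all types in this file are hypothetical stand-ins.
class AcceptSketchDemo {

    // Builds a fetcher restricted to ".pdf" references and GET requests.
    static FetcherSketch pdfGetFetcher() {
        Predicate<String> pdfOnly = ref -> ref.endsWith(".pdf");
        return new GetOnlyFetcher(List.of(pdfOnly));
    }

    public static void main(String[] args) {
        FetcherSketch fetcher = pdfGetFetcher();
        System.out.println(fetcher.accept("https://example.com/a.pdf", SketchMethod.GET));
        System.out.println(fetcher.accept("https://example.com/a.pdf", SketchMethod.POST));
        System.out.println(fetcher.accept("https://example.com/a.html", SketchMethod.GET));
    }
}

// Stand-in for HttpMethod.
enum SketchMethod { GET, HEAD, POST }

// Stand-in for the base class: filters are plain predicates on the URL.
abstract class FetcherSketch {
    private final List<Predicate<String>> referenceFilters;

    FetcherSketch(List<Predicate<String>> referenceFilters) {
        this.referenceFilters = referenceFilters;
    }

    // Plays the role of the abstract accept(HttpMethod).
    protected abstract boolean accept(SketchMethod method);

    // Plays the role of accept(Doc, HttpMethod): the reference must pass
    // all filters (an assumption) and the method must be supported.
    public boolean accept(String reference, SketchMethod method) {
        for (Predicate<String> filter : referenceFilters) {
            if (!filter.test(reference)) {
                return false;
            }
        }
        return accept(method);
    }
}

// Example subclass supporting GET only.
class GetOnlyFetcher extends FetcherSketch {
    GetOnlyFetcher(List<Predicate<String>> referenceFilters) {
        super(referenceFilters);
    }

    @Override
    protected boolean accept(SketchMethod method) {
        return method == SketchMethod.GET;
    }
}
```

Because the result is a logical AND of both checks, the order in which they run does not change the outcome; the order used here is arbitrary.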
-
accept
protected abstract boolean accept(HttpMethod httpMethod)
Whether the supplied HttpMethod is supported by this fetcher.
- Parameters:
httpMethod - the HTTP method
- Returns:
true if supported
-
fetcherStartup
protected void fetcherStartup(HttpCollector collector)
Invoked once per fetcher instance, when the collector starts. Default implementation does nothing.
- Parameters:
collector - collector
-
fetcherShutdown
protected void fetcherShutdown(HttpCollector collector)
Invoked once per fetcher when the collector ends. Default implementation does nothing.
- Parameters:
collector - collector
-
fetcherThreadBegin
protected void fetcherThreadBegin(HttpCrawler crawler)
Invoked each time a crawler begins a new crawler thread if that thread is the current thread. Default implementation does nothing.
- Parameters:
crawler - crawler
-
fetcherThreadEnd
protected void fetcherThreadEnd(HttpCrawler crawler)
Invoked each time a crawler ends an existing crawler thread if that thread is the current thread. Default implementation does nothing.
- Parameters:
crawler - crawler
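The four lifecycle hooks above follow a no-op-by-default pattern that subclasses override selectively. The sketch below illustrates the expected call order with hypothetical stand-ins (LifecycleFetcherSketch, RecordingFetcher, LifecycleDemo). The real hooks receive an HttpCollector or HttpCrawler argument and are presumably triggered through accept(Event), which this page does not spell out; here the hooks take no arguments and are called directly, with the crawler threads simulated sequentially.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical driver simulating one collector run.
class LifecycleDemo {

    // Calls the hooks in the order the documentation describes:
    // startup once, begin/end once per crawler thread, shutdown once.
    static List<String> simulate(int crawlerThreads) {
        RecordingFetcher fetcher = new RecordingFetcher();
        fetcher.fetcherStartup();
        for (int i = 0; i < crawlerThreads; i++) {
            fetcher.fetcherThreadBegin();
            // Fetching would happen here in a real crawler thread.
            fetcher.fetcherThreadEnd();
        }
        fetcher.fetcherShutdown();
        return fetcher.calls;
    }

    public static void main(String[] args) {
        System.out.println(simulate(2));
    }
}

// Stand-in base class: every hook does nothing by default.
abstract class LifecycleFetcherSketch {
    protected void fetcherStartup() { }
    protected void fetcherShutdown() { }
    protected void fetcherThreadBegin() { }
    protected void fetcherThreadEnd() { }
}

// Subclass that overrides the hooks to record each invocation.
class RecordingFetcher extends LifecycleFetcherSketch {
    final List<String> calls = new ArrayList<>();

    @Override protected void fetcherStartup() { calls.add("startup"); }
    @Override protected void fetcherShutdown() { calls.add("shutdown"); }
    @Override protected void fetcherThreadBegin() { calls.add("threadBegin"); }
    @Override protected void fetcherThreadEnd() { calls.add("threadEnd"); }
}
```

A real subclass would typically use the startup and shutdown hooks for resources shared by the whole fetcher, and the thread hooks for per-thread resources.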
-
loadFromXML
public final void loadFromXML(XML xml)
- Specified by:
loadFromXML in interface IXMLConfigurable
-
saveToXML
public final void saveToXML(XML xml)
- Specified by:
saveToXML in interface IXMLConfigurable
-
loadHttpFetcherFromXML
protected abstract void loadHttpFetcherFromXML(XML xml)
-
saveHttpFetcherToXML
protected abstract void saveHttpFetcherToXML(XML xml)
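The pairing of final loadFromXML/saveToXML with abstract loadHttpFetcherFromXML/saveHttpFetcherToXML suggests a template-method design: the base class handles the shared configuration and then hands over to the subclass. This page only shows the signatures, so the delegation order below is an assumption. The sketch uses a Map<String, String> as a stand-in for the XML object, and every name in it (ConfigurableSketch, TimeoutFetcherSketch, XmlTemplateDemo, the "referenceFilter" and "timeoutMs" keys) is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo: loads a configuration and saves it back.
class XmlTemplateDemo {

    static Map<String, String> roundTrip(Map<String, String> input) {
        TimeoutFetcherSketch fetcher = new TimeoutFetcherSketch();
        fetcher.load(input);
        Map<String, String> output = new LinkedHashMap<>();
        fetcher.save(output);
        return output;
    }

    public static void main(String[] args) {
        Map<String, String> cfg = new LinkedHashMap<>();
        cfg.put("referenceFilter", ".*\\.pdf");
        cfg.put("timeoutMs", "5000");
        System.out.println(roundTrip(cfg));
    }
}

// Stand-in base class; a Map replaces the XML object.
abstract class ConfigurableSketch {
    private String referenceFilter = "";

    // Final, like loadFromXML: shared settings first, then the subclass.
    public final void load(Map<String, String> cfg) {
        referenceFilter = cfg.getOrDefault("referenceFilter", "");
        loadSpecific(cfg);
    }

    // Final, like saveToXML: shared settings first, then the subclass.
    public final void save(Map<String, String> cfg) {
        cfg.put("referenceFilter", referenceFilter);
        saveSpecific(cfg);
    }

    // Play the roles of loadHttpFetcherFromXML and saveHttpFetcherToXML.
    protected abstract void loadSpecific(Map<String, String> cfg);
    protected abstract void saveSpecific(Map<String, String> cfg);
}

// Subclass that only deals with its own setting.
class TimeoutFetcherSketch extends ConfigurableSketch {
    private int timeoutMs = 1000;

    @Override
    protected void loadSpecific(Map<String, String> cfg) {
        timeoutMs = Integer.parseInt(cfg.getOrDefault("timeoutMs", "1000"));
    }

    @Override
    protected void saveSpecific(Map<String, String> cfg) {
        cfg.put("timeoutMs", Integer.toString(timeoutMs));
    }
}
```

Making the public methods final keeps subclasses from accidentally skipping the shared configuration, which is one plausible reason for the design shown in the signatures.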
-
-