Class AbstractHttpFetcher
java.lang.Object
com.norconex.collector.http.fetch.AbstractHttpFetcher
- All Implemented Interfaces:
IHttpFetcher, IEventListener<Event>, IXMLConfigurable, EventListener, Consumer<Event>
- Direct Known Subclasses:
GenericHttpFetcher, PhantomJSDocumentFetcher, WebDriverHttpFetcher
public abstract class AbstractHttpFetcher
extends Object
implements IHttpFetcher, IXMLConfigurable, IEventListener<Event>
Base class implementing the accept(Doc, HttpMethod) method. It uses reference
filters to determine whether this fetcher should fetch a given URL, and
delegates the HTTP method check to its own abstract accept(HttpMethod) method.
It also offers methods that subclasses can override to react to collector
startup and shutdown events, as well as crawler thread begin and end events.
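The accept logic described above follows a template-method split: the base class applies the reference filters, and the subclass only decides which HTTP methods it supports. The following is a minimal standalone sketch of that idea, not the Norconex API. The `SketchFetcher`, `GetHeadFetcher`, and `HttpVerb` types are hypothetical, references are plain URL strings instead of `Doc`, and filters are `Predicate<String>` instead of `IReferenceFilter`. It also assumes, as a simplification, that every filter must pass; the real include/exclude (`onMatch`) semantics may differ.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical stand-in for HttpMethod (not the Norconex enum).
enum HttpVerb { GET, HEAD, POST }

// Hypothetical, simplified model of the AbstractHttpFetcher accept logic.
abstract class SketchFetcher {
    private final List<Predicate<String>> referenceFilters = new ArrayList<>();

    // Replaces any previously set filters.
    void setReferenceFilters(List<Predicate<String>> filters) {
        referenceFilters.clear();
        referenceFilters.addAll(filters);
    }

    // Counterpart of accept(Doc, HttpMethod): the reference must pass every
    // filter (a simplification), then the method check goes to the subclass.
    boolean accept(String url, HttpVerb verb) {
        for (Predicate<String> filter : referenceFilters) {
            if (!filter.test(url)) {
                return false;
            }
        }
        return accept(verb);
    }

    // Counterpart of the protected abstract accept(HttpMethod).
    protected abstract boolean accept(HttpVerb verb);
}

// Hypothetical subclass that only supports GET and HEAD.
class GetHeadFetcher extends SketchFetcher {
    private static final Set<HttpVerb> SUPPORTED = EnumSet.of(HttpVerb.GET, HttpVerb.HEAD);

    @Override
    protected boolean accept(HttpVerb verb) {
        return SUPPORTED.contains(verb);
    }
}

class SketchFetcherDemo {
    public static void main(String[] args) {
        GetHeadFetcher fetcher = new GetHeadFetcher();
        fetcher.setReferenceFilters(List.of(url -> url.startsWith("https://")));
        System.out.println(fetcher.accept("https://example.com/", HttpVerb.GET));  // true
        System.out.println(fetcher.accept("https://example.com/", HttpVerb.POST)); // false
        System.out.println(fetcher.accept("http://example.com/", HttpVerb.GET));   // false
    }
}
```

Keeping the filter loop in the base class means every subclass gets identical URL filtering, while each one only has to answer the narrower question of which HTTP methods it can handle.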
Subclasses inherit this IXMLConfigurable configuration:
XML configuration usage:
<referenceFilters>
<!-- multiple "filter" tags allowed -->
<filter
class="(any reference filter class)">
(Restrict usage of this fetcher to matching reference filters.
Refer to the documentation for the IReferenceFilter implementation
you are using here for usage details.)
</filter>
</referenceFilters>
Usage example:
The following example prevents this HTTP fetcher from being used for URLs starting with "https://example.com/pdfs/" (references matching the regular expression are excluded).
XML usage example:
<referenceFilters>
<filter
class="ReferenceFilter"
onMatch="exclude">
<valueMatcher
method="regex">
https://example\.com/pdfs/.*
</valueMatcher>
</filter>
</referenceFilters>
- Since:
- 3.0.0
- Author:
- Pascal Essiembre
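To make the effect of the XML usage example above concrete, the following standalone sketch mimics a single regex filter with onMatch="exclude" using java.util.regex. It is not the ReferenceFilter implementation: `RegexExcludeFilter` and `acceptReference` are hypothetical names, and the sketch assumes the regular expression must match the entire reference (as `Matcher.matches()` does).

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of a regex reference filter with onMatch="exclude".
// Assumes the regex must match the whole reference (Matcher.matches()).
class RegexExcludeFilter {
    private final Pattern pattern;

    RegexExcludeFilter(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    // True when the fetcher may handle the reference, i.e. when the
    // reference does NOT match the exclusion pattern.
    boolean acceptReference(String reference) {
        return !pattern.matcher(reference).matches();
    }
}

class RegexExcludeDemo {
    public static void main(String[] args) {
        // Same regular expression as in the XML usage example.
        RegexExcludeFilter filter =
                new RegexExcludeFilter("https://example\\.com/pdfs/.*");
        for (String url : List.of(
                "https://example.com/pdfs/report.pdf",
                "https://example.com/docs/report.pdf",
                "https://example.com/index.html")) {
            // Prints false, true, true (in that order).
            System.out.println(url + " -> " + filter.acceptReference(url));
        }
    }
}
```

Note that the second URL ends in ".pdf" but is still accepted: the example filters on the "/pdfs/" path prefix, not on the file extension.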
-
Constructor Summary
Constructors
AbstractHttpFetcher()
-
Method Summary
Modifier and Type | Method | Description
protected abstract boolean | accept(HttpMethod httpMethod) | Whether the supplied HttpMethod is supported by this fetcher.
final void | accept(Event)
boolean | accept(Doc doc, HttpMethod httpMethod)
boolean | equals(Object)
protected void | fetcherShutdown(HttpCollector collector) | Invoked once per fetcher when the collector ends.
protected void | fetcherStartup(HttpCollector collector) | Invoked once per fetcher instance, when the collector starts.
protected void | fetcherThreadBegin(HttpCrawler crawler) | Invoked each time a crawler begins a new crawler thread if that thread is the current thread.
protected void | fetcherThreadEnd(HttpCrawler crawler) | Invoked each time a crawler ends an existing crawler thread if that thread is the current thread.
List<IReferenceFilter> | getReferenceFilters() | Gets reference filters.
int | hashCode()
final void | loadFromXML(XML xml)
protected abstract void | loadHttpFetcherFromXML(XML xml)
protected abstract void | saveHttpFetcherToXML(XML xml)
final void | saveToXML(XML xml)
void | setReferenceFilters(IReferenceFilter... referenceFilters) | Sets reference filters.
void | setReferenceFilters(List<IReferenceFilter> referenceFilters) | Sets reference filters.
String | toString()
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface com.norconex.collector.http.fetch.IHttpFetcher
fetch, getUserAgent
-
Constructor Details
-
AbstractHttpFetcher
public AbstractHttpFetcher()
-
Method Details
-
getReferenceFilters
Gets reference filters.
- Returns:
- reference filters
-
setReferenceFilters
Sets reference filters.
- Parameters:
referenceFilters - reference filters to set
-
setReferenceFilters
Sets reference filters.
- Parameters:
referenceFilters - the reference filters to set
-
accept(Doc doc, HttpMethod httpMethod)
- Specified by:
accept in interface IHttpFetcher
-
accept(HttpMethod httpMethod)
Whether the supplied HttpMethod is supported by this fetcher.
- Parameters:
httpMethod - the HTTP method
- Returns:
true if supported
-
accept(Event)
-
fetcherStartup
Invoked once per fetcher instance, when the collector starts. Default implementation does nothing.
- Parameters:
collector - collector
-
fetcherShutdown
Invoked once per fetcher instance, when the collector ends. Default implementation does nothing.
- Parameters:
collector - collector
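Since fetcherStartup and fetcherShutdown are each invoked once per fetcher, they pair naturally for creating and releasing a resource shared by all crawler threads. The following standalone sketch models that pairing with a shared ExecutorService. `LifecycleSketch` and `PooledFetcherSketch` are hypothetical, and the hooks are simplified to take no arguments (the real ones receive an HttpCollector); this is not the Norconex API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified lifecycle hooks modeled on fetcherStartup and
// fetcherShutdown (the real hooks take an HttpCollector argument).
abstract class LifecycleSketch {
    protected void fetcherStartup() {
        // Default implementation does nothing.
    }

    protected void fetcherShutdown() {
        // Default implementation does nothing.
    }
}

// Hypothetical subclass: creates a shared worker pool once at startup and
// releases it once at shutdown.
class PooledFetcherSketch extends LifecycleSketch {
    private ExecutorService pool;

    @Override
    protected void fetcherStartup() {
        pool = Executors.newFixedThreadPool(2);
    }

    @Override
    protected void fetcherShutdown() {
        if (pool == null) {
            return;
        }
        pool.shutdown();
        try {
            // Give in-flight tasks a bounded amount of time to finish.
            if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }

    boolean isRunning() {
        return pool != null && !pool.isShutdown();
    }
}

class LifecycleDemo {
    public static void main(String[] args) {
        PooledFetcherSketch fetcher = new PooledFetcherSketch();
        fetcher.fetcherStartup();   // once, when the collector starts
        System.out.println("running after startup: " + fetcher.isRunning());  // true
        fetcher.fetcherShutdown();  // once, when the collector ends
        System.out.println("running after shutdown: " + fetcher.isRunning()); // false
    }
}
```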
-
fetcherThreadBegin
Invoked each time a crawler begins a new crawler thread, from within that new thread (that is, only when it is the current thread). Default implementation does nothing.
- Parameters:
crawler - crawler
-
fetcherThreadEnd
Invoked each time a crawler ends an existing crawler thread, from within that thread (that is, only when it is the current thread). Default implementation does nothing.
- Parameters:
crawler - crawler
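Because fetcherThreadBegin and fetcherThreadEnd are only invoked when the crawler thread concerned is the current thread, they can set up and tear down per-thread state, for example with a ThreadLocal. The standalone sketch below assumes exactly that behavior. `ThreadHooksSketch`, its parameterless hooks (the real ones take an HttpCrawler), and the "session" value are hypothetical illustrations.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: per-thread state managed from thread begin/end hooks.
// Assumes each hook runs on the crawler thread it refers to.
class ThreadHooksSketch {
    private final AtomicInteger nextId = new AtomicInteger();
    private final ThreadLocal<String> sessionId = new ThreadLocal<>();

    protected void fetcherThreadBegin() {
        // Runs on the new crawler thread, so the value is visible to it only.
        sessionId.set("session-" + nextId.incrementAndGet());
    }

    protected void fetcherThreadEnd() {
        // Remove the value so reused threads do not keep stale state.
        sessionId.remove();
    }

    String currentSessionId() {
        return sessionId.get();
    }
}

class ThreadHooksDemo {
    public static void main(String[] args) throws InterruptedException {
        ThreadHooksSketch hooks = new ThreadHooksSketch();
        Runnable crawlerWork = () -> {
            hooks.fetcherThreadBegin();
            try {
                System.out.println(Thread.currentThread().getName()
                        + " uses " + hooks.currentSessionId());
            } finally {
                hooks.fetcherThreadEnd();
            }
        };
        Thread t1 = new Thread(crawlerWork, "crawler-1");
        Thread t2 = new Thread(crawlerWork, "crawler-2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

If the hooks were called from some other thread, the ThreadLocal value would be bound to the wrong thread; the "current thread" condition is what makes this pattern safe.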
-
loadFromXML
- Specified by:
loadFromXML in interface IXMLConfigurable
-
saveToXML
- Specified by:
saveToXML in interface IXMLConfigurable
-
loadHttpFetcherFromXML
-
saveHttpFetcherToXML
-
equals
-
hashCode
public int hashCode()
-
toString