public abstract class AbstractHttpFetcher extends Object implements IHttpFetcher, IXMLConfigurable, IEventListener<Event>
Base class implementing the accept(Doc, HttpMethod) method. It uses reference filters to determine whether this fetcher will accept a URL, and delegates the HTTP method check to its own abstract accept(HttpMethod) method.
It also offers methods that subclasses can override to react to collector startup and shutdown, and to the beginning and end of crawler threads.
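The two-step acceptance described above can be illustrated with a minimal, self-contained sketch. It does not use the real library types: `SketchHttpMethod`, `SketchFetcherBase`, `SketchGetHeadFetcher`, and the use of plain `Predicate<String>` objects in place of `IReferenceFilter` are all hypothetical stand-ins chosen so the example runs on its own. How the real class combines several filters is not stated here, so the "all filters must pass" rule below is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for the library's HttpMethod type.
enum SketchHttpMethod { GET, HEAD, POST }

// Simplified model of the base class: the reference is checked first,
// then the subclass decides whether it supports the HTTP method.
abstract class SketchFetcherBase {
    // Stand-in for reference filters: a predicate returns true when the
    // URL is allowed through (assumption: every filter must pass).
    private final List<Predicate<String>> referenceFilters = new ArrayList<>();

    void addReferenceFilter(Predicate<String> filter) {
        referenceFilters.add(filter);
    }

    // Plays the role of accept(Doc, HttpMethod), using a plain URL string.
    final boolean accept(String url, SketchHttpMethod method) {
        for (Predicate<String> filter : referenceFilters) {
            if (!filter.test(url)) {
                return false;
            }
        }
        // Delegate the HTTP method check to the subclass.
        return accept(method);
    }

    // Plays the role of the abstract accept(HttpMethod).
    protected abstract boolean accept(SketchHttpMethod method);
}

// Example subclass that only supports GET and HEAD.
class SketchGetHeadFetcher extends SketchFetcherBase {
    @Override
    protected boolean accept(SketchHttpMethod method) {
        return method == SketchHttpMethod.GET || method == SketchHttpMethod.HEAD;
    }
}

class AcceptSketchDemo {
    public static void main(String[] args) {
        SketchGetHeadFetcher fetcher = new SketchGetHeadFetcher();
        fetcher.addReferenceFilter(url -> url.endsWith(".pdf"));

        // Reference passes the filter and GET is supported: prints true
        System.out.println(fetcher.accept("https://example.com/a.pdf", SketchHttpMethod.GET));
        // Reference passes the filter but POST is unsupported: prints false
        System.out.println(fetcher.accept("https://example.com/a.pdf", SketchHttpMethod.POST));
        // Reference is rejected by the filter: prints false
        System.out.println(fetcher.accept("https://example.com/a.html", SketchHttpMethod.GET));
    }
}
```

In this sketch a subclass only has to answer the HTTP method question; the reference check stays in one place in the base class.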
IXMLConfigurable configuration:
<referenceFilters>
  <!-- multiple "filter" tags allowed -->
  <filter class="(any reference filter class)">
    (Restrict usage of this fetcher to matching reference filters.
    Refer to the documentation for the IReferenceFilter implementation
    you are using here for usage details.)
  </filter>
</referenceFilters>
This filter example prevents an HTTP fetcher from being applied to URLs that start with "https://example.com/pdfs/".
<referenceFilters>
  <filter class="ReferenceFilter" onMatch="exclude">
    <valueMatcher method="regex">https://example\.com/pdfs/.*</valueMatcher>
  </filter>
</referenceFilters>
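To make the effect of onMatch="exclude" concrete, here is a small sketch that applies the same regular expression with plain `java.util.regex`. It is not the library's `ReferenceFilter` implementation: the class `ExcludeFilterSketch` and the helper `acceptsReference` are hypothetical, and matching the whole URL (rather than a substring) is an assumption of this sketch.

```java
import java.util.regex.Pattern;

class ExcludeFilterSketch {
    // Same regular expression as in the configuration example.
    private static final Pattern PDFS =
            Pattern.compile("https://example\\.com/pdfs/.*");

    // Hypothetical helper modelling onMatch="exclude": a reference that
    // matches the pattern is rejected, any other reference is let through.
    // Whole-string matching is an assumption of this sketch.
    static boolean acceptsReference(String url) {
        boolean matched = PDFS.matcher(url).matches();
        return !matched;
    }

    public static void main(String[] args) {
        // Under /pdfs/, so it matches and is excluded: prints false
        System.out.println(acceptsReference("https://example.com/pdfs/report.pdf"));
        // Not under /pdfs/, so it does not match: prints true
        System.out.println(acceptsReference("https://example.com/docs/report.pdf"));
    }
}
```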
Constructor and Description |
---|
AbstractHttpFetcher() |
Modifier and Type | Method and Description |
---|---|
boolean | accept(Doc doc, HttpMethod httpMethod) |
void | accept(Event event) |
protected abstract boolean | accept(HttpMethod httpMethod) Whether the supplied HttpMethod is supported by this fetcher. |
boolean | equals(Object other) |
protected void | fetcherShutdown(HttpCollector collector) Invoked once per fetcher when the collector ends. |
protected void | fetcherStartup(HttpCollector collector) Invoked once per fetcher instance, when the collector starts. |
protected void | fetcherThreadBegin(HttpCrawler crawler) Invoked each time a crawler begins a new crawler thread if that thread is the current thread. |
protected void | fetcherThreadEnd(HttpCrawler crawler) Invoked each time a crawler ends an existing crawler thread if that thread is the current thread. |
List<IReferenceFilter> | getReferenceFilters() Gets reference filters. |
int | hashCode() |
void | loadFromXML(XML xml) |
protected abstract void | loadHttpFetcherFromXML(XML xml) |
protected abstract void | saveHttpFetcherToXML(XML xml) |
void | saveToXML(XML xml) |
void | setReferenceFilters(IReferenceFilter... referenceFilters) Sets reference filters. |
void | setReferenceFilters(List<IReferenceFilter> referenceFilters) Sets reference filters. |
String | toString() |
Methods inherited from class Object: clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface IHttpFetcher: fetch, getUserAgent
public List<IReferenceFilter> getReferenceFilters()

public void setReferenceFilters(IReferenceFilter... referenceFilters)
Parameters: referenceFilters - reference filters to set

public void setReferenceFilters(List<IReferenceFilter> referenceFilters)
Parameters: referenceFilters - the reference filters to set

public boolean accept(Doc doc, HttpMethod httpMethod)
Specified by: accept in interface IHttpFetcher

protected abstract boolean accept(HttpMethod httpMethod)
Parameters: httpMethod - the HTTP method
Returns: true if supported

protected void fetcherStartup(HttpCollector collector)
Parameters: collector - collector

protected void fetcherShutdown(HttpCollector collector)
Parameters: collector - collector

protected void fetcherThreadBegin(HttpCrawler crawler)
Parameters: crawler - crawler

protected void fetcherThreadEnd(HttpCrawler crawler)
Parameters: crawler - crawler

public final void loadFromXML(XML xml)
Specified by: loadFromXML in interface IXMLConfigurable

public final void saveToXML(XML xml)
Specified by: saveToXML in interface IXMLConfigurable

protected abstract void loadHttpFetcherFromXML(XML xml)

protected abstract void saveHttpFetcherToXML(XML xml)
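The pairing of final loadFromXML/saveToXML methods with abstract loadHttpFetcherFromXML/saveHttpFetcherToXML methods suggests a template-method arrangement. The sketch below illustrates that pattern only. It uses a `Map<String, String>` as a hypothetical stand-in for the library's `XML` type, the class and key names are invented for illustration, and the assumption that the base class handles its shared settings before calling the subclass hook is not confirmed by this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the base class; a Map stands in for the XML type.
abstract class XmlTemplateSketchBase {
    // Shared setting owned by the base class (simplified to a string).
    private String referenceFilters = "";

    // Final: subclasses cannot skip the shared handling.
    public final void loadFromXML(Map<String, String> xml) {
        referenceFilters = xml.getOrDefault("referenceFilters", "");
        loadHttpFetcherFromXML(xml); // subclass-specific settings
    }

    public final void saveToXML(Map<String, String> xml) {
        xml.put("referenceFilters", referenceFilters);
        saveHttpFetcherToXML(xml); // subclass-specific settings
    }

    String getReferenceFilters() {
        return referenceFilters;
    }

    protected abstract void loadHttpFetcherFromXML(Map<String, String> xml);

    protected abstract void saveHttpFetcherToXML(Map<String, String> xml);
}

// Example subclass with one setting of its own (hypothetical key name).
class XmlTemplateSketchFetcher extends XmlTemplateSketchBase {
    private String timeout = "";

    String getTimeout() {
        return timeout;
    }

    @Override
    protected void loadHttpFetcherFromXML(Map<String, String> xml) {
        timeout = xml.getOrDefault("timeout", "");
    }

    @Override
    protected void saveHttpFetcherToXML(Map<String, String> xml) {
        xml.put("timeout", timeout);
    }
}

class XmlTemplateSketchDemo {
    public static void main(String[] args) {
        Map<String, String> in = new LinkedHashMap<>();
        in.put("referenceFilters", "exclude:/pdfs/");
        in.put("timeout", "30s");

        XmlTemplateSketchFetcher fetcher = new XmlTemplateSketchFetcher();
        fetcher.loadFromXML(in);

        // Saving writes both the shared and the subclass settings back out.
        Map<String, String> out = new LinkedHashMap<>();
        fetcher.saveToXML(out);
        System.out.println(out.equals(in)); // prints true
    }
}
```

With this arrangement a concrete fetcher only reads and writes its own settings, while the shared configuration is handled once in the base class.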
Copyright © 2009–2023 Norconex Inc. All rights reserved.