Class AbstractHttpFetcher

java.lang.Object
com.norconex.collector.http.fetch.AbstractHttpFetcher
All Implemented Interfaces:
IHttpFetcher, IEventListener<Event>, IXMLConfigurable, EventListener, Consumer<Event>
Direct Known Subclasses:
GenericHttpFetcher, PhantomJSDocumentFetcher, WebDriverHttpFetcher

public abstract class AbstractHttpFetcher extends Object implements IHttpFetcher, IXMLConfigurable, IEventListener<Event>

Base class implementing the accept(Doc, HttpMethod) method. It uses reference filters to determine whether this fetcher accepts a given URL, and delegates the HTTP method check to its own abstract accept(HttpMethod) method. It also offers methods that subclasses can override to react to crawler startup and shutdown events.
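
The delegation described above can be pictured with a simplified, self-contained sketch. It does not use the Norconex types: SketchFetcher, acceptMethod, Method, and the Predicate<String> filters are hypothetical stand-ins for AbstractHttpFetcher, accept(HttpMethod), HttpMethod, and IReferenceFilter. The rule that every filter must accept the reference is an assumption made for illustration, not confirmed library behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Simplified stand-in types for illustration; NOT the Norconex API.
final class FetcherAcceptSketch {

    enum Method { GET, HEAD, POST }

    // Hypothetical analogue of AbstractHttpFetcher.
    abstract static class SketchFetcher {
        private final List<Predicate<String>> referenceFilters = new ArrayList<>();

        void addReferenceFilter(Predicate<String> filter) {
            referenceFilters.add(filter);
        }

        // Analogue of accept(Doc, HttpMethod): check the reference filters
        // first, then delegate the HTTP method check to the subclass.
        final boolean accept(String reference, Method method) {
            // Assumption: every configured filter must accept the reference.
            for (Predicate<String> filter : referenceFilters) {
                if (!filter.test(reference)) {
                    return false;
                }
            }
            return acceptMethod(method);
        }

        // Analogue of the abstract accept(HttpMethod).
        protected abstract boolean acceptMethod(Method method);
    }

    // Example subclass that only supports GET and HEAD.
    static final class GetHeadFetcher extends SketchFetcher {
        @Override
        protected boolean acceptMethod(Method method) {
            return method == Method.GET || method == Method.HEAD;
        }
    }

    public static void main(String[] args) {
        GetHeadFetcher fetcher = new GetHeadFetcher();
        fetcher.addReferenceFilter(ref -> ref.endsWith(".pdf"));
        // Reference and method both accepted.
        System.out.println(fetcher.accept("https://example.com/a.pdf", Method.GET));
        // Rejected by the reference filter.
        System.out.println(fetcher.accept("https://example.com/a.html", Method.GET));
        // Rejected by the HTTP method check.
        System.out.println(fetcher.accept("https://example.com/a.pdf", Method.POST));
    }
}
```

In this sketch a subclass only decides which HTTP methods it supports; the reference filtering stays in the base class.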

XML configuration usage:

Subclasses inherit this IXMLConfigurable configuration:


<referenceFilters>
  <!-- multiple "filter" tags allowed -->
  <filter
      class="(any reference filter class)">
    (Restrict usage of this fetcher to matching reference filters.
     Refer to the documentation for the IReferenceFilter implementation
     you are using here for usage details.)
  </filter>
</referenceFilters>

Usage example:

This filter example prevents the HTTP fetcher from being applied to URLs under "https://example.com/pdfs/".


<referenceFilters>
  <filter
      class="ReferenceFilter"
      onMatch="exclude">
    <valueMatcher
        method="regex">
      https://example\.com/pdfs/.*
    </valueMatcher>
  </filter>
</referenceFilters>
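
To check which URLs the regular expression in the example above matches, the following sketch uses only java.util.regex. It assumes the filter must match the whole URL and that onMatch="exclude" rejects matching references; both are assumptions about ReferenceFilter, and fetcherApplies is a hypothetical helper, not part of the library.

```java
import java.util.regex.Pattern;

// Plain-Java illustration of the regex in the XML example; NOT the Norconex API.
final class ExcludeFilterSketch {

    // Same expression as in the valueMatcher element.
    private static final Pattern PDFS =
            Pattern.compile("https://example\\.com/pdfs/.*");

    // Hypothetical helper: true when the fetcher would still apply to the URL.
    // Assumption: a full-string match combined with onMatch="exclude"
    // rejects the reference.
    static boolean fetcherApplies(String url) {
        return !PDFS.matcher(url).matches();
    }

    public static void main(String[] args) {
        // Under /pdfs/, so excluded.
        System.out.println(fetcherApplies("https://example.com/pdfs/report.pdf"));
        // Outside /pdfs/, so the fetcher still applies.
        System.out.println(fetcherApplies("https://example.com/docs/report.pdf"));
    }
}
```
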
Since:
3.0.0
Author:
Pascal Essiembre
  • Constructor Details

    • AbstractHttpFetcher

      public AbstractHttpFetcher()
  • Method Details

    • getReferenceFilters

      public List<IReferenceFilter> getReferenceFilters()
      Gets reference filters.
      Returns:
      reference filters
    • setReferenceFilters

      public void setReferenceFilters(IReferenceFilter... referenceFilters)
      Sets reference filters.
      Parameters:
      referenceFilters - reference filters to set
    • setReferenceFilters

      public void setReferenceFilters(List<IReferenceFilter> referenceFilters)
      Sets reference filters.
      Parameters:
      referenceFilters - reference filters to set
    • accept

      public boolean accept(Doc doc, HttpMethod httpMethod)
      Specified by:
      accept in interface IHttpFetcher
    • accept

      protected abstract boolean accept(HttpMethod httpMethod)
      Whether the supplied HttpMethod is supported by this fetcher.
      Parameters:
      httpMethod - the HTTP method
      Returns:
      true if supported
    • accept

      public final void accept(Event event)
      Specified by:
      accept in interface Consumer<Event>
    • fetcherStartup

      protected void fetcherStartup(HttpCollector collector)
      Invoked once per fetcher instance, when the collector starts. Default implementation does nothing.
      Parameters:
      collector - collector
    • fetcherShutdown

      protected void fetcherShutdown(HttpCollector collector)
      Invoked once per fetcher when the collector ends. Default implementation does nothing.
      Parameters:
      collector - collector
    • fetcherThreadBegin

      protected void fetcherThreadBegin(HttpCrawler crawler)
      Invoked each time a crawler begins a new crawler thread if that thread is the current thread. Default implementation does nothing.
      Parameters:
      crawler - crawler
    • fetcherThreadEnd

      protected void fetcherThreadEnd(HttpCrawler crawler)
      Invoked each time a crawler ends an existing crawler thread if that thread is the current thread. Default implementation does nothing.
      Parameters:
      crawler - crawler
    • loadFromXML

      public final void loadFromXML(XML xml)
      Specified by:
      loadFromXML in interface IXMLConfigurable
    • saveToXML

      public final void saveToXML(XML xml)
      Specified by:
      saveToXML in interface IXMLConfigurable
    • loadHttpFetcherFromXML

      protected abstract void loadHttpFetcherFromXML(XML xml)
    • saveHttpFetcherToXML

      protected abstract void saveHttpFetcherToXML(XML xml)
    • equals

      public boolean equals(Object other)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • toString

      public String toString()
      Overrides:
      toString in class Object
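
The lifecycle hooks listed above (fetcherStartup, fetcherThreadBegin, fetcherThreadEnd, fetcherShutdown) lend themselves to managing resources per collector run and per crawler thread. The sketch below shows one possible call order using stand-in types only: HookedFetcher, RecordingFetcher, and simulateRun are hypothetical, the hooks take no parameters here (the real ones receive an HttpCollector or HttpCrawler), and the single worker thread is a simplification.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-in types for illustration; NOT the Norconex API.
final class FetcherLifecycleSketch {

    // Hypothetical analogue of the hooks; defaults do nothing.
    abstract static class HookedFetcher {
        protected void fetcherStartup() { }
        protected void fetcherShutdown() { }
        protected void fetcherThreadBegin() { }
        protected void fetcherThreadEnd() { }
    }

    // Records hook calls and keeps one "session" per crawler thread.
    static final class RecordingFetcher extends HookedFetcher {
        final List<String> log = Collections.synchronizedList(new ArrayList<>());
        private final ThreadLocal<String> session = new ThreadLocal<>();

        @Override
        protected void fetcherStartup() {
            log.add("startup");
        }

        @Override
        protected void fetcherThreadBegin() {
            // Runs on the crawler thread, so the session is thread-local.
            session.set("session-" + Thread.currentThread().getName());
            log.add("thread-begin");
        }

        @Override
        protected void fetcherThreadEnd() {
            session.remove();
            log.add("thread-end");
        }

        @Override
        protected void fetcherShutdown() {
            log.add("shutdown");
        }
    }

    // Simulates one collector run with a single crawler thread.
    static List<String> simulateRun(RecordingFetcher fetcher) {
        fetcher.fetcherStartup();
        Thread worker = new Thread(() -> {
            fetcher.fetcherThreadBegin();
            // Documents would be fetched here.
            fetcher.fetcherThreadEnd();
        }, "crawler-1");
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        fetcher.fetcherShutdown();
        return fetcher.log;
    }

    public static void main(String[] args) {
        System.out.println(simulateRun(new RecordingFetcher()));
    }
}
```

The thread-level hooks run on the crawler thread itself, which is why the sketch stores its per-thread state in a ThreadLocal.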