Class AbstractLinkExtractor

java.lang.Object
com.norconex.collector.http.link.AbstractLinkExtractor
All Implemented Interfaces:
ILinkExtractor, IXMLConfigurable
Direct Known Subclasses:
AbstractTextLinkExtractor, TikaLinkExtractor

public abstract class AbstractLinkExtractor extends Object implements ILinkExtractor, IXMLConfigurable

Base class for link extraction providing common configuration settings.

Subclasses inherit the restriction settings, which can be set through XML configuration or through the restriction methods listed under Method Details. For example, a restriction can limit an extractor to any content type starting with "text/".
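
The restriction on content types starting with "text/" mentioned above can be approximated in plain Java. This is a minimal sketch, not the library's implementation: `FieldRestriction` and `RestrictionSketch` are hypothetical stand-ins for `PropertyMatcher` and `PropertyMatchers`, the metadata field name `document.contentType` is an assumption, and so is the rule that an empty restriction list accepts every document while one matching restriction is enough.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-in for a restriction: a metadata field name plus a
// regular expression that at least one of the field's values must fully match.
final class FieldRestriction {
    private final String field;
    private final Pattern valuePattern;

    FieldRestriction(String field, String valueRegex) {
        this.field = field;
        this.valuePattern = Pattern.compile(valueRegex);
    }

    boolean matches(Map<String, List<String>> metadata) {
        List<String> values = metadata.getOrDefault(field, List.of());
        return values.stream().anyMatch(v -> valuePattern.matcher(v).matches());
    }
}

final class RestrictionSketch {
    // Assumed semantics: with no restrictions every document is accepted;
    // otherwise a single matching restriction is sufficient.
    static boolean accepts(List<FieldRestriction> restrictions,
                           Map<String, List<String>> metadata) {
        return restrictions.isEmpty()
                || restrictions.stream().anyMatch(r -> r.matches(metadata));
    }
}
```

With a single restriction built as `new FieldRestriction("document.contentType", "text/.*")`, a document whose content type is `text/html` is accepted and one whose content type is `application/pdf` is not.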

Since:
3.0.0
Author:
Pascal Essiembre
  • Constructor Details

    • AbstractLinkExtractor

      public AbstractLinkExtractor()
  • Method Details

    • extractLinks

      public final Set<Link> extractLinks(CrawlDoc doc) throws IOException
      Specified by:
      extractLinks in interface ILinkExtractor
      Throws:
      IOException
    • extractLinks

      public abstract void extractLinks(Set<Link> links, CrawlDoc doc) throws IOException
      Throws:
      IOException
    • addRestriction

      public void addRestriction(PropertyMatcher... restrictions)
      Adds one or more restrictions limiting the documents this extractor applies to.
      Parameters:
      restrictions - the restrictions
    • addRestrictions

      public void addRestrictions(List<PropertyMatcher> restrictions)
      Adds restrictions limiting the documents this extractor applies to.
      Parameters:
      restrictions - the restrictions
    • setRestrictions

      public void setRestrictions(List<PropertyMatcher> restrictions)
      Sets the restrictions limiting the documents this extractor applies to.
      Parameters:
      restrictions - the restrictions
    • removeRestriction

      public int removeRestriction(String field)
      Removes all restrictions on a given field.
      Parameters:
      field - the field to remove restrictions on
      Returns:
      how many elements were removed
    • removeRestriction

      public boolean removeRestriction(PropertyMatcher restriction)
      Removes a restriction.
      Parameters:
      restriction - the restriction to remove
      Returns:
      true if this extractor contained the restriction
    • clearRestrictions

      public void clearRestrictions()
      Clears all restrictions.
    • getRestrictions

      public PropertyMatchers getRestrictions()
      Gets all restrictions.
      Returns:
      the restrictions
    • loadFromXML

      public final void loadFromXML(XML xml)
      Specified by:
      loadFromXML in interface IXMLConfigurable
    • loadLinkExtractorFromXML

      protected abstract void loadLinkExtractorFromXML(XML xml)
      Loads configuration settings specific to the implementing class.
      Parameters:
      xml - XML configuration
    • saveToXML

      public final void saveToXML(XML xml)
      Specified by:
      saveToXML in interface IXMLConfigurable
    • saveLinkExtractorToXML

      protected abstract void saveLinkExtractorToXML(XML xml)
      Saves configuration settings specific to the implementing class.
      Parameters:
      xml - XML configuration
    • equals

      public boolean equals(Object other)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • toString

      public String toString()
      Overrides:
      toString in class Object
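
The method list above pairs a final entry point with an abstract hook: `extractLinks(CrawlDoc)` with `extractLinks(Set<Link>, CrawlDoc)`, and likewise `loadFromXML`/`saveToXML` with `loadLinkExtractorFromXML`/`saveLinkExtractorToXML`. The sketch below illustrates that split for link extraction using only the JDK. Everything in it is hypothetical: `SketchDoc` stands in for `CrawlDoc`, plain strings stand in for `Link`, a `Predicate` stands in for the restrictions, and the assumption that the final method checks restrictions before delegating is not confirmed by this page.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a crawled document.
final class SketchDoc {
    final String contentType;
    final String text;

    SketchDoc(String contentType, String text) {
        this.contentType = contentType;
        this.text = text;
    }
}

// Stand-in base class showing the final/abstract method split.
abstract class SketchLinkExtractor {
    private Predicate<SketchDoc> restriction = doc -> true;

    public void setRestriction(Predicate<SketchDoc> restriction) {
        this.restriction = restriction;
    }

    // Final entry point: creates the result set, applies the (assumed)
    // restriction check, then delegates to the subclass hook.
    public final Set<String> extractLinks(SketchDoc doc) throws IOException {
        Set<String> links = new HashSet<>();
        if (restriction.test(doc)) {
            extractLinks(links, doc);
        }
        return links;
    }

    // Hook for subclasses: add every link found in doc to the supplied set.
    public abstract void extractLinks(Set<String> links, SketchDoc doc)
            throws IOException;
}

// Example subclass: collects absolute http(s) URLs found in the text.
final class UrlPatternExtractor extends SketchLinkExtractor {
    private static final Pattern URL = Pattern.compile("https?://[^\\s\"'<>]+");

    @Override
    public void extractLinks(Set<String> links, SketchDoc doc) {
        Matcher m = URL.matcher(doc.text);
        while (m.find()) {
            links.add(m.group());
        }
    }
}
```

Because the entry point is final, a subclass such as `UrlPatternExtractor` only decides how links are found; whether a document is processed at all stays with the base class.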