Package com.norconex.collector.http.link
Class AbstractLinkExtractor
java.lang.Object
  com.norconex.collector.http.link.AbstractLinkExtractor
- All Implemented Interfaces:
ILinkExtractor, IXMLConfigurable
- Direct Known Subclasses:
AbstractTextLinkExtractor, TikaLinkExtractor
public abstract class AbstractLinkExtractor extends Object implements ILinkExtractor, IXMLConfigurable
Base class for link extraction providing common configuration settings.
Subclasses inherit the following:
XML configuration usage:
XML usage example:
The above example will apply to any content type starting with "text/".
- Since:
- 3.0.0
- Author:
- Pascal Essiembre
Constructor Summary
AbstractLinkExtractor()
-
Method Summary
void addRestriction(PropertyMatcher... restrictions)
    Adds one or more restrictions this extractor should be restricted to.
void addRestrictions(List<PropertyMatcher> restrictions)
    Adds restrictions this extractor should be restricted to.
void clearRestrictions()
    Clears all restrictions.
boolean equals(Object other)
Set<Link> extractLinks(CrawlDoc doc)
abstract void extractLinks(Set<Link> links, CrawlDoc doc)
PropertyMatchers getRestrictions()
    Gets all restrictions.
int hashCode()
void loadFromXML(XML xml)
protected abstract void loadLinkExtractorFromXML(XML xml)
    Loads configuration settings specific to the implementing class.
boolean removeRestriction(PropertyMatcher restriction)
    Removes a restriction.
int removeRestriction(String field)
    Removes all restrictions on a given field.
protected abstract void saveLinkExtractorToXML(XML xml)
    Saves configuration settings specific to the implementing class.
void saveToXML(XML xml)
void setRestrictions(List<PropertyMatcher> restrictions)
    Sets restrictions this extractor should be restricted to.
String toString()
Method Detail
-
extractLinks
public final Set<Link> extractLinks(CrawlDoc doc) throws IOException
- Specified by:
extractLinks in interface ILinkExtractor
- Throws:
IOException
-
extractLinks
public abstract void extractLinks(Set<Link> links, CrawlDoc doc) throws IOException
- Throws:
IOException
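The two extractLinks overloads suggest a template-method arrangement: the final method returns the set of links and the abstract one fills a set handed to it. The sketch below illustrates that shape only; it is not the Norconex source. SketchDoc, SketchLinkExtractor, HrefExtractor and the single content-type prefix check are hypothetical stand-ins for CrawlDoc, this class, a subclass and the PropertyMatcher restrictions, and links are plain strings rather than Link objects. Whether the real final method skips documents that fail the restrictions is an assumption here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LinkExtractorSketch {

    // Hypothetical stand-in for CrawlDoc: a content type and a text body.
    static final class SketchDoc {
        final String contentType;
        final String body;
        SketchDoc(String contentType, String body) {
            this.contentType = contentType;
            this.body = body;
        }
    }

    // Hypothetical stand-in for the base class.
    abstract static class SketchLinkExtractor {
        // Assumption: one content-type prefix replaces the PropertyMatcher list.
        private final String contentTypePrefix;

        SketchLinkExtractor(String contentTypePrefix) {
            this.contentTypePrefix = contentTypePrefix;
        }

        // Final entry point: creates the set, applies the restriction, delegates.
        public final Set<String> extractLinks(SketchDoc doc) throws IOException {
            Set<String> links = new LinkedHashSet<>();
            if (doc.contentType.startsWith(contentTypePrefix)) {
                extractLinks(links, doc);
            }
            return links;
        }

        // Hook that subclasses implement to add links to the supplied set.
        public abstract void extractLinks(Set<String> links, SketchDoc doc)
                throws IOException;
    }

    // Example subclass: collects the values of href="..." attributes.
    static final class HrefExtractor extends SketchLinkExtractor {
        private static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");

        HrefExtractor() {
            super("text/");
        }

        @Override
        public void extractLinks(Set<String> links, SketchDoc doc) {
            Matcher m = HREF.matcher(doc.body);
            while (m.find()) {
                links.add(m.group(1));
            }
        }
    }

    // Convenience wrapper so callers need not handle the checked exception.
    static Set<String> demo(String contentType, String body) {
        try {
            return new HrefExtractor().extractLinks(new SketchDoc(contentType, body));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A text/html document passes the prefix check and yields its link.
        System.out.println(demo("text/html", "<a href=\"https://example.com/a\">a</a>"));
        // A PDF content type fails the check, so the result set stays empty.
        System.out.println(demo("application/pdf", "<a href=\"https://example.com/b\">b</a>"));
    }
}
```

Keeping the entry point final means a subclass cannot bypass the shared checks; it only supplies the extraction logic.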
-
addRestriction
public void addRestriction(PropertyMatcher... restrictions)
Adds one or more restrictions this extractor should be restricted to.
- Parameters:
restrictions - the restrictions
-
addRestrictions
public void addRestrictions(List<PropertyMatcher> restrictions)
Adds restrictions this extractor should be restricted to.
- Parameters:
restrictions - the restrictions
-
setRestrictions
public void setRestrictions(List<PropertyMatcher> restrictions)
Sets restrictions this extractor should be restricted to.
- Parameters:
restrictions - the restrictions
-
removeRestriction
public int removeRestriction(String field)
Removes all restrictions on a given field.
- Parameters:
field - the field to remove restrictions on
- Returns:
how many elements were removed
-
removeRestriction
public boolean removeRestriction(PropertyMatcher restriction)
Removes a restriction.
- Parameters:
restriction - the restriction to remove
- Returns:
true if this extractor contained the restriction
-
clearRestrictions
public void clearRestrictions()
Clears all restrictions.
-
getRestrictions
public PropertyMatchers getRestrictions()
Gets all restrictions.
- Returns:
the restrictions
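The restriction methods above differ mainly in their return values: removing by field reports a count, while removing a specific restriction reports whether it was present. A minimal sketch of those semantics follows. FieldMatcher and RestrictionList are hypothetical stand-ins for PropertyMatcher and the extractor's restriction handling. The sketch also assumes that the add methods append, that setRestrictions replaces the existing list, and that field names are compared case-sensitively; none of these details is stated on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

final class RestrictionsSketch {

    // Hypothetical stand-in for PropertyMatcher: a field name and a pattern.
    static final class FieldMatcher {
        final String field;
        final String pattern;
        FieldMatcher(String field, String pattern) {
            this.field = field;
            this.pattern = pattern;
        }
        @Override
        public boolean equals(Object other) {
            if (!(other instanceof FieldMatcher)) {
                return false;
            }
            FieldMatcher o = (FieldMatcher) other;
            return field.equals(o.field) && pattern.equals(o.pattern);
        }
        @Override
        public int hashCode() {
            return Objects.hash(field, pattern);
        }
    }

    // Hypothetical stand-in for the restriction handling of the extractor.
    static final class RestrictionList {
        private final List<FieldMatcher> matchers = new ArrayList<>();

        // Assumption: adding appends to whatever is already there.
        void addRestriction(FieldMatcher... restrictions) {
            matchers.addAll(Arrays.asList(restrictions));
        }

        // Assumption: setting replaces the current restrictions.
        void setRestrictions(List<FieldMatcher> restrictions) {
            matchers.clear();
            matchers.addAll(restrictions);
        }

        // Removes every restriction on the field and returns how many were removed.
        int removeRestriction(String field) {
            int before = matchers.size();
            matchers.removeIf(m -> m.field.equals(field));
            return before - matchers.size();
        }

        // Removes one restriction; true if it was present.
        boolean removeRestriction(FieldMatcher restriction) {
            return matchers.remove(restriction);
        }

        void clearRestrictions() {
            matchers.clear();
        }

        int size() {
            return matchers.size();
        }
    }

    public static void main(String[] args) {
        RestrictionList restrictions = new RestrictionList();
        FieldMatcher title = new FieldMatcher("title", ".*report.*");
        restrictions.addRestriction(
                new FieldMatcher("document.contentType", "text/.*"),
                new FieldMatcher("document.contentType", "application/xhtml.*"),
                title);
        System.out.println(restrictions.removeRestriction("document.contentType")); // 2
        System.out.println(restrictions.removeRestriction(title)); // true
        System.out.println(restrictions.removeRestriction(title)); // false
    }
}
```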
-
loadFromXML
public final void loadFromXML(XML xml)
- Specified by:
loadFromXML in interface IXMLConfigurable
-
loadLinkExtractorFromXML
protected abstract void loadLinkExtractorFromXML(XML xml)
Loads configuration settings specific to the implementing class.
- Parameters:
xml - XML configuration
-
saveToXML
public final void saveToXML(XML xml)
- Specified by:
saveToXML in interface IXMLConfigurable
-
saveLinkExtractorToXML
protected abstract void saveLinkExtractorToXML(XML xml)
Saves configuration settings specific to the implementing class.
- Parameters:
xml - the XML
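Because loadFromXML and saveToXML are final while loadLinkExtractorFromXML and saveLinkExtractorToXML are abstract, a subclass presumably handles only its own settings and leaves the common ones to the base class. The sketch below shows that split under loud assumptions: a Map<String, String> stands in for the XML type, and SketchConfigurable, MaxUrlLengthExtractor and the keys restrictTo and maxURLLength are hypothetical names chosen for illustration, not the actual configuration schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class XmlConfigSketch {

    // Hypothetical stand-in for the base class; a Map replaces the XML type.
    abstract static class SketchConfigurable {
        // Assumption: one common setting owned by the base class.
        private String restrictTo = "";

        // Final: reads the common setting, then lets the subclass read its own.
        public final void loadFromXML(Map<String, String> xml) {
            restrictTo = xml.getOrDefault("restrictTo", "");
            loadLinkExtractorFromXML(xml);
        }

        // Final: writes the common setting, then lets the subclass write its own.
        public final void saveToXML(Map<String, String> xml) {
            xml.put("restrictTo", restrictTo);
            saveLinkExtractorToXML(xml);
        }

        protected abstract void loadLinkExtractorFromXML(Map<String, String> xml);

        protected abstract void saveLinkExtractorToXML(Map<String, String> xml);
    }

    // Example subclass with a single setting of its own.
    static final class MaxUrlLengthExtractor extends SketchConfigurable {
        int maxUrlLength = 2048;

        @Override
        protected void loadLinkExtractorFromXML(Map<String, String> xml) {
            maxUrlLength = Integer.parseInt(xml.getOrDefault("maxURLLength", "2048"));
        }

        @Override
        protected void saveLinkExtractorToXML(Map<String, String> xml) {
            xml.put("maxURLLength", Integer.toString(maxUrlLength));
        }
    }

    public static void main(String[] args) {
        Map<String, String> source = new LinkedHashMap<>();
        source.put("restrictTo", "text/");
        source.put("maxURLLength", "100");

        MaxUrlLengthExtractor extractor = new MaxUrlLengthExtractor();
        extractor.loadFromXML(source);

        // Saving into a fresh map should reproduce the settings that were loaded.
        Map<String, String> saved = new LinkedHashMap<>();
        extractor.saveToXML(saved);
        System.out.println(saved.equals(source));
    }
}
```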