Package com.norconex.collector.http.link
Class AbstractLinkExtractor
java.lang.Object
    com.norconex.collector.http.link.AbstractLinkExtractor

All Implemented Interfaces:
    ILinkExtractor, IXMLConfigurable
Direct Known Subclasses:
    AbstractTextLinkExtractor, TikaLinkExtractor
public abstract class AbstractLinkExtractor extends Object implements ILinkExtractor, IXMLConfigurable
Base class for link extractors, providing common configuration settings.
Subclasses inherit the following:
XML configuration usage:
XML usage example:
The above example will apply to any content type starting with "text/".
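The restriction idea (apply the extractor only to documents whose content type starts with "text/") can be sketched in plain Java. This is a minimal sketch, not the Norconex API: the Restriction class, the applies helper, and the "document.contentType" field name are hypothetical stand-ins for PropertyMatcher and the real metadata field. The sketch also assumes that an extractor with no restrictions applies to every document and that a document passes when at least one restriction matches.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class RestrictionSketch {
    /**
     * Hypothetical stand-in for a PropertyMatcher: a metadata field name
     * plus a regular expression the field's value must fully match.
     */
    static final class Restriction {
        final String field;
        final Pattern valuePattern;

        Restriction(String field, String valueRegex) {
            this.field = field;
            this.valuePattern = Pattern.compile(valueRegex);
        }

        boolean matches(Map<String, String> metadata) {
            String value = metadata.get(field);
            return value != null && valuePattern.matcher(value).matches();
        }
    }

    /**
     * Assumed semantics: no restrictions means the extractor applies to
     * every document; otherwise at least one restriction must match.
     */
    static boolean applies(List<Restriction> restrictions, Map<String, String> metadata) {
        return restrictions.isEmpty()
                || restrictions.stream().anyMatch(r -> r.matches(metadata));
    }

    public static void main(String[] args) {
        // Restrict to any content type starting with "text/".
        // "document.contentType" is an assumed field name.
        List<Restriction> restrictions =
                List.of(new Restriction("document.contentType", "text/.*"));
        System.out.println(applies(restrictions, Map.of("document.contentType", "text/html")));       // true
        System.out.println(applies(restrictions, Map.of("document.contentType", "application/pdf"))); // false
    }
}
```

Matching the whole value against "text/.*" is equivalent to a "starts with text/" test; a real configuration may express the same rule differently.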
Since:
    3.0.0
Author:
    Pascal Essiembre
Constructor Summary
Constructor                Description
AbstractLinkExtractor()
Method Summary
Modifier and Type          Method and Description
void                       addRestriction(PropertyMatcher... restrictions)
                           Adds one or more restrictions limiting which documents this extractor applies to.
void                       addRestrictions(List<PropertyMatcher> restrictions)
                           Adds restrictions limiting which documents this extractor applies to.
void                       clearRestrictions()
                           Clears all restrictions.
boolean                    equals(Object other)
Set<Link>                  extractLinks(CrawlDoc doc)
abstract void              extractLinks(Set<Link> links, CrawlDoc doc)
PropertyMatchers           getRestrictions()
                           Gets all restrictions.
int                        hashCode()
void                       loadFromXML(XML xml)
protected abstract void    loadLinkExtractorFromXML(XML xml)
                           Loads configuration settings specific to the implementing class.
boolean                    removeRestriction(PropertyMatcher restriction)
                           Removes a restriction.
int                        removeRestriction(String field)
                           Removes all restrictions on a given field.
protected abstract void    saveLinkExtractorToXML(XML xml)
                           Saves configuration settings specific to the implementing class.
void                       saveToXML(XML xml)
void                       setRestrictions(List<PropertyMatcher> restrictions)
                           Sets the restrictions limiting which documents this extractor applies to.
String                     toString()
Method Detail
extractLinks
public final Set<Link> extractLinks(CrawlDoc doc) throws IOException
Specified by:
    extractLinks in interface ILinkExtractor
Throws:
    IOException

extractLinks
public abstract void extractLinks(Set<Link> links, CrawlDoc doc) throws IOException
Throws:
    IOException

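The pairing of a final extractLinks(CrawlDoc) with an abstract extractLinks(Set<Link>, CrawlDoc) follows the template method pattern. The sketch below assumes the final method checks the shared restrictions, creates the result set, and then delegates the format-specific parsing to the subclass; the source only shows which method is final and which is abstract. Doc, BaseExtractor, and HrefExtractor are hypothetical simplifications, with links modeled as plain strings instead of Link objects.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractLinksSketch {
    /** Hypothetical stand-in for CrawlDoc: a content type and raw content. */
    static final class Doc {
        private final String contentType;
        private final String content;

        Doc(String contentType, String content) {
            this.contentType = contentType;
            this.content = content;
        }

        String contentType() { return contentType; }

        String content() { return content; }
    }

    /** Hypothetical base class mirroring the final/abstract extractLinks pair. */
    abstract static class BaseExtractor {
        // Assumption: an extractor with no restriction accepts every document.
        private Predicate<Doc> restriction = doc -> true;

        void setRestriction(Predicate<Doc> restriction) {
            this.restriction = restriction;
        }

        // Final entry point: shared checks live here and cannot be overridden.
        final Set<String> extractLinks(Doc doc) {
            if (!restriction.test(doc)) {
                return Collections.emptySet();
            }
            Set<String> links = new LinkedHashSet<>();
            extractLinks(links, doc); // delegate format-specific parsing
            return links;
        }

        // Hook for subclasses: add every link found in the document to the set.
        abstract void extractLinks(Set<String> links, Doc doc);
    }

    /** Hypothetical subclass collecting href="..." values with a regex. */
    static final class HrefExtractor extends BaseExtractor {
        private static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");

        @Override
        void extractLinks(Set<String> links, Doc doc) {
            Matcher m = HREF.matcher(doc.content());
            while (m.find()) {
                links.add(m.group(1));
            }
        }
    }

    public static void main(String[] args) {
        HrefExtractor extractor = new HrefExtractor();
        extractor.setRestriction(doc -> doc.contentType().startsWith("text/"));

        String html = "<a href=\"/a\">A</a> <a href=\"/b\">B</a>";
        System.out.println(extractor.extractLinks(new Doc("text/html", html)));       // [/a, /b]
        System.out.println(extractor.extractLinks(new Doc("application/pdf", html))); // []
    }
}
```

Making the entry point final means no subclass can accidentally bypass the shared checks; subclasses only decide how links are found.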
addRestriction
public void addRestriction(PropertyMatcher... restrictions)
Adds one or more restrictions limiting which documents this extractor applies to.
Parameters:
    restrictions - the restrictions to add

addRestrictions
public void addRestrictions(List<PropertyMatcher> restrictions)
Adds restrictions limiting which documents this extractor applies to.
Parameters:
    restrictions - the restrictions to add

setRestrictions
public void setRestrictions(List<PropertyMatcher> restrictions)
Sets the restrictions limiting which documents this extractor applies to.
Parameters:
    restrictions - the restrictions to set

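A minimal sketch of how the add and set variants might differ, assuming that both add methods append to the existing restrictions while setRestrictions replaces them (the source does not state this explicitly). FieldMatcher and RestrictionSetterSketch are hypothetical stand-ins for PropertyMatcher and the extractor, and the field names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RestrictionSetterSketch {
    /** Hypothetical stand-in for PropertyMatcher: field name plus value regex. */
    static final class FieldMatcher {
        final String field;
        final String valueRegex;

        FieldMatcher(String field, String valueRegex) {
            this.field = field;
            this.valueRegex = valueRegex;
        }

        @Override
        public String toString() {
            return field + "~" + valueRegex;
        }
    }

    private final List<FieldMatcher> restrictions = new ArrayList<>();

    // Appends to the existing restrictions (varargs form).
    void addRestriction(FieldMatcher... matchers) {
        restrictions.addAll(Arrays.asList(matchers));
    }

    // Appends to the existing restrictions (list form).
    void addRestrictions(List<FieldMatcher> matchers) {
        restrictions.addAll(matchers);
    }

    // Assumption: "set" replaces whatever was configured before.
    void setRestrictions(List<FieldMatcher> matchers) {
        restrictions.clear();
        restrictions.addAll(matchers);
    }

    List<FieldMatcher> getRestrictions() {
        return List.copyOf(restrictions); // read-only snapshot
    }

    public static void main(String[] args) {
        RestrictionSetterSketch ex = new RestrictionSetterSketch();
        ex.addRestriction(new FieldMatcher("document.contentType", "text/.*"));
        ex.addRestrictions(List.of(new FieldMatcher("document.reference", ".*\\.html")));
        System.out.println(ex.getRestrictions().size()); // 2

        ex.setRestrictions(List.of(new FieldMatcher("document.contentType", "application/xml")));
        System.out.println(ex.getRestrictions()); // [document.contentType~application/xml]
    }
}
```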
removeRestriction
public int removeRestriction(String field)
Removes all restrictions on a given field.
Parameters:
    field - the field to remove restrictions on
Returns:
    the number of restrictions removed

removeRestriction
public boolean removeRestriction(PropertyMatcher restriction)
Removes a restriction.
Parameters:
    restriction - the restriction to remove
Returns:
    true if this extractor contained the restriction

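The two removeRestriction overloads differ in what they match and what they return: one removes every restriction on a field and reports a count, the other removes one specific restriction and reports whether it was present. The sketch below illustrates that contract with a hypothetical FieldMatcher (value-based equality) standing in for PropertyMatcher; it is not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class RestrictionRemovalSketch {
    /** Hypothetical stand-in for PropertyMatcher with value-based equality. */
    static final class FieldMatcher {
        final String field;
        final String valueRegex;

        FieldMatcher(String field, String valueRegex) {
            this.field = field;
            this.valueRegex = valueRegex;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof FieldMatcher)) {
                return false;
            }
            FieldMatcher other = (FieldMatcher) o;
            return field.equals(other.field) && valueRegex.equals(other.valueRegex);
        }

        @Override
        public int hashCode() {
            return Objects.hash(field, valueRegex);
        }
    }

    final List<FieldMatcher> restrictions = new ArrayList<>();

    // Removes every restriction on the given field; returns how many were removed.
    int removeRestriction(String field) {
        int before = restrictions.size();
        restrictions.removeIf(m -> m.field.equals(field));
        return before - restrictions.size();
    }

    // Removes one equal restriction; returns true if it was present.
    boolean removeRestriction(FieldMatcher matcher) {
        return restrictions.remove(matcher);
    }

    public static void main(String[] args) {
        RestrictionRemovalSketch ex = new RestrictionRemovalSketch();
        ex.restrictions.add(new FieldMatcher("document.contentType", "text/.*"));
        ex.restrictions.add(new FieldMatcher("document.contentType", "application/xml"));
        ex.restrictions.add(new FieldMatcher("document.reference", ".*\\.html"));

        System.out.println(ex.removeRestriction("document.contentType")); // 2
        FieldMatcher html = new FieldMatcher("document.reference", ".*\\.html");
        System.out.println(ex.removeRestriction(html)); // true
        System.out.println(ex.removeRestriction(html)); // false
    }
}
```

Removing by matcher only works if the matcher type defines equality by value, which is why the stand-in overrides equals and hashCode.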
clearRestrictions
public void clearRestrictions()
Clears all restrictions.

getRestrictions
public PropertyMatchers getRestrictions()
Gets all restrictions.
Returns:
    the restrictions

loadFromXML
public final void loadFromXML(XML xml)
Specified by:
    loadFromXML in interface IXMLConfigurable

loadLinkExtractorFromXML
protected abstract void loadLinkExtractorFromXML(XML xml)
Loads configuration settings specific to the implementing class.
Parameters:
    xml - the XML configuration to load from

saveToXML
public final void saveToXML(XML xml)
Specified by:
    saveToXML in interface IXMLConfigurable

saveLinkExtractorToXML
protected abstract void saveLinkExtractorToXML(XML xml)
Saves configuration settings specific to the implementing class.
Parameters:
    xml - the XML configuration to save to
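loadFromXML and saveToXML are final and pair with the abstract loadLinkExtractorFromXML and saveLinkExtractorToXML hooks, which suggests the base class handles the shared settings itself and leaves only subclass-specific settings to the hooks. The sketch below assumes that split. Config is a hypothetical flat key/value stand-in for the XML class, and the restrictTo and maxLinks settings are invented purely for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class XmlTemplateSketch {
    /** Hypothetical stand-in for the XML class: flat key/value pairs. */
    static final class Config {
        final Map<String, String> values = new LinkedHashMap<>();

        String get(String key) { return values.get(key); }

        void set(String key, String value) { values.put(key, value); }
    }

    /** Hypothetical base class: final load/save handle shared settings, then delegate. */
    abstract static class BaseExtractor {
        String restrictTo; // hypothetical shared setting, e.g. a content-type regex

        final void loadFromXML(Config xml) {
            restrictTo = xml.get("restrictTo"); // shared settings first
            loadLinkExtractorFromXML(xml);      // then subclass-specific ones
        }

        final void saveToXML(Config xml) {
            if (restrictTo != null) {
                xml.set("restrictTo", restrictTo);
            }
            saveLinkExtractorToXML(xml);
        }

        protected abstract void loadLinkExtractorFromXML(Config xml);

        protected abstract void saveLinkExtractorToXML(Config xml);
    }

    /** Hypothetical subclass with one setting of its own. */
    static final class MaxLinksExtractor extends BaseExtractor {
        int maxLinks = 100; // hypothetical default

        @Override
        protected void loadLinkExtractorFromXML(Config xml) {
            String value = xml.get("maxLinks");
            if (value != null) {
                maxLinks = Integer.parseInt(value);
            }
        }

        @Override
        protected void saveLinkExtractorToXML(Config xml) {
            xml.set("maxLinks", Integer.toString(maxLinks));
        }
    }

    public static void main(String[] args) {
        Config in = new Config();
        in.set("restrictTo", "text/.*");
        in.set("maxLinks", "25");

        MaxLinksExtractor extractor = new MaxLinksExtractor();
        extractor.loadFromXML(in);

        Config out = new Config();
        extractor.saveToXML(out);
        System.out.println(out.values); // {restrictTo=text/.*, maxLinks=25}
    }
}
```

Keeping the public methods final guarantees every subclass reads and writes the shared settings the same way, so a configuration saved by one extractor round-trips through loadFromXML.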