public abstract class AbstractLinkExtractor extends Object implements ILinkExtractor, IXMLConfigurable
Base class for link extraction providing common configuration settings.
Subclasses inherit the following XML configuration options:
<!-- multiple "restrictTo" tags allowed (only one needs to match) -->
<restrictTo>
  <fieldMatcher
      method="[basic|csv|wildcard|regex]"
      ignoreCase="[false|true]"
      ignoreDiacritic="[false|true]"
      partial="[false|true]">
    (field-matching expression)
  </fieldMatcher>
  <valueMatcher
      method="[basic|csv|wildcard|regex]"
      ignoreCase="[false|true]"
      ignoreDiacritic="[false|true]"
      partial="[false|true]">
    (value-matching expression)
  </valueMatcher>
</restrictTo>
XML usage example:
<restrictTo>
  <fieldMatcher>document.contentType</fieldMatcher>
  <valueMatcher
      method="wildcard">
    text/*
  </valueMatcher>
</restrictTo>
The above example will apply to any content type starting with "text/".
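To make the wildcard semantics concrete, here is a minimal sketch of how an expression such as `text/*` could be evaluated against a content type. `WildcardRestrictionSketch`, `toPattern`, and `accepts` are hypothetical names introduced for illustration. They are not the Norconex matcher classes, and only `*` is handled.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration only; not the Norconex matcher classes.
class WildcardRestrictionSketch {

    // Builds a regex from a wildcard expression: each '*' becomes ".*",
    // every other character is matched literally.
    static Pattern toPattern(String wildcard) {
        String[] literals = wildcard.trim().split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // Returns true when the whole content type matches the wildcard expression.
    static boolean accepts(String wildcard, String contentType) {
        return toPattern(wildcard).matcher(contentType).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("text/*", "text/html"));       // true
        System.out.println(accepts("text/*", "text/plain"));      // true
        System.out.println(accepts("text/*", "application/pdf")); // false
    }
}
```

In this sketch the expression is trimmed before matching because the XML example places it on its own line; whether the real matcher trims whitespace is not stated here.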
Constructor and Description |
---|
AbstractLinkExtractor() |
Modifier and Type | Method and Description |
---|---|
void | addRestriction(PropertyMatcher... restrictions): Adds one or more restrictions this extractor should be restricted to. |
void | addRestrictions(List<PropertyMatcher> restrictions): Adds restrictions this extractor should be restricted to. |
void | clearRestrictions(): Clears all restrictions. |
boolean | equals(Object other) |
Set<Link> | extractLinks(CrawlDoc doc) |
abstract void | extractLinks(Set<Link> links, CrawlDoc doc) |
PropertyMatchers | getRestrictions(): Gets all restrictions. |
int | hashCode() |
void | loadFromXML(XML xml) |
protected abstract void | loadLinkExtractorFromXML(XML xml): Loads configuration settings specific to the implementing class. |
boolean | removeRestriction(PropertyMatcher restriction): Removes a restriction. |
int | removeRestriction(String field): Removes all restrictions on a given field. |
protected abstract void | saveLinkExtractorToXML(XML xml): Saves configuration settings specific to the implementing class. |
void | saveToXML(XML xml) |
void | setRestrictions(List<PropertyMatcher> restrictions): Sets restrictions this extractor should be restricted to. |
String | toString() |
public final Set<Link> extractLinks(CrawlDoc doc) throws IOException
Specified by: extractLinks in interface ILinkExtractor
Throws: IOException
public abstract void extractLinks(Set<Link> links, CrawlDoc doc) throws IOException
Throws: IOException
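The pairing of a final `extractLinks(CrawlDoc)` with an abstract `extractLinks(Set<Link>, CrawlDoc)` suggests a template-method design. The sketch below illustrates that pattern with hypothetical stand-in types (`SketchDoc`, `SketchExtractor`, `HttpTokenExtractor`) and plain strings for links. It assumes the final method checks the restrictions before delegating, and that an extractor with no restrictions applies to every document. Neither assumption is confirmed by this page.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

class ExtractorTemplateSketch {
    public static void main(String[] args) {
        SketchExtractor extractor = new HttpTokenExtractor();
        extractor.addRestriction(doc -> doc.contentType.startsWith("text/"));
        System.out.println(extractor.extractLinks(
                new SketchDoc("text/plain", "see https://example.com for details")));
        System.out.println(extractor.extractLinks(
                new SketchDoc("application/pdf", "https://example.com")));
    }
}

// Hypothetical stand-in for CrawlDoc: a content type plus some text.
class SketchDoc {
    final String contentType;
    final String text;

    SketchDoc(String contentType, String text) {
        this.contentType = contentType;
        this.text = text;
    }
}

// Hypothetical stand-in for the abstract base class.
abstract class SketchExtractor {
    private final List<Predicate<SketchDoc>> restrictions = new ArrayList<>();

    void addRestriction(Predicate<SketchDoc> restriction) {
        restrictions.add(restriction);
    }

    // Fixed entry point. Assumption: delegate only when there are no
    // restrictions or at least one restriction matches the document.
    final Set<String> extractLinks(SketchDoc doc) {
        Set<String> links = new LinkedHashSet<>();
        boolean applies = restrictions.isEmpty()
                || restrictions.stream().anyMatch(r -> r.test(doc));
        if (applies) {
            extractLinks(links, doc);
        }
        return links;
    }

    // Subclasses add whatever links they find to the supplied set.
    abstract void extractLinks(Set<String> links, SketchDoc doc);
}

// Toy subclass: treats whitespace-separated tokens that start with
// http:// or https:// as links.
class HttpTokenExtractor extends SketchExtractor {
    @Override
    void extractLinks(Set<String> links, SketchDoc doc) {
        for (String token : doc.text.split("\\s+")) {
            if (token.startsWith("http://") || token.startsWith("https://")) {
                links.add(token);
            }
        }
    }
}
```

With this split, a subclass only needs to know how to find links; the decision about which documents qualify stays in one place.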
public void addRestriction(PropertyMatcher... restrictions)
Parameters: restrictions - the restrictions
public void addRestrictions(List<PropertyMatcher> restrictions)
Parameters: restrictions - the restrictions
public void setRestrictions(List<PropertyMatcher> restrictions)
Parameters: restrictions - the restrictions
public int removeRestriction(String field)
Parameters: field - the field to remove restrictions on
public boolean removeRestriction(PropertyMatcher restriction)
Parameters: restriction - the restriction to remove
Returns: true if this extractor contained the restriction
public void clearRestrictions()
public PropertyMatchers getRestrictions()
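The two `removeRestriction` overloads return different things, which is easy to overlook. The sketch below mimics them with hypothetical stand-ins (`SketchMatcher`, `SketchRestrictions`) rather than the real `PropertyMatcher` and `PropertyMatchers` classes. It assumes the `int` returned by the field-based overload is the number of restrictions removed, which this page does not state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class RestrictionListSketch {
    public static void main(String[] args) {
        SketchRestrictions restrictions = new SketchRestrictions();
        SketchMatcher text = new SketchMatcher("document.contentType", "text/*");
        restrictions.add(
                text,
                new SketchMatcher("document.contentType", "application/xhtml+xml"),
                new SketchMatcher("title", "News*")); // "title" is a made-up field
        System.out.println(restrictions.remove(text));                   // true
        System.out.println(restrictions.remove("document.contentType")); // 1
        System.out.println(restrictions.size());                         // 1
    }
}

// Hypothetical stand-in for PropertyMatcher: a field name and an expression.
final class SketchMatcher {
    final String field;
    final String expression;

    SketchMatcher(String field, String expression) {
        this.field = field;
        this.expression = expression;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof SketchMatcher)) {
            return false;
        }
        SketchMatcher m = (SketchMatcher) other;
        return field.equals(m.field) && expression.equals(m.expression);
    }

    @Override
    public int hashCode() {
        return Objects.hash(field, expression);
    }
}

// Hypothetical stand-in for the restriction-management methods.
class SketchRestrictions {
    private final List<SketchMatcher> items = new ArrayList<>();

    void add(SketchMatcher... matchers) {
        items.addAll(Arrays.asList(matchers));
    }

    // Removes one restriction; true if it was present.
    boolean remove(SketchMatcher matcher) {
        return items.remove(matcher);
    }

    // Removes every restriction on the field; returns how many were removed
    // (assumed meaning of the int return value).
    int remove(String field) {
        int before = items.size();
        items.removeIf(m -> m.field.equals(field));
        return before - items.size();
    }

    void clear() {
        items.clear();
    }

    int size() {
        return items.size();
    }
}
```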
public final void loadFromXML(XML xml)
Specified by: loadFromXML in interface IXMLConfigurable
protected abstract void loadLinkExtractorFromXML(XML xml)
Parameters: xml - XML configuration
public final void saveToXML(XML xml)
Specified by: saveToXML in interface IXMLConfigurable
protected abstract void saveLinkExtractorToXML(XML xml)
Parameters: xml - the XML
Copyright © 2009–2023 Norconex Inc. All rights reserved.