Package com.norconex.collector.http.link
Interface ILinkExtractor
All Known Implementing Classes:
AbstractLinkExtractor, AbstractTextLinkExtractor, DOMLinkExtractor, GenericLinkExtractor, HtmlLinkExtractor, RegexLinkExtractor, TikaLinkExtractor, XMLFeedLinkExtractor
public interface ILinkExtractor
Responsible for finding links in documents. Links are URLs to be followed, possibly with contextual information about each URL (such as the "a" tag attributes and text).
Implementing classes that also implement IXMLConfigurable should make sure to name their XML tag "extractor", normally nested in linkExtractors tags.
Author:
Pascal Essiembre
Method Summary
Modifier and Type: Set<Link>
Method: extractLinks(CrawlDoc doc)
Method Detail
extractLinks
Set<Link> extractLinks(CrawlDoc doc) throws IOException
Throws:
IOException
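The contract of extractLinks can be illustrated with a minimal sketch. It does not use the real Norconex classes: the Link, CrawlDoc, and ILinkExtractor types below are simplified, hypothetical stand-ins that only mirror the method signature shown on this page, and SimpleAnchorLinkExtractor is an invented name. The regular expression is a deliberately naive way to find "a" tags and is only meant to show the shape of an implementation.

```java
import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the library's Link type: a URL plus its anchor text.
final class Link {
    private final String url;
    private final String text;

    Link(String url, String text) {
        this.url = url;
        this.text = text;
    }

    String getUrl() { return url; }
    String getText() { return text; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Link)) return false;
        Link other = (Link) o;
        return Objects.equals(url, other.url) && Objects.equals(text, other.text);
    }

    @Override
    public int hashCode() {
        return Objects.hash(url, text);
    }
}

// Hypothetical stand-in for the library's CrawlDoc: a reference and its content as text.
final class CrawlDoc {
    private final String reference;
    private final String content;

    CrawlDoc(String reference, String content) {
        this.reference = reference;
        this.content = content;
    }

    String getReference() { return reference; }
    String getContent() { return content; }
}

// Stand-in mirroring the signature documented above.
interface ILinkExtractor {
    Set<Link> extractLinks(CrawlDoc doc) throws IOException;
}

// Assumed example implementation: collects href values and anchor text from "a" tags.
final class SimpleAnchorLinkExtractor implements ILinkExtractor {
    // Naive pattern: only matches double-quoted href attributes.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*\"([^\"]*)\"[^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    @Override
    public Set<Link> extractLinks(CrawlDoc doc) throws IOException {
        // LinkedHashSet removes duplicates while keeping document order.
        Set<Link> links = new LinkedHashSet<>();
        Matcher m = ANCHOR.matcher(doc.getContent());
        while (m.find()) {
            links.add(new Link(m.group(1), m.group(2).trim()));
        }
        return links;
    }
}
```

Returning a Set means repeated links in a document collapse to one entry. A production extractor would more likely rely on a real HTML parser and resolve relative URLs against the document reference, which this sketch leaves out.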