Package com.norconex.collector.http.link
Interface ILinkExtractor
- All Known Implementing Classes:
AbstractLinkExtractor, AbstractTextLinkExtractor, DOMLinkExtractor, GenericLinkExtractor, HtmlLinkExtractor, RegexLinkExtractor, TikaLinkExtractor, XMLFeedLinkExtractor
public interface ILinkExtractor
Responsible for finding links in documents. Links are URLs to be followed with possibly contextual information about that URL (the "a" tag attributes, and text).
Implementing classes also implementing IXMLConfigurable should make sure to name their XML tag "extractor", normally nested in linkExtractors tags.
- Author:
- Pascal Essiembre
Method Summary
All Methods  Instance Methods  Abstract Methods
Modifier and Type    Method                        Description
Set<Link>            extractLinks(CrawlDoc doc)
Method Detail
extractLinks
Set<Link> extractLinks(CrawlDoc doc) throws IOException
- Throws:
IOException
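The contract above can be illustrated with a minimal, self-contained sketch. To keep it runnable without the collector on the classpath, the Link, CrawlDoc, and ILinkExtractor types below are simplified hypothetical stand-ins, not the library's real classes; only the extractLinks signature mirrors the one documented here. HrefLinkExtractor and its regular expression are likewise illustrative assumptions, not one of the known implementing classes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractorSketch {

    // Hypothetical stand-in for the collector's Link: holds only a URL.
    static final class Link {
        private final String url;
        Link(String url) { this.url = url; }
        String getUrl() { return url; }
        @Override public boolean equals(Object o) {
            return o instanceof Link && ((Link) o).url.equals(url);
        }
        @Override public int hashCode() { return url.hashCode(); }
    }

    // Hypothetical stand-in for CrawlDoc: a reference plus in-memory content.
    static final class CrawlDoc {
        private final String reference;
        private final byte[] content;
        CrawlDoc(String reference, String content) {
            this.reference = reference;
            this.content = content.getBytes(StandardCharsets.UTF_8);
        }
        String getReference() { return reference; }
        InputStream getInputStream() { return new ByteArrayInputStream(content); }
    }

    // Same method signature as documented above, declared on a local stand-in.
    interface ILinkExtractor {
        Set<Link> extractLinks(CrawlDoc doc) throws IOException;
    }

    // Illustrative extractor: collects double-quoted href attribute values.
    static final class HrefLinkExtractor implements ILinkExtractor {
        private static final Pattern HREF = Pattern.compile(
                "href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

        @Override
        public Set<Link> extractLinks(CrawlDoc doc) throws IOException {
            String text;
            // Reading the stream is what can raise the declared IOException.
            try (InputStream in = doc.getInputStream()) {
                text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
            // A Set drops duplicate URLs; LinkedHashSet keeps discovery order.
            Set<Link> links = new LinkedHashSet<>();
            Matcher m = HREF.matcher(text);
            while (m.find()) {
                links.add(new Link(m.group(1)));
            }
            return links;
        }
    }

    public static void main(String[] args) throws IOException {
        CrawlDoc doc = new CrawlDoc("https://example.com/",
                "<a href=\"https://example.com/a\">A</a> "
                + "<a href=\"https://example.com/a\">A again</a> "
                + "<a href=\"/b\">B</a>");
        for (Link link : new HrefLinkExtractor().extractLinks(doc)) {
            System.out.println(link.getUrl());
        }
    }
}
```

Because the return type is a Set, a URL that appears several times in a document is reported once. A real implementation would typically also resolve relative URLs against the document reference and attach contextual information such as the link text.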