Package | Description |
---|---|
com.norconex.collector.http.link | |
com.norconex.collector.http.link.impl | |
Modifier and Type | Class and Description |
---|---|
class | AbstractTextLinkExtractor: Base class for link extraction from text documents, providing common configuration settings such as restricting extraction to specific documents only and specifying one or more metadata fields from which to grab the text for extracting links. |
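To make the two shared settings above concrete, here is a minimal Java sketch of such a base class. It is not the Norconex API: the class names `TextLinkExtractorSketch` and `AbsoluteUrlExtractorSketch`, the methods `restrictTo`, `fromField`, `extractLinks`, and `extractFromText`, and the choice to identify "specific documents" by content type are all hypothetical assumptions made for illustration. The base class decides which documents and which text to process, and leaves the actual text-to-links logic to a subclass.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical base class (not the Norconex API) modeling the two shared settings:
// (1) restrict extraction to certain documents (assumed here: by content type), and
// (2) read the text from one or more metadata fields instead of the document body.
abstract class TextLinkExtractorSketch {
    private final Set<String> contentTypes = new LinkedHashSet<>();
    private final List<String> fromFields = new ArrayList<>();

    TextLinkExtractorSketch restrictTo(String contentType) {
        contentTypes.add(contentType);
        return this;
    }

    TextLinkExtractorSketch fromField(String field) {
        fromFields.add(field);
        return this;
    }

    // Returns an empty set when the document does not match the restriction.
    final Set<String> extractLinks(String contentType, String body,
            Map<String, List<String>> metadata) {
        Set<String> links = new LinkedHashSet<>();
        if (!contentTypes.isEmpty() && !contentTypes.contains(contentType)) {
            return links;
        }
        if (fromFields.isEmpty()) {
            extractFromText(body, links);            // default: use the document body
        } else {
            for (String field : fromFields) {       // otherwise: every value of each field
                for (String value : metadata.getOrDefault(field, List.of())) {
                    extractFromText(value, links);
                }
            }
        }
        return links;
    }

    // Subclasses implement the actual text-to-links logic.
    protected abstract void extractFromText(String text, Set<String> links);
}

// Hypothetical concrete subclass: collects absolute http(s) URLs from plain text.
class AbsoluteUrlExtractorSketch extends TextLinkExtractorSketch {
    private static final Pattern URL = Pattern.compile("https?://[^\\s\"'<>]+");

    @Override
    protected void extractFromText(String text, Set<String> links) {
        Matcher m = URL.matcher(text);
        while (m.find()) {
            links.add(m.group());
        }
    }
}
```

For example, `new AbsoluteUrlExtractorSketch().restrictTo("text/plain").fromField("description")` would only process plain-text documents and would read its text from a hypothetical `description` field rather than the body. Keeping the filtering in a `final` method of the base class means every subclass inherits the same document and field selection behavior.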
Modifier and Type | Class and Description |
---|---|
class | DOMLinkExtractor: Extracts links from a Document Object Model (DOM) representation of HTML, XHTML, or XML document content, based on the values of matching elements and attributes. |
class | GenericLinkExtractor: Deprecated. Since 3.0.0, use HtmlLinkExtractor or DOMLinkExtractor instead. |
class | HtmlLinkExtractor: HTML link extractor for URLs found in HTML and possibly other text files. |
class | RegexLinkExtractor: Link extractor using regular expressions to extract links found in text documents. |
class | TikaLinkExtractor: Implementation of ILinkExtractor using Apache Tika to perform URL extractions from HTML documents. |
class | XMLFeedLinkExtractor |
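The DOM-based approach described for DOMLinkExtractor can be illustrated with the JDK's built-in XML parser. This is a hypothetical sketch, not the Norconex implementation: `DomLinkExtractorSketch` and its `match` method are invented names, and because `javax.xml.parsers` only accepts well-formed markup, it covers XHTML and XML but not arbitrary real-world HTML, which would need a lenient HTML parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical sketch: for each configured (element, attribute) pair, collect the
// attribute value of every matching element. Input must be well-formed XML/XHTML.
class DomLinkExtractorSketch {
    // element name -> attribute holding the URL (e.g. "a" -> "href")
    private final Map<String, String> selectors = new LinkedHashMap<>();

    DomLinkExtractorSketch match(String element, String attribute) {
        selectors.put(element, attribute);
        return this;
    }

    Set<String> extractLinks(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not well-formed XML/XHTML", e);
        }
        Set<String> links = new LinkedHashSet<>();
        for (Map.Entry<String, String> sel : selectors.entrySet()) {
            NodeList nodes = doc.getElementsByTagName(sel.getKey());
            for (int i = 0; i < nodes.getLength(); i++) {
                Element el = (Element) nodes.item(i);
                String value = el.getAttribute(sel.getValue()).trim();
                if (!value.isEmpty()) {  // getAttribute returns "" when the attribute is absent
                    links.add(value);
                }
            }
        }
        return links;
    }
}
```

With `match("a", "href").match("img", "src")`, an anchor without an `href` (for instance a named anchor) contributes nothing, while link and image URLs are both collected. The returned values are left as written in the markup; resolving relative URLs is a separate step.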
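Regex-based extraction, as described for RegexLinkExtractor, can likewise be sketched in a few lines. The class `RegexLinkExtractorSketch`, its `addPattern` method, and the convention that each pattern captures the URL in group 1 are assumptions for illustration only; relative matches are resolved against the document URL with `java.net.URI.resolve`.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: each configured pattern must expose the URL in capture
// group 1; every match is resolved against the URL of the document being parsed.
class RegexLinkExtractorSketch {
    private final List<Pattern> patterns = new ArrayList<>();

    RegexLinkExtractorSketch addPattern(String regex) {
        patterns.add(Pattern.compile(regex));
        return this;
    }

    Set<String> extractLinks(String documentUrl, String text) {
        URI base = URI.create(documentUrl);
        Set<String> links = new LinkedHashSet<>();
        for (Pattern p : patterns) {
            Matcher m = p.matcher(text);
            while (m.find()) {
                String raw = m.group(1).trim();
                // Resolve relative references such as "../c.txt" into absolute URLs;
                // absolute matches are returned unchanged.
                links.add(base.resolve(raw).toString());
            }
        }
        return links;
    }
}
```

`URI.create` and `URI.resolve` throw `IllegalArgumentException` for strings that are not valid URIs (for example, ones containing unescaped spaces), so a production extractor would need to catch or clean such matches.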
Copyright © 2009–2023 Norconex Inc. All rights reserved.