java.lang.Object
- com.norconex.collector.http.link.AbstractLinkExtractor
- - com.norconex.collector.http.link.AbstractTextLinkExtractor

All Implemented Interfaces:

ILinkExtractor, IXMLConfigurable

Direct Known Subclasses:

DOMLinkExtractor, HtmlLinkExtractor, RegexLinkExtractor, XMLFeedLinkExtractor
```
public abstract class AbstractTextLinkExtractor
extends AbstractLinkExtractor
```
Base class for link extraction from text documents. It provides common configuration settings, such as restricting extraction to specific documents and specifying one or more metadata fields from which to read the text used for link extraction.

Not suitable for binary files.

Subclasses inherit the following:

XML configuration usage:
```
<fieldMatcher>
  (optional expression for fields used for links extraction instead
   of the document stream)
</fieldMatcher>
```
XML usage example:
The above will apply to any content type starting with "text/".
Since:

3.0.0

Author:

Pascal Essiembre

Constructor Summary

Constructors
Constructor Description

AbstractTextLinkExtractor()

Method Summary

Modifier and Type	Method	Description
`boolean`	`equals(Object other)`
`void`	`extractLinks(Set<Link> links, CrawlDoc doc)`
`abstract void`	`extractTextLinks(Set<Link> links, HandlerDoc doc, Reader reader)`
`TextMatcher`	`getFieldMatcher()`	Gets field matcher identifying fields holding content used for link extraction.
`int`	`hashCode()`
`void`	`loadLinkExtractorFromXML(XML xml)`	Loads configuration settings specific to the implementing class.
`protected abstract void`	`loadTextLinkExtractorFromXML(XML xml)`	Loads configuration settings specific to the implementing class.
`protected void`	`saveLinkExtractorToXML(XML xml)`	Saves configuration settings specific to the implementing class.
`protected abstract void`	`saveTextLinkExtractorToXML(XML xml)`	Saves configuration settings specific to the implementing class.
`void`	`setFieldMatcher(TextMatcher fieldMatcher)`	Sets field matcher identifying fields holding content used for link extraction.
`String`	`toString()`

Methods inherited from class com.norconex.collector.http.link.AbstractLinkExtractor
addRestriction, addRestrictions, clearRestrictions, extractLinks, getRestrictions, loadFromXML, removeRestriction, removeRestriction, saveToXML, setRestrictions

Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

- Constructor Detail
  - AbstractTextLinkExtractor
```
public AbstractTextLinkExtractor()
```
- Method Detail
  - extractLinks
```
public final void extractLinks(Set<Link> links,
                               CrawlDoc doc)
                        throws IOException
```
    Specified by:
    
    extractLinks in class AbstractLinkExtractor
    
    Throws:
    
    IOException
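The signatures suggest a template-method arrangement: the final `extractLinks` chooses the text source and the abstract `extractTextLinks` does the parsing. The following is a minimal sketch of that idea, not the Norconex implementation. It assumes stand-in types throughout: links are plain strings, a document is a content string plus a metadata map, and the field matcher is a `Predicate` on field names. `TextExtractorSketch` and `WholeTextSketch` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical stand-in, NOT the Norconex class.
abstract class TextExtractorSketch {

    // null means "read from the document content" (the documented default).
    private Predicate<String> fieldMatcher;

    void setFieldMatcher(Predicate<String> fieldMatcher) {
        this.fieldMatcher = fieldMatcher;
    }

    // Final entry point: picks the text source, then delegates to the hook.
    final void extractLinks(Set<String> links, String content,
            Map<String, List<String>> metadata) throws IOException {
        if (fieldMatcher == null) {
            try (Reader reader = new StringReader(content)) {
                extractTextLinks(links, reader);
            }
            return;
        }
        // A matcher is set: use matching field values instead of the content.
        for (Map.Entry<String, List<String>> field : metadata.entrySet()) {
            if (!fieldMatcher.test(field.getKey())) {
                continue;
            }
            for (String value : field.getValue()) {
                try (Reader reader = new StringReader(value)) {
                    extractTextLinks(links, reader);
                }
            }
        }
    }

    // Subclass hook: receives text only, whatever its origin.
    abstract void extractTextLinks(Set<String> links, Reader reader)
            throws IOException;
}

// Trivial hypothetical subclass: treats every non-blank line as a link.
final class WholeTextSketch extends TextExtractorSketch {
    @Override
    void extractTextLinks(Set<String> links, Reader reader) throws IOException {
        BufferedReader buffered = new BufferedReader(reader);
        String line;
        while ((line = buffered.readLine()) != null) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                links.add(trimmed);
            }
        }
    }
}
```

With no matcher set, the sketch reads the content string; once a matcher is set, only the values of matching metadata fields reach the hook.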
  - extractTextLinks
```
public abstract void extractTextLinks(Set<Link> links,
                                      HandlerDoc doc,
                                      Reader reader)
                               throws IOException
```
    Throws:
    
    IOException
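As an illustration of what an `extractTextLinks` implementation might do with the supplied `Reader`, here is a minimal sketch that scans the text line by line for absolute URLs. It is an assumption-laden stand-in: links are collected as plain strings instead of `Link` objects, the `HandlerDoc` parameter is omitted, the URL pattern is deliberately naive, and `UrlTextLinkSketch` is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Norconex API.
final class UrlTextLinkSketch {

    // Naive pattern for illustration: http(s) URL up to whitespace,
    // a quote, or an angle bracket.
    private static final Pattern URL =
            Pattern.compile("https?://[^\\s\"'<>]+");

    private UrlTextLinkSketch() {
    }

    // Reads the text line by line and adds every matched URL to the set.
    static void extractTextLinks(Set<String> links, Reader reader)
            throws IOException {
        BufferedReader buffered = new BufferedReader(reader);
        String line;
        while ((line = buffered.readLine()) != null) {
            Matcher matcher = URL.matcher(line);
            while (matcher.find()) {
                links.add(matcher.group());
            }
        }
    }
}
```

Because the target is a `Set`, a URL that appears several times in the text is recorded once.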
  - getFieldMatcher
```
public TextMatcher getFieldMatcher()
```
    Gets field matcher identifying fields holding content used for link extraction. Default is null, using the document content stream instead.
    
    Returns:
    
    field matcher
  - setFieldMatcher
```
public void setFieldMatcher(TextMatcher fieldMatcher)
```
    Sets field matcher identifying fields holding content used for link extraction. Default is null, using the document content stream instead.
    
    Parameters:
    
    fieldMatcher - field matcher
  - loadLinkExtractorFromXML
```
public final void loadLinkExtractorFromXML(XML xml)
```
    Description copied from class: AbstractLinkExtractor
    
    Loads configuration settings specific to the implementing class.
    
    Specified by:
    
    loadLinkExtractorFromXML in class AbstractLinkExtractor
    
    Parameters:
    
    xml - XML configuration
  - loadTextLinkExtractorFromXML
```
protected abstract void loadTextLinkExtractorFromXML(XML xml)
```
    Loads configuration settings specific to the implementing class.
    
    Parameters:
    
    xml - XML configuration
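The pairing of a final `loadLinkExtractorFromXML` with an abstract `loadTextLinkExtractorFromXML` suggests that shared settings are read first and subclass settings second. The sketch below illustrates that split under stated assumptions: it uses the JDK DOM API instead of the Norconex `XML` class, stores the field matcher as plain text, and invents a `<pattern>` element for the subclass. `XmlLoadSketch` and `PatternExtractorSketch` are hypothetical names, and the actual load order inside the library is not confirmed by this page.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical stand-in for the base class, NOT the Norconex class.
abstract class XmlLoadSketch {

    private String fieldMatcher; // null when <fieldMatcher> is absent

    String getFieldMatcher() {
        return fieldMatcher;
    }

    // Final: reads the shared setting, then hands over to the subclass hook.
    final void loadLinkExtractorFromXML(Element xml) {
        fieldMatcher = childText(xml, "fieldMatcher");
        loadTextLinkExtractorFromXML(xml);
    }

    // Subclass hook for settings specific to the implementing class.
    protected abstract void loadTextLinkExtractorFromXML(Element xml);

    // Returns the trimmed text of the first matching element, or null.
    static String childText(Element parent, String name) {
        NodeList nodes = parent.getElementsByTagName(name);
        return nodes.getLength() == 0
                ? null : nodes.item(0).getTextContent().trim();
    }

    // Parses an XML string and returns its root element.
    static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }
}

// Hypothetical subclass with one setting of its own (<pattern> is invented).
final class PatternExtractorSketch extends XmlLoadSketch {

    String pattern;

    @Override
    protected void loadTextLinkExtractorFromXML(Element xml) {
        pattern = childText(xml, "pattern");
    }
}
```

Keeping the outer method final means a subclass cannot skip the shared settings by accident; it only supplies what is specific to it.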
  - saveLinkExtractorToXML
```
protected final void saveLinkExtractorToXML(XML xml)
```
    Description copied from class: AbstractLinkExtractor
    
    Saves configuration settings specific to the implementing class.
    
    Specified by:
    
    saveLinkExtractorToXML in class AbstractLinkExtractor
    
    Parameters:
    
    xml - the XML
  - saveTextLinkExtractorToXML
```
protected abstract void saveTextLinkExtractorToXML(XML xml)
```
    Saves configuration settings specific to the implementing class.
    
    Parameters:
    
    xml - the XML
  - equals
```
public boolean equals(Object other)
```
    Overrides:
    
    equals in class AbstractLinkExtractor
  - hashCode
```
public int hashCode()
```
    Overrides:
    
    hashCode in class AbstractLinkExtractor
  - toString
```
public String toString()
```
    Overrides:
    
    toString in class AbstractLinkExtractor
