java.lang.Object
- com.norconex.collector.http.link.AbstractLinkExtractor
- - com.norconex.collector.http.link.impl.TikaLinkExtractor

All Implemented Interfaces:

ILinkExtractor, IXMLConfigurable
```
public class TikaLinkExtractor
extends AbstractLinkExtractor
```
Implementation of ILinkExtractor that uses Apache Tika to extract URLs from HTML documents. This is an alternative to the HtmlLinkExtractor.

Content-type configuration, storing of referrer data, and the options to ignore "nofollow" and to ignore link data work the same way as in HtmlLinkExtractor. For link data, this parser keeps only a predefined set of link attributes, when available: title, type, uri, text, and rel.
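
The two options described above can be pictured with a small standalone sketch. It is not the Norconex or Tika implementation: the class `LinkDataSketch` and its methods are hypothetical names introduced only for illustration, and it assumes that ignoring "nofollow" means links marked `rel="nofollow"` are followed anyway.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative only: hypothetical names, not part of the Norconex API.
final class LinkDataSketch {

    // The attribute names listed in the class description.
    static final List<String> KEPT_ATTRIBUTES =
            Arrays.asList("title", "type", "uri", "text", "rel");

    // Keeps only the listed attributes that are present.
    // When ignoreLinkData is true, no extra link data is kept at all.
    static Map<String, String> keepLinkData(
            Map<String, String> raw, boolean ignoreLinkData) {
        Map<String, String> kept = new LinkedHashMap<>();
        if (ignoreLinkData) {
            return kept;
        }
        for (String name : KEPT_ATTRIBUTES) {
            String value = raw.get(name);
            if (value != null) {
                kept.put(name, value);
            }
        }
        return kept;
    }

    // Assumption: a link whose rel value contains "nofollow" is skipped
    // unless ignoreNofollow is true.
    static boolean shouldFollow(String rel, boolean ignoreNofollow) {
        if (ignoreNofollow || rel == null) {
            return true;
        }
        for (String token
                : rel.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.equals("nofollow")) {
                return false;
            }
        }
        return true;
    }
}
```

In this sketch, an anchor carrying `title`, `class`, and `rel` attributes would keep only `title` and `rel`, since `class` is not in the predefined set.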

XML configuration usage:
```
<extractor
    class="com.norconex.collector.http.link.impl.TikaLinkExtractor"
    ignoreNofollow="[false|true]"/>
```
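
To make the shape of this configuration concrete, the following sketch reads a filled-in `<extractor>` element using only the JDK's DOM parser. It is an illustration, not how the collector loads its configuration (that goes through `loadLinkExtractorFromXML(XML xml)`). The class `ExtractorConfigSketch` and its methods are hypothetical, and treating a missing `ignoreNofollow` attribute as `false` is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Illustrative only: hypothetical names, not part of the Norconex API.
final class ExtractorConfigSketch {

    // The usage template above with the placeholder replaced by "true".
    static final String SAMPLE =
            "<extractor"
          + " class=\"com.norconex.collector.http.link.impl.TikaLinkExtractor\""
          + " ignoreNofollow=\"true\"/>";

    // Parses the XML text and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid extractor XML", e);
        }
    }

    // Returns the value of the "class" attribute ("" when absent).
    static String readClassName(String xml) {
        return parseRoot(xml).getAttribute("class");
    }

    // Assumption: an absent attribute means false.
    // getAttribute returns "" when absent, and parseBoolean("") is false.
    static boolean readIgnoreNofollow(String xml) {
        return Boolean.parseBoolean(
                parseRoot(xml).getAttribute("ignoreNofollow"));
    }
}
```

Calling `ExtractorConfigSketch.readIgnoreNofollow(ExtractorConfigSketch.SAMPLE)` on the sample string yields `true`, mirroring what `setIgnoreNofollow(true)` would express programmatically.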
Author:

Pascal Essiembre

See Also:

HtmlLinkExtractor

Constructor Summary

Constructors
Constructor	Description
`TikaLinkExtractor()`

Method Summary

Modifier and Type	Method	Description
`boolean`	`equals(Object other)`
`void`	`extractLinks(Set<Link> nxLinks, CrawlDoc doc)`
`int`	`hashCode()`
`boolean`	`isIgnoreLinkData()`	Gets whether to ignore extra data associated with a link.
`boolean`	`isIgnoreNofollow()`
`protected void`	`loadLinkExtractorFromXML(XML xml)`	Loads configuration settings specific to the implementing class.
`protected void`	`saveLinkExtractorToXML(XML xml)`	Saves configuration settings specific to the implementing class.
`void`	`setIgnoreLinkData(boolean ignoreLinkData)`	Sets whether to ignore extra data associated with a link.
`void`	`setIgnoreNofollow(boolean ignoreNofollow)`
`String`	`toString()`

Methods inherited from class com.norconex.collector.http.link.AbstractLinkExtractor
addRestriction, addRestrictions, clearRestrictions, extractLinks, getRestrictions, loadFromXML, removeRestriction, removeRestriction, saveToXML, setRestrictions

Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

- Constructor Detail
  - TikaLinkExtractor
```
public TikaLinkExtractor()
```
- Method Detail
  - extractLinks
```
public void extractLinks(Set<Link> nxLinks,
                         CrawlDoc doc)
                  throws IOException
```
    Specified by:
    
    extractLinks in class AbstractLinkExtractor
    
    Throws:
    
    IOException
  - isIgnoreNofollow
```
public boolean isIgnoreNofollow()
```
  - setIgnoreNofollow
```
public void setIgnoreNofollow(boolean ignoreNofollow)
```
  - isIgnoreLinkData
```
public boolean isIgnoreLinkData()
```
    Gets whether to ignore extra data associated with a link.
    
    Returns:
    
    true to ignore.
    
    Since:
    
    3.0.0
  - setIgnoreLinkData
```
public void setIgnoreLinkData(boolean ignoreLinkData)
```
    Sets whether to ignore extra data associated with a link.
    
    Parameters:
    
    ignoreLinkData - true to ignore.
    
    Since:
    
    3.0.0
  - loadLinkExtractorFromXML
```
protected void loadLinkExtractorFromXML(XML xml)
```
    Description copied from class: AbstractLinkExtractor
    
    Loads configuration settings specific to the implementing class.
    
    Specified by:
    
    loadLinkExtractorFromXML in class AbstractLinkExtractor
    
    Parameters:
    
    xml - XML configuration
  - saveLinkExtractorToXML
```
protected void saveLinkExtractorToXML(XML xml)
```
    Description copied from class: AbstractLinkExtractor
    
    Saves configuration settings specific to the implementing class.
    
    Specified by:
    
    saveLinkExtractorToXML in class AbstractLinkExtractor
    
    Parameters:
    
    xml - the XML
  - equals
```
public boolean equals(Object other)
```
    Overrides:
    
    equals in class AbstractLinkExtractor
  - hashCode
```
public int hashCode()
```
    Overrides:
    
    hashCode in class AbstractLinkExtractor
  - toString
```
public String toString()
```
    Overrides:
    
    toString in class AbstractLinkExtractor
