public class AbstractTikaParser extends Object implements IHintsAwareParser

Modifier and Type | Class and Description |
---|---|
protected class | AbstractTikaParser.MergeEmbeddedParser |
protected static interface | AbstractTikaParser.RecursiveParser |
protected class | AbstractTikaParser.SplitEmbbededParser |

Constructor and Description |
---|
AbstractTikaParser(Parser parser) Creates a new Tika-based parser. |

Modifier and Type | Method and Description |
---|---|
protected void | addTikaMetadataToImporterMetadata(Metadata tikaMeta, ImporterMetadata metadata) |
protected AbstractTikaParser.RecursiveParser | createRecursiveParser(String reference, String contentType, Writer writer, ImporterMetadata metadata, CachedStreamFactory streamFactory) |
boolean | equals(Object other) |
OCRConfig | getOCRConfig() Deprecated. |
int | hashCode() |
void | initialize(ParseHints parserHints) Initialize this parser with the given parse hints. |
boolean | isSplitEmbedded() Deprecated. |
protected void | modifyParseContext(ParseContext parseContext) Override to apply your own settings on the Tika ParseContext. |
List<ImporterDocument> | parseDocument(ImporterDocument doc, Writer output) Parses a document. |
void | setOCRConfig(OCRConfig ocrConfig) Deprecated. |
void | setSplitEmbedded(boolean splitEmbedded) Deprecated. |
String | toString() |

public AbstractTikaParser(Parser parser)
Creates a new Tika-based parser.
Parameters:
parser - Tika parser

public void initialize(ParseHints parserHints)
Description copied from interface: IHintsAwareParser
Initialize this parser with the given parse hints.
Specified by:
initialize in interface IHintsAwareParser
Parameters:
parserHints - configuration settings influencing parsing when possible or appropriate

public final List<ImporterDocument> parseDocument(ImporterDocument doc, Writer output) throws DocumentParserException
Description copied from interface: IDocumentParser
Parses a document.
Specified by:
parseDocument in interface IDocumentParser
Parameters:
doc - importer document to parse
output - where to store extracted or modified content of the supplied document
Throws:
DocumentParserException - problem parsing document

protected void modifyParseContext(ParseContext parseContext)
Override to apply your own settings on the Tika ParseContext.
Parameters:
parseContext - Tika parse context

protected void addTikaMetadataToImporterMetadata(Metadata tikaMeta, ImporterMetadata metadata)

protected AbstractTikaParser.RecursiveParser createRecursiveParser(String reference, String contentType, Writer writer, ImporterMetadata metadata, CachedStreamFactory streamFactory)
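
To illustrate the modifyParseContext(ParseContext) hook, here is a minimal, self-contained sketch. It does not use the real Norconex or Tika classes: SketchParseContext, SketchPdfSettings, SketchBaseParser and SketchPdfParser are hypothetical stand-ins so the snippet compiles on its own. The set/get-by-class shape mirrors Tika's ParseContext. The assumption that the final parse method invokes the hook before parsing starts is inferred from the method description, not confirmed by this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for Tika's ParseContext: a map keyed by class.
class SketchParseContext {
    private final Map<Class<?>, Object> entries = new LinkedHashMap<>();

    <T> void set(Class<T> key, T value) {
        entries.put(key, value);
    }

    <T> T get(Class<T> key) {
        // Class.cast returns null when no entry was registered.
        return key.cast(entries.get(key));
    }
}

// Hypothetical settings object a subclass might want to register.
class SketchPdfSettings {
    final boolean extractInlineImages;

    SketchPdfSettings(boolean extractInlineImages) {
        this.extractInlineImages = extractInlineImages;
    }
}

// Hypothetical stand-in for AbstractTikaParser.
abstract class SketchBaseParser {
    // Final, like parseDocument: subclasses cannot replace the parsing flow.
    public final SketchParseContext parse() {
        SketchParseContext context = new SketchParseContext();
        // Assumption: the hook runs before any parsing would start.
        modifyParseContext(context);
        return context;
    }

    // Default hook is a no-op; override to apply your own settings.
    protected void modifyParseContext(SketchParseContext parseContext) {
        // intentionally empty
    }
}

// A subclass customizes parsing only through the hook.
class SketchPdfParser extends SketchBaseParser {
    @Override
    protected void modifyParseContext(SketchParseContext parseContext) {
        parseContext.set(SketchPdfSettings.class, new SketchPdfSettings(true));
    }
}
```

In this sketch, calling new SketchPdfParser().parse() returns a context that holds the registered SketchPdfSettings, while a parser that does not override the hook returns an empty context. With the real classes, the equivalent step would be overriding modifyParseContext in a subclass of AbstractTikaParser and registering Tika configuration objects on the supplied ParseContext.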

@Deprecated public void setOCRConfig(OCRConfig ocrConfig)
Deprecated. See initialize(ParseHints).
Parameters:
ocrConfig - the OCR configuration to set

@Deprecated public OCRConfig getOCRConfig()
Deprecated. See initialize(ParseHints).

@Deprecated public boolean isSplitEmbedded()
Deprecated. See initialize(ParseHints).
Returns:
true if parser should split embedded documents.

@Deprecated public void setSplitEmbedded(boolean splitEmbedded)
Deprecated. See initialize(ParseHints).
Parameters:
splitEmbedded - true if parser should split embedded documents.

Copyright © 2009–2021 Norconex Inc. All rights reserved.