Package com.norconex.importer.parser
Interface IDocumentParser
- All Known Subinterfaces:
IHintsAwareParser
- All Known Implementing Classes:
AbstractTikaParser, ExternalParser, FallbackParser, XFDLParser
public interface IDocumentParser
Implementations are responsible for parsing a document to extract its text and metadata, as well as any embedded documents (when applicable).
- Author:
- Pascal Essiembre
- See Also:
IDocumentParserFactory
Method Summary
Modifier and Type | Method | Description
List<Doc> | parseDocument(Doc doc, Writer output) | Parses a document.
Method Detail
parseDocument
List<Doc> parseDocument(Doc doc, Writer output) throws DocumentParserException
Parses a document.
- Parameters:
doc
- importer document to parse
output
- where to store extracted or modified content of the supplied document
- Returns:
- a list of first-level embedded documents, if any
- Throws:
DocumentParserException
- problem parsing document
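The contract above can be illustrated with a minimal sketch of an implementation. So that it compiles without the Importer library on the classpath, the block declares simplified, hypothetical stand-ins for Doc, DocumentParserException and IDocumentParser; only the parseDocument signature is taken from this page. The stand-in Doc accessors (getReference, openReader), the PlainTextParser class, and the choice to return an empty list when there are no embedded documents are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Collections;
import java.util.List;

public class ParserSketch {

    // Hypothetical stand-in for the library's Doc class (not its real API).
    static class Doc {
        private final String reference;
        private final String content;

        Doc(String reference, String content) {
            this.reference = reference;
            this.content = content;
        }

        String getReference() {
            return reference;
        }

        Reader openReader() {
            return new StringReader(content);
        }
    }

    // Hypothetical stand-in for the library's checked exception.
    static class DocumentParserException extends Exception {
        DocumentParserException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Same method signature as documented on this page.
    interface IDocumentParser {
        List<Doc> parseDocument(Doc doc, Writer output)
                throws DocumentParserException;
    }

    // Illustrative parser: copies the document text to the output as-is
    // and reports no embedded documents.
    static class PlainTextParser implements IDocumentParser {
        @Override
        public List<Doc> parseDocument(Doc doc, Writer output)
                throws DocumentParserException {
            try (Reader reader = doc.openReader()) {
                char[] buffer = new char[4096];
                int read;
                while ((read = reader.read(buffer)) != -1) {
                    output.write(buffer, 0, read);
                }
                output.flush();
            } catch (IOException e) {
                // Wrap I/O failures in the exception type the contract declares.
                throw new DocumentParserException(
                        "Could not parse " + doc.getReference(), e);
            }
            // Assumption: "no embedded documents" is expressed as an empty list.
            return Collections.emptyList();
        }
    }

    public static void main(String[] args) throws Exception {
        StringWriter out = new StringWriter();
        List<Doc> embedded = new PlainTextParser()
                .parseDocument(new Doc("memo.txt", "Hello, importer."), out);
        System.out.println("Extracted text: " + out);
        System.out.println("Embedded documents: " + embedded.size());
    }
}
```

In real code the stand-in types would be replaced by the library's own classes, and a parser for a container format would return its first-level embedded documents instead of an empty list.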