Modifier and Type | Method and Description |
---|---|
ParseState | ImporterEvent.getParseState() |
Modifier and Type | Method and Description |
---|---|
ImporterEvent.Builder | ImporterEvent.Builder.parseState(ParseState parseState) |
Modifier and Type | Method and Description |
---|---|
ParseState | HandlerContext.getParseState() |
Modifier and Type | Method and Description |
---|---|
protected String | AbstractImporterHandler.detectCharsetIfBlank(HandlerDoc doc, InputStream is, String charset, ParseState parseState) Deprecated. Since 3.0.0, the charset is already detected; otherwise, use CharsetUtil.firstNonBlankOrUTF8(ParseState, String...). |
protected boolean | AbstractImporterHandler.isApplicable(HandlerDoc doc, ParseState parseState) Method for subclasses to invoke to find out whether this handler should be rejected, based on the metadata restriction provided. |
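The isApplicable description above mentions a metadata restriction. As a rough illustration only, the following Java sketch shows one way such a check could work. RestrictionSketch, Restriction and the "any restriction matches" (OR) rule are assumptions, not the Norconex Importer API, and the sketch omits the ParseState argument because this page does not say how the real method uses it.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of a metadata restriction check; not the Norconex API.
class RestrictionSketch {

    // One restriction: a metadata field and a regex one of its values must match.
    static final class Restriction {
        final String field;
        final Pattern valuePattern;

        Restriction(String field, String regex) {
            this.field = field;
            this.valuePattern = Pattern.compile(regex);
        }
    }

    // Returns true when the handler should run: no restrictions are configured,
    // or at least one restriction fully matches at least one value of its field.
    // (OR semantics are an assumption; the real method's ParseState is omitted.)
    static boolean isApplicable(Map<String, List<String>> metadata,
                                List<Restriction> restrictions) {
        if (restrictions.isEmpty()) {
            return true;
        }
        for (Restriction r : restrictions) {
            for (String value : metadata.getOrDefault(r.field, List.of())) {
                if (r.valuePattern.matcher(value).matches()) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> meta = Map.of("Content-Type", List.of("text/html"));
        System.out.println(isApplicable(meta,
                List.of(new Restriction("Content-Type", "text/html.*"))));     // true
        System.out.println(isApplicable(meta,
                List.of(new Restriction("Content-Type", "application/pdf")))); // false
        System.out.println(isApplicable(meta, List.of()));                     // true
    }
}
```

Treating "no restrictions" as "always applicable" keeps a handler active by default, which is the usual expectation for an optional restriction.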
Constructor and Description |
---|
HandlerContext(Doc doc, EventManager eventManager, ParseState parseState) |
Modifier and Type | Method and Description |
---|---|
boolean | AbstractCharStreamCondition.testDocument(HandlerDoc doc, InputStream input, ParseState parseState) |
boolean | IImporterCondition.testDocument(HandlerDoc doc, InputStream input, ParseState parseState) Tests a given document. |
protected abstract boolean | AbstractCharStreamCondition.testDocument(HandlerDoc doc, Reader input, ParseState parseState) |
protected boolean | AbstractStringCondition.testDocument(HandlerDoc doc, Reader input, ParseState parseState) |
protected abstract boolean | AbstractStringCondition.testDocument(HandlerDoc doc, String input, ParseState parseState, int sectionIndex) |
Modifier and Type | Method and Description |
---|---|
boolean | BlankCondition.testDocument(HandlerDoc doc, InputStream input, ParseState parseState) |
boolean | DateCondition.testDocument(HandlerDoc doc, InputStream input, ParseState parseState) |
boolean | NumericCondition.testDocument(HandlerDoc doc, InputStream input, ParseState parseState) |
boolean | ReferenceCondition.testDocument(HandlerDoc doc, InputStream input, ParseState parseState) |
protected boolean | DOMCondition.testDocument(HandlerDoc doc, Reader input, ParseState parseState) |
protected boolean | ScriptCondition.testDocument(HandlerDoc doc, String input, ParseState parseState, int sectionIndex) |
protected boolean | TextCondition.testDocument(HandlerDoc doc, String input, ParseState parseState, int sectionIndex) |
Modifier and Type | Method and Description |
---|---|
boolean | AbstractDocumentFilter.acceptDocument(HandlerDoc doc, InputStream input, ParseState parseState) |
boolean | IDocumentFilter.acceptDocument(HandlerDoc doc, InputStream input, ParseState parseState) Whether to accept a document. |
protected boolean | AbstractCharStreamFilter.isDocumentMatched(HandlerDoc doc, InputStream input, ParseState parseState) |
protected abstract boolean | AbstractDocumentFilter.isDocumentMatched(HandlerDoc doc, InputStream input, ParseState parseState) |
protected abstract boolean | AbstractStringFilter.isStringContentMatching(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) |
protected abstract boolean | AbstractCharStreamFilter.isTextDocumentMatching(HandlerDoc doc, Reader input, ParseState parseState) |
protected boolean | AbstractStringFilter.isTextDocumentMatching(HandlerDoc doc, Reader input, ParseState parseState) |
Modifier and Type | Method and Description |
---|---|
boolean | RejectFilter.acceptDocument(HandlerDoc doc, InputStream input, ParseState parseState) |
protected boolean | DateMetadataFilter.isDocumentMatched(HandlerDoc doc, InputStream input, ParseState parseState) |
protected boolean | DOMContentFilter.isDocumentMatched(HandlerDoc doc, InputStream input, ParseState parseState) Deprecated. |
protected boolean | DOMFilter.isDocumentMatched(HandlerDoc doc, InputStream input, ParseState parseState) |
protected boolean | EmptyFilter.isDocumentMatched(HandlerDoc doc, InputStream input, ParseState parseState) |
protected boolean | EmptyMetadataFilter.isDocumentMatched(HandlerDoc doc, InputStream input, ParseState parseState) Deprecated. |
protected boolean | NumericMetadataFilter.isDocumentMatched(HandlerDoc doc, InputStream input, ParseState parseState) |
protected boolean | ReferenceFilter.isDocumentMatched(HandlerDoc doc, InputStream input, ParseState parseState) |
protected boolean | RegexMetadataFilter.isDocumentMatched(HandlerDoc doc, InputStream input, ParseState parseState) Deprecated. |
protected boolean | RegexReferenceFilter.isDocumentMatched(HandlerDoc doc, InputStream input, ParseState parseState) Deprecated. |
protected boolean | RegexContentFilter.isStringContentMatching(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) Deprecated. |
protected boolean | ScriptFilter.isStringContentMatching(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) |
protected boolean | TextFilter.isStringContentMatching(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) |
Modifier and Type | Method and Description |
---|---|
protected abstract List<Doc> | AbstractDocumentSplitter.splitApplicableDocument(HandlerDoc doc, InputStream input, OutputStream output, ParseState parseState) |
List<Doc> | AbstractDocumentSplitter.splitDocument(HandlerDoc doc, InputStream docInput, OutputStream docOutput, ParseState parseState) |
List<Doc> | IDocumentSplitter.splitDocument(HandlerDoc doc, InputStream docInput, OutputStream docOutput, ParseState parseState) |
Modifier and Type | Method and Description |
---|---|
protected List<Doc> | CsvSplitter.splitApplicableDocument(HandlerDoc doc, InputStream input, OutputStream output, ParseState parseState) |
protected List<Doc> | DOMSplitter.splitApplicableDocument(HandlerDoc doc, InputStream input, OutputStream output, ParseState parseState) |
protected List<Doc> | PDFPageSplitter.splitApplicableDocument(HandlerDoc doc, InputStream input, OutputStream output, ParseState parseState) |
protected List<Doc> | TranslatorSplitter.splitApplicableDocument(HandlerDoc doc, InputStream input, OutputStream output, ParseState parseState) |
protected List<Doc> | XMLStreamSplitter.splitApplicableDocument(HandlerDoc doc, InputStream input, OutputStream output, ParseState parseState) |
Modifier and Type | Method and Description |
---|---|
protected void | AbstractCharStreamTagger.tagApplicableDocument(HandlerDoc doc, InputStream input, ParseState parseState) |
protected abstract void | AbstractDocumentTagger.tagApplicableDocument(HandlerDoc doc, InputStream input, ParseState parseState) |
void | AbstractDocumentTagger.tagDocument(HandlerDoc doc, InputStream input, ParseState parseState) |
void | IDocumentTagger.tagDocument(HandlerDoc doc, InputStream input, ParseState parseState) Tags a document with extra metadata information. |
protected abstract void | AbstractStringTagger.tagStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) |
protected abstract void | AbstractCharStreamTagger.tagTextDocument(HandlerDoc doc, Reader input, ParseState parseState) |
protected void | AbstractStringTagger.tagTextDocument(HandlerDoc doc, Reader input, ParseState parseState) |
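The tagger methods above come in pairs: a public tagDocument entry point and a protected tagApplicableDocument hook. The naming suggests a template-method design in which the entry point first decides whether the tagger applies and only then delegates. The Java sketch below illustrates that reading with hypothetical stand-ins (TaggerSketch, Doc, AbstractTagger, ByteCountTagger, and a local ParseState enum assumed to have PRE and POST constants); it is not the library's implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the tagDocument -> tagApplicableDocument split.
class TaggerSketch {

    enum ParseState { PRE, POST } // stand-in; constants assumed

    // Stand-in for HandlerDoc: just a mutable metadata map.
    static final class Doc {
        final Map<String, List<String>> metadata = new HashMap<>();
    }

    abstract static class AbstractTagger {
        // Public entry point: skip documents this tagger does not apply to,
        // otherwise delegate to the subclass hook.
        final void tagDocument(Doc doc, InputStream input, ParseState state) {
            if (!isApplicable(doc, state)) {
                return;
            }
            tagApplicableDocument(doc, input, state);
        }

        // Default: apply to every document; subclasses may narrow this.
        protected boolean isApplicable(Doc doc, ParseState state) {
            return true;
        }

        protected abstract void tagApplicableDocument(
                Doc doc, InputStream input, ParseState state);
    }

    // Example subclass: stores the content's byte count, only in post-parse state.
    static final class ByteCountTagger extends AbstractTagger {
        @Override
        protected boolean isApplicable(Doc doc, ParseState state) {
            return state == ParseState.POST;
        }

        @Override
        protected void tagApplicableDocument(
                Doc doc, InputStream input, ParseState state) {
            try {
                int count = input.readAllBytes().length;
                doc.metadata.computeIfAbsent("byteCount", k -> new ArrayList<>())
                        .add(Integer.toString(count));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        AbstractTagger tagger = new ByteCountTagger();
        Doc doc = new Doc();
        byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
        tagger.tagDocument(doc, new ByteArrayInputStream(body), ParseState.PRE);
        System.out.println(doc.metadata); // {} : skipped before parsing
        tagger.tagDocument(doc, new ByteArrayInputStream(body), ParseState.POST);
        System.out.println(doc.metadata); // {byteCount=[5]}
    }
}
```

Making the entry point final keeps the applicability check in one place, so concrete taggers only implement the tagging itself.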
Modifier and Type | Method and Description |
---|---|
void | CharacterCaseTagger.tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) |
void | CharsetTagger.tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) |
void | ConstantTagger.tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) |
void | CopyTagger.tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) |
void | CurrentDateTagger.tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) |
void | DateFormatTagger.tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) |
void | DebugTagger.tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) |
void | DeleteTagger.tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) |
void | DocumentLengthTagger.tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) |
void | DOMTagger.tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) |
void | ExternalTagger.tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) |
void | FieldReportTagger.tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) |
void | ForceSingleValueTagger.tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) |
void | HierarchyTagger.tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) |
void | KeepOnlyTagger.tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) |
void | MergeTagger.tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) |
void | RenameTagger.tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) |
void | ReplaceTagger.tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) |
void | TruncateTagger.tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) |
void | UUIDTagger.tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) |
protected void | LanguageTagger.tagStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) |
protected void | RegexTagger.tagStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) |
protected void | ScriptTagger.tagStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) |
protected void | TextBetweenTagger.tagStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) |
protected void | TextPatternTagger.tagStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) Deprecated. |
protected void | TitleGeneratorTagger.tagStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) |
protected void | CountMatchesTagger.tagTextDocument(HandlerDoc doc, Reader input, ParseState parseState) |
protected void | SplitTagger.tagTextDocument(HandlerDoc doc, Reader input, ParseState parseState) |
protected void | TextStatisticsTagger.tagTextDocument(HandlerDoc doc, Reader input, ParseState parseState) |
protected void | URLExtractorTagger.tagTextDocument(HandlerDoc doc, Reader input, ParseState parseState) |
Modifier and Type | Method and Description |
---|---|
protected void | AbstractCharStreamTransformer.transformApplicableDocument(HandlerDoc doc, InputStream input, OutputStream output, ParseState parseState) |
protected abstract void | AbstractDocumentTransformer.transformApplicableDocument(HandlerDoc doc, InputStream input, OutputStream output, ParseState parseState) |
void | AbstractDocumentTransformer.transformDocument(HandlerDoc doc, InputStream input, OutputStream output, ParseState parseState) |
void | IDocumentTransformer.transformDocument(HandlerDoc doc, InputStream input, OutputStream output, ParseState parseState) Transforms document content and metadata. |
protected abstract void | AbstractStringTransformer.transformStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) |
protected abstract void | AbstractCharStreamTransformer.transformTextDocument(HandlerDoc doc, Reader input, Writer output, ParseState parseState) |
protected void | AbstractStringTransformer.transformTextDocument(HandlerDoc doc, Reader input, Writer output, ParseState parseState) |
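AbstractStringTransformer pairs transformTextDocument (which receives a Reader and a Writer) with transformStringContent (which receives a StringBuilder and an int sectionIndex). That pairing suggests the text is handed to subclasses one section at a time. The Java sketch below illustrates the idea under stated assumptions: sections are fixed-size chunks of at most maxReadSize characters, and SectionedTransformSketch, StringTransformer, UpperCaser and the local ParseState enum are hypothetical names. The real class may choose section boundaries differently.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of section-by-section string transformation.
class SectionedTransformSketch {

    enum ParseState { PRE, POST } // stand-in; constants assumed

    abstract static class StringTransformer {
        private final int maxReadSize; // assumed section size limit

        StringTransformer(int maxReadSize) {
            this.maxReadSize = maxReadSize;
        }

        // Reads the text in sections of at most maxReadSize characters, lets the
        // subclass edit each section in place, then writes the section out.
        void transformTextDocument(Reader input, Writer output, ParseState state) {
            char[] buffer = new char[maxReadSize];
            int sectionIndex = 0;
            try {
                int read;
                while ((read = fill(input, buffer)) > 0) {
                    StringBuilder content = new StringBuilder(read).append(buffer, 0, read);
                    transformStringContent(content, state, sectionIndex++);
                    output.write(content.toString());
                }
                output.flush();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        protected abstract void transformStringContent(
                StringBuilder content, ParseState state, int sectionIndex);

        // Reader.read may return fewer chars than requested: loop until the
        // buffer is full or the stream ends. Returns 0 at end of stream.
        private static int fill(Reader in, char[] buf) throws IOException {
            int total = 0;
            while (total < buf.length) {
                int n = in.read(buf, total, buf.length - total);
                if (n == -1) {
                    break;
                }
                total += n;
            }
            return total;
        }
    }

    // Example subclass: upper-cases each section and records its index.
    static final class UpperCaser extends StringTransformer {
        final List<Integer> seenSections = new ArrayList<>();

        UpperCaser(int maxReadSize) {
            super(maxReadSize);
        }

        @Override
        protected void transformStringContent(
                StringBuilder content, ParseState state, int sectionIndex) {
            seenSections.add(sectionIndex);
            String upper = content.toString().toUpperCase(Locale.ROOT);
            content.replace(0, content.length(), upper);
        }
    }

    public static void main(String[] args) {
        UpperCaser t = new UpperCaser(3);
        StringWriter out = new StringWriter();
        t.transformTextDocument(new StringReader("abcdefg"), out, ParseState.POST);
        System.out.println(out);            // ABCDEFG
        System.out.println(t.seenSections); // [0, 1, 2]
    }
}
```

Bounding each section keeps memory use predictable on very large text documents, at the cost that a match spanning two sections is not seen by either call.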
Modifier and Type | Method and Description |
---|---|
protected void | CharsetTransformer.transformApplicableDocument(HandlerDoc doc, InputStream input, OutputStream output, ParseState parseState) |
protected void | DOMDeleteTransformer.transformApplicableDocument(HandlerDoc doc, InputStream document, OutputStream output, ParseState parseState) |
protected void | DOMPreserveTransformer.transformApplicableDocument(HandlerDoc doc, InputStream document, OutputStream output, ParseState parseState) |
protected void | ExternalTransformer.transformApplicableDocument(HandlerDoc doc, InputStream input, OutputStream output, ParseState parseState) |
protected void | ImageTransformer.transformApplicableDocument(HandlerDoc doc, InputStream input, OutputStream output, ParseState parseState) |
protected void | NoContentTransformer.transformApplicableDocument(HandlerDoc doc, InputStream input, OutputStream output, ParseState parseState) |
protected void | ReduceConsecutivesTransformer.transformStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) |
protected void | ReplaceTransformer.transformStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) |
protected void | ScriptTransformer.transformStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) |
protected void | StripAfterTransformer.transformStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) |
protected void | StripBeforeTransformer.transformStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) |
protected void | StripBetweenTransformer.transformStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) |
protected void | SubstringTransformer.transformTextDocument(HandlerDoc doc, Reader input, Writer output, ParseState parseState) |
Modifier and Type | Method and Description |
---|---|
static ParseState | ParseState.valueOf(String name) Returns the enum constant of this type with the specified name. |
static ParseState[] | ParseState.values() Returns an array containing the constants of this enum type, in the order they are declared. |
Modifier and Type | Method and Description |
---|---|
static boolean | ParseState.isPost(ParseState parseState) |
static boolean | ParseState.isPre(ParseState parseState) |
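ParseState is an enum, so valueOf and values follow standard Java enum semantics: valueOf requires an exact, case-sensitive constant name and throws IllegalArgumentException otherwise, and values returns the constants in declaration order. The Java stand-in below (ParseStateSketch) assumes the constants are PRE and POST and gives isPre and isPost a null-safe implementation that treats null as neither state; the real library's handling of null may differ.

```java
import java.util.Arrays;

// Hypothetical stand-in for ParseState; assumes the constants PRE and POST.
class ParseStateSketch {

    enum ParseState {
        PRE, POST;

        // Null-safe helpers: in this sketch, null matches neither state.
        static boolean isPre(ParseState parseState) {
            return parseState == PRE;
        }

        static boolean isPost(ParseState parseState) {
            return parseState == POST;
        }
    }

    public static void main(String[] args) {
        // values(): constants in declaration order.
        System.out.println(Arrays.toString(ParseState.values())); // [PRE, POST]

        // valueOf(String): exact, case-sensitive constant name.
        ParseState state = ParseState.valueOf("POST");
        System.out.println(ParseState.isPost(state)); // true
        System.out.println(ParseState.isPre(null));   // false (this sketch's choice)

        try {
            ParseState.valueOf("post");
        } catch (IllegalArgumentException e) {
            System.out.println("No constant named \"post\"");
        }
    }
}
```

Static helpers like these let callers test a possibly-null parse state without a separate null check at every call site.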
Modifier and Type | Method and Description |
---|---|
static String | CharsetUtil.firstNonBlankOrUTF8(ParseState parseState, String... charsets) Returns the first non-blank character encoding, or UTF-8 if all of them are blank or the parse state is post-parse. |
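The description of firstNonBlankOrUTF8 is precise enough to sketch. The Java stand-in below (CharsetSketch, with a local ParseState enum assumed to have PRE and POST constants) returns UTF-8 in post-parse state or when every candidate is blank, and the first non-blank candidate otherwise. Whether the real method trims the value it returns is not stated, so this sketch returns it unchanged.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical re-implementation of the documented behavior; not the library code.
class CharsetSketch {

    enum ParseState { PRE, POST } // stand-in; constants assumed

    static String firstNonBlankOrUtf8(ParseState parseState, String... charsets) {
        String utf8 = StandardCharsets.UTF_8.name(); // "UTF-8"
        // In post-parse state, always answer UTF-8.
        if (parseState == ParseState.POST) {
            return utf8;
        }
        // Otherwise pick the first candidate that is neither null nor blank.
        if (charsets != null) {
            for (String charset : charsets) {
                if (charset != null && !charset.isBlank()) {
                    return charset;
                }
            }
        }
        // Every candidate was blank, or none was given.
        return utf8;
    }

    public static void main(String[] args) {
        System.out.println(firstNonBlankOrUtf8(ParseState.PRE, null, " ", "ISO-8859-1")); // ISO-8859-1
        System.out.println(firstNonBlankOrUtf8(ParseState.POST, "ISO-8859-1"));         // UTF-8
        System.out.println(firstNonBlankOrUtf8(ParseState.PRE, "", "  "));               // UTF-8
    }
}
```

Candidates are checked in argument order, so callers can list a declared charset before a detected one (or the reverse) to express precedence.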
Copyright © 2009–2023 Norconex Inc. All rights reserved.