All Classes (Norconex Importer 3.1.0 API)

Class	Description
AbstractCharStreamCondition	Base class for conditions dealing with the document content as text.
AbstractCharStreamFilter	Base class for filters dealing with the body of text documents only.
AbstractCharStreamTagger	Base class for taggers dealing with the body of text documents only.
AbstractCharStreamTransformer	Base class for transformers dealing with text documents only.
AbstractDocumentFilter	Base class for document filters.
AbstractDocumentSplitter	Base class for splitters.
AbstractDocumentTagger	Base class for taggers.
AbstractDocumentTransformer	Base class for transformers.
AbstractImporterHandler	Base class for handlers applying only to certain types of documents by providing a way to restrict applicable documents based on a metadata field value, where the value matches a regular expression.
AbstractOnMatchFilter	Deprecated. Since 3.0.0, use composition with OnMatch instead
AbstractStringCondition	Base class to facilitate creating conditions based on text content, loading text into `StringBuilder` for memory processing.
AbstractStringFilter	Base class to facilitate creating filters based on text content, loading text into `StringBuilder` for memory processing.
AbstractStringTagger	Base class to facilitate creating taggers based on text content, loading text into `StringBuilder` for memory processing.
AbstractStringTransformer	Base class to facilitate creating transformers on text content, loading text into a `StringBuilder` for memory processing.
AbstractTikaParser	Base class wrapping Apache Tika parser for use by the importer.
AbstractTikaParser.RecursiveParser
BlankCondition	A condition based on whether the document content (default) or any of the specified metadata fields are blank or nonexistent.
BufferUtil	Buffer related utility methods.
CharacterCaseTagger	Changes the character case of matching fields and values according to one of the following methods:
CharsetTagger	Converts one or more field values (if needed) from a source character encoding (charset) to a target one.
CharsetTransformer	Transforms a document content (if needed) from a source character encoding (charset) to a target one.
CharsetUtil	Character set utility methods.
CommonMatchers	Commonly used `TextMatcher` instances.
CommonRestrictions	Commonly encountered restrictions that can be applied to `Properties` instances.
ConstantTagger	Define and add constant values to documents.
ConstantTagger.OnConflict	Deprecated.
ContentTypeDetector	Master class to detect all content types.
CopyTagger	Copies metadata fields.
CountMatchesTagger	Counts the number of matches of a given string (or string pattern) and stores the resulting value in the specified "toField".
CountMatchesTagger.MatchDetails	Deprecated.
CsvSplitter	Splits files with comma-separated values (or values separated by any other character, like tab) into one document per line.
CurrentDateTagger	Adds the current computer UTC date to the specified `field`.
DateCondition	A condition based on the date value(s) of matching metadata fields given the supplied date format.
DateCondition.DynamicFixedDateTimeSupplier
DateCondition.DynamicFloatingDateTimeSupplier
DateCondition.StaticDateTimeSupplier
DateCondition.TimeUnit
DateCondition.ValueMatcher
DateFormatTagger	Formats a date from any given format to a format of choice, as per the formatting options found on `SimpleDateFormat` with the exception of the string "EPOCH" which represents the difference, measured in milliseconds, between the date and midnight, January 1, 1970.
DateMetadataFilter	Accepts or rejects a document based on whether field values correspond to a date matching supplied conditions and format.
DateMetadataFilter.Condition
DateMetadataFilter.DynamicFixedDateTimeSupplier
DateMetadataFilter.DynamicFloatingDateTimeSupplier
DateMetadataFilter.Operator
DateMetadataFilter.StaticDateTimeSupplier
DateMetadataFilter.TimeUnit
DebugTagger	A utility tagger to help with troubleshooting of document importing.
DeleteTagger	Delete the metadata fields provided.
Doc	A document being imported.
DocInfo	Important information about a document that has specific meaning and purpose for processing by the Importer and needs to be referenced in a constant way.
DocMetadata	Constants for common metadata field names typically associated with a document and often set on `Doc.getMetadata()`.
DocumentLengthTagger	Adds the document length (i.e., number of bytes) to the specified `field`.
DocumentParserException	Exception thrown upon encountering a non-recoverable issue parsing a document.
DOMCondition	A condition using a Document Object Model (DOM) representation of an HTML, XHTML, or XML document content to match an element, attribute or value.
DOMContentFilter	Deprecated. Since 3.0.0, use `DOMFilter`.
DOMDeleteTransformer	Enables deletion of one or more elements matching a given selector from a document content.
DOMFilter	Uses a Document Object Model (DOM) representation of an HTML, XHTML, or XML document content to perform filtering based on matching an element/attribute or element/attribute value.
DOMPreserveTransformer	Preserves only one or more elements matching a given selector from a document content.
DOMPreserveTransformer.DOMExtractDetails	DOM Extraction Details
DOMSplitter	Splits HTML, XHTML, or XML document on elements matching a given selector.
DOMTagger	Extract the value of one or more elements or attributes into a target field, or delete matching elements.
DOMTagger.DOMExtractDetails	DOM Extraction Details
DOMUtil	Utility methods related to JSoup/DOM manipulation.
EmbeddedConfig	Configuration settings affecting how embedded documents are handled by parsers.
EmptyFilter	Accepts or rejects a document based on whether its content (default) or any of the specified metadata fields are empty or not.
EmptyMetadataFilter	Deprecated. Since 3.0.0, use `EmptyFilter`.
ExternalHandler	Class executing an external application to extract data from and/or manipulate a document.
ExternalParser	Parses and extracts text from a file using an external application to do so.
ExternalTagger	Extracts metadata from a document using an external application to do so.
ExternalTransformer	Transforms a document using an external application to do so.
FallbackParser	Parser using auto-detection of document content-type to figure out which specific parser to invoke to best parse a document.
FieldReportTagger	A utility tagger that reports in a CSV file the fields discovered in a crawl session, captured at the point of your choice in the importing process.
ForceSingleValueTagger	Forces a metadata field to be single-value.
FormatUtil	Utility methods related to formatting.
GenericDocumentParserFactory	Generic document parser factory.
HandlerConsumer	Consumer wrapping an `IImporterHandler` instance for use in an `XMLFlow`.
HandlerContext
HandlerContext.IncludeMatchResolver
HandlerDoc	Lighter version of `Doc` which leaves content out to let each handler dictate how content should be referenced.
HandlerPredicate	Predicate wrapping an `IImporterCondition` instance for use in an `XMLFlow`.
HierarchyTagger	Given a separator, split a field string into multiple segments representing each node of a hierarchical branch.
HierarchyTagger.HierarchyDetails
IDocumentFilter	Filters documents.
IDocumentParser	Implementations are responsible for parsing a document to extract its text and metadata, as well as any embedded documents (when applicable).
IDocumentParserFactory	Factory providing document parsers for documents.
IDocumentSplitter	Responsible for splitting a single document into several ones.
IDocumentTagger	Tags a document with extra metadata information, or manipulates existing metadata information.
IDocumentTransformer	Transformers allow manipulating and modifying a document's metadata or content.
IHintsAwareParser	Indicates that a parser can be initialized with generic parser configuration settings and it will try to apply any such settings the best it can when possible to do so.
IImporterCondition	A condition usually used in XML flow creation when configuring importer handlers.
IImporterHandler	Identifies a class as being an import handler.
IImporterResponseProcessor	Processes an importer response to modify it or perform other actions as required before it is returned.
ImageTransformer	Transforms an image using common image operations.
Importer	Principal class responsible for importing documents.
ImporterConfig	Importer configuration.
ImporterEvent	An Importer event.
ImporterEvent.Builder
ImporterException	Exception thrown when an issue prevented the proper importation of a file.
ImporterHandlerException	Exception thrown by several handler classes upon encountering issues.
ImporterLauncher	Command line launcher of the Importer application.
ImporterRequest	An Importer request, unique for each document to be imported.
ImporterResponse
ImporterRuntimeException	RuntimeException thrown when an issue prevented the proper importation of a file.
ImporterStatus
ImporterStatus.Status
IOnMatchFilter	Tells the collector that a filter is of "OnMatch" type.
KeepOnlyTagger	Keep only the metadata fields provided, delete all other ones.
LanguageTagger	Detects a document language based on Apache Tika language detection capability.
MergeTagger	Merge multiple metadata fields into a single one.
MergeTagger.Merge
NoContentTransformer	Get rid of the content stream and optionally store it as text into a metadata field instead.
NumericCondition	A condition based on the numeric value(s) of matching metadata fields, supporting decimals.
NumericCondition.ValueMatcher
NumericMetadataFilter	Accepts or rejects a document based on the numeric value(s) of matching metadata fields, supporting decimals.
NumericMetadataFilter.Condition
NumericMetadataFilter.Operator
OCRConfig	OCR configuration details.
OnMatch	Constants indicating the action to perform upon matching a condition.
ParseHints	Configuration settings influencing how documents are parsed by various parsers.
ParseState	Acts as a flag indicating if a document has been parsed or not in a given process flow.
PDFPageSplitter	Splits PDF pages so that each page is treated as an individual document.
ReduceConsecutivesTransformer	Reduces specified consecutive characters or strings to only one instance (document content only).
ReferenceCondition	A condition based on a text pattern matching a document reference (e.g.
ReferenceFilter	Accepts or rejects a document based on its reference (e.g.
RegexContentFilter	Deprecated. Since 3.0.0, use `TextFilter` instead.
RegexFieldExtractor	Deprecated. Since 3.0.0, use `RegexFieldValueExtractor` from Norconex Commons Lang
RegexMetadataFilter	Deprecated. Since 3.0.0, use `TextFilter` instead.
RegexReferenceFilter	Deprecated. Since 3.0.0, use `ReferenceFilter` instead.
RegexTagger	Extracts field names and their values with regular expression.
RegexUtil	Deprecated. Since 3.0.0, use `RegexFieldValueExtractor` from Norconex Commons Lang
RejectFilter	Rejects a document.
RenameTagger	Rename metadata fields to different names.
RenameTagger.RenameDetails
ReplaceTagger	Replaces an existing metadata value with another one.
ReplaceTagger.Replacement
ReplaceTransformer	Replaces every occurrence of the given replacements (document content only).
ReplaceTransformer.Replacement
ScriptCondition	A condition formulated using a scripting language.
ScriptFilter	Filter incoming documents using a scripting language.
ScriptRunner<T>	Runs scripts written in a programming language supported by the provided script engine.
ScriptTagger	Tag incoming documents using a scripting language.
ScriptTransformer	Transform incoming documents using a scripting language.
SplitTagger	Splits an existing metadata value into multiple values based on a given value separator (the separator gets discarded).
SplitTagger.SplitDetails
StripAfterTransformer	Strips any content found after the first match of the given pattern.
StripBeforeTransformer	Strips any content found before the first match of the given pattern.
StripBetweenTransformer	Strips any content found between matching start and end strings.
StripBetweenTransformer.StripBetweenDetails
SubstringTransformer	Keeps a substring of the content between begin and end character indexes.
TextBetweenTagger	Extracts and adds values found between matching start and end strings to a document metadata field.
TextBetweenTagger.TextBetweenDetails
TextCondition	A condition based on a text pattern matching a document content (default), or matching specific field(s).
TextFilter	Filters a document based on a text pattern in a document content (default), or matching fields specified.
TextPatternTagger	Deprecated. Since 3.0.0, use `RegexTagger`.
TextStatisticsTagger	Analyzes the content of the supplied document and adds statistical information about its content or field as metadata fields.
TitleGeneratorTagger	Attempts to generate a title from the document content (default) or a specified metadata field.
TranslatorSplitter	Translate documents using one of the supported translation APIs.
TruncateTagger	Truncates `fromField` value(s) and optionally replaces the truncated portion with a hash value to help ensure uniqueness (not 100% guaranteed to be collision-free).
URLExtractorTagger	Extracts unique URLs matching specific patterns in plain text content and stores them in a given field.
UUIDTagger	Generates a random Universally unique identifier (UUID) and stores it in the specified `field`.
XFDLParser	Parser for PureEdge Extensible Forms Description Language (XFDL).
XMLStreamSplitter	Splits XML document on a specific element.
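
The `DateFormatTagger` entry above describes converting a date between `SimpleDateFormat` patterns, with the string "EPOCH" standing for milliseconds since midnight, January 1, 1970. The sketch below illustrates that idea using only JDK classes. It is not the Importer's implementation: the class name `DateReformatSketch`, the `reformat` method, and the choice of UTC and an English locale are assumptions made for illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not part of the Norconex Importer API.
public class DateReformatSketch {

    // Special format name taken from the DateFormatTagger description.
    static final String EPOCH = "EPOCH";

    // Converts a date string from one format to another. Either format
    // may be a SimpleDateFormat pattern or the literal "EPOCH".
    static String reformat(String value, String fromFormat, String toFormat) {
        Date date;
        if (EPOCH.equals(fromFormat)) {
            // Value is milliseconds since 1970-01-01T00:00:00 UTC.
            date = new Date(Long.parseLong(value));
        } else {
            try {
                date = newFormat(fromFormat).parse(value);
            } catch (ParseException e) {
                throw new IllegalArgumentException(
                        "Cannot parse '" + value + "' as " + fromFormat, e);
            }
        }
        if (EPOCH.equals(toFormat)) {
            return Long.toString(date.getTime());
        }
        return newFormat(toFormat).format(date);
    }

    // Assumption: dates are interpreted in UTC with an English locale.
    private static SimpleDateFormat newFormat(String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        format.setLenient(false);
        return format;
    }

    public static void main(String[] args) {
        // prints 86400000
        System.out.println(reformat("1970-01-02", "yyyy-MM-dd", EPOCH));
        // prints 1970-01-02T00:00:00
        System.out.println(reformat("86400000", EPOCH, "yyyy-MM-dd'T'HH:mm:ss"));
    }
}
```

The actual tagger is configured on metadata fields rather than called like this; refer to the `DateFormatTagger` class page for its real options.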