All Classes
Class |
Description |
AbstractCharStreamCondition |
Base class for conditions dealing with the document content as text.
|
AbstractCharStreamFilter |
Base class for filters dealing with the body of text documents only.
|
AbstractCharStreamTagger |
Base class for taggers dealing with the body of text documents only.
|
AbstractCharStreamTransformer |
Base class for transformers dealing with text documents only.
|
AbstractDocumentFilter |
Base class for document filters.
|
AbstractDocumentSplitter |
Base class for splitters.
|
AbstractDocumentTagger |
Base class for taggers.
|
AbstractDocumentTransformer |
Base class for transformers.
|
AbstractImporterHandler |
Base class for handlers that apply only to certain types of documents,
providing a way to restrict applicable documents based on
a metadata field value matching a regular expression.
|
AbstractOnMatchFilter |
Deprecated.
|
AbstractStringCondition |
Base class to facilitate creating conditions based on text content,
loading text into StringBuilder for memory processing.
|
AbstractStringFilter |
Base class to facilitate creating filters based on text content, loading
text into StringBuilder for memory processing.
|
AbstractStringTagger |
Base class to facilitate creating taggers based on text content, loading
text into StringBuilder for memory processing.
|
AbstractStringTransformer |
Base class to facilitate creating transformers on text content, loading
text into a StringBuilder for memory processing.
|
AbstractTikaParser |
Base class wrapping Apache Tika parser for use by the importer.
|
AbstractTikaParser.RecursiveParser |
|
BlankCondition |
A condition based on whether the document content (default) or
any of the specified metadata fields are blank or nonexistent.
|
BufferUtil |
Buffer related utility methods.
|
CharacterCaseTagger |
Changes the character case of matching fields and values according to
one of the following methods:
|
CharsetTagger |
Converts one or more field values (if needed) from a source character
encoding (charset) to a target one.
|
CharsetTransformer |
Transforms a document content (if needed) from a source character
encoding (charset) to a target one.
|
CharsetUtil |
Character set utility methods.
|
CommonMatchers |
|
CommonRestrictions |
Commonly encountered restrictions that can be applied to Properties
instances.
|
ConstantTagger |
Define and add constant values to documents.
|
ConstantTagger.OnConflict |
Deprecated. |
ContentTypeDetector |
Master class to detect all content types.
|
CopyTagger |
Copies metadata fields.
|
CountMatchesTagger |
Counts the number of matches of a given string (or string pattern) and
stores the resulting value in the specified "toField".
|
CountMatchesTagger.MatchDetails |
Deprecated. |
CsvSplitter |
Splits files with comma-separated values (or values separated by any other
character, such as tab) into one document per line.
|
CurrentDateTagger |
Adds the current computer UTC date to the specified field.
|
DateCondition |
A condition based on the date value(s) of matching
metadata fields given the supplied date format.
|
DateCondition.DynamicFixedDateTimeSupplier |
|
DateCondition.DynamicFloatingDateTimeSupplier |
|
DateCondition.StaticDateTimeSupplier |
|
DateCondition.TimeUnit |
|
DateCondition.ValueMatcher |
|
DateFormatTagger |
Formats a date from any given format to a format of choice, as per the
formatting options found on SimpleDateFormat with the exception
of the string "EPOCH" which represents the difference, measured in
milliseconds, between the date and midnight, January 1, 1970.
|
DateMetadataFilter |
Accepts or rejects a document based on whether field values correspond
to a date matching supplied conditions and format.
|
DateMetadataFilter.Condition |
|
DateMetadataFilter.DynamicFixedDateTimeSupplier |
|
DateMetadataFilter.DynamicFloatingDateTimeSupplier |
|
DateMetadataFilter.Operator |
|
DateMetadataFilter.StaticDateTimeSupplier |
|
DateMetadataFilter.TimeUnit |
|
DebugTagger |
A utility tagger to help with troubleshooting of document importing.
|
DeleteTagger |
Delete the metadata fields provided.
|
Doc |
A document being imported.
|
DocInfo |
Important information about a document that has specific meaning and purpose
for processing by the Importer and needs to be referenced in a constant way.
|
DocMetadata |
Constants for common metadata field names typically associated
with a document and often set on Doc.getMetadata().
|
DocumentLengthTagger |
Adds the document length (i.e., number of bytes) to
the specified field.
|
DocumentParserException |
Exception thrown upon encountering a non-recoverable issue parsing a
document.
|
DOMCondition |
A condition using a Document Object Model (DOM) representation of an HTML,
XHTML, or XML document content to match an element, attribute or value.
|
DOMContentFilter |
Deprecated.
|
DOMDeleteTransformer |
Enables deletion of one or more elements matching a given selector
from a document content.
|
DOMFilter |
Uses a Document Object Model (DOM) representation of an HTML, XHTML, or
XML document content to perform filtering based on matching an
element/attribute or element/attribute value.
|
DOMPreserveTransformer |
Preserves only one or more elements matching a given selector from
a document content.
|
DOMPreserveTransformer.DOMExtractDetails |
DOM Extraction Details
|
DOMSplitter |
Splits an HTML, XHTML, or XML document on elements matching a given
selector.
|
DOMTagger |
Extracts the value of one or more elements or attributes into
a target field, or delete matching elements.
|
DOMTagger.DOMExtractDetails |
DOM Extraction Details
|
DOMUtil |
Utility methods related to JSoup/DOM manipulation.
|
EmbeddedConfig |
Configuration settings affecting how embedded documents are handled
by parsers.
|
EmptyFilter |
Accepts or rejects a document based on whether its content (default) or
any of the specified metadata fields are empty or not.
|
EmptyMetadataFilter |
Deprecated.
|
ExternalHandler |
Class executing an external application
to extract data from and/or manipulate a document.
|
ExternalParser |
Parses and extracts text from a file using an external application to do so.
|
ExternalTagger |
Extracts metadata from a document using an external application to do so.
|
ExternalTransformer |
Transforms a document using an external application to do so.
|
FallbackParser |
Parser using auto-detection of document content-type to figure out
which specific parser to invoke to best parse a document.
|
FieldReportTagger |
A utility tagger that reports in a CSV file the fields discovered
in a crawl session, captured at the point of your choice in the
importing process.
|
ForceSingleValueTagger |
Forces a metadata field to be single-value.
|
FormatUtil |
Utility methods related to formatting.
|
GenericDocumentParserFactory |
Generic document parser factory.
|
HandlerConsumer |
|
HandlerContext |
|
HandlerContext.IncludeMatchResolver |
|
HandlerDoc |
Lighter version of Doc which leaves content out to let each
handler dictate how content should be referenced.
|
HandlerPredicate |
|
HierarchyTagger |
Given a separator, split a field string into multiple segments
representing each node of a hierarchical branch.
|
HierarchyTagger.HierarchyDetails |
|
IDocumentFilter |
Filters documents.
|
IDocumentParser |
Implementations are responsible for parsing a document to
extract its text and metadata, as well as any embedded documents
(when applicable).
|
IDocumentParserFactory |
Factory providing document parsers for documents.
|
IDocumentSplitter |
Responsible for splitting a single document into several ones.
|
IDocumentTagger |
Tags a document with extra metadata information, or manipulates existing
metadata information.
|
IDocumentTransformer |
Transformers allow manipulating and modifying a document's metadata or content.
|
IHintsAwareParser |
Indicates that a parser can be initialized with generic parser configuration
settings and it will try to apply any such settings the best it can
when possible to do so.
|
IImporterCondition |
A condition usually used in XML flow creation when configuring
importer handlers.
|
IImporterHandler |
Identifies a class as being an import handler.
|
IImporterResponseProcessor |
Processes an importer response to modify it or perform other actions
as required before it is returned.
|
ImageTransformer |
Transforms an image using common image operations.
|
Importer |
Principal class responsible for importing documents.
|
ImporterConfig |
Importer configuration.
|
ImporterEvent |
An Importer event.
|
ImporterEvent.Builder |
|
ImporterException |
Exception thrown when an issue prevented the proper importation of a file.
|
ImporterHandlerException |
Exception thrown by several handler classes upon encountering
issues.
|
ImporterLauncher |
Command line launcher of the Importer application.
|
ImporterRequest |
An Importer request, unique for each document to be imported.
|
ImporterResponse |
|
ImporterRuntimeException |
RuntimeException thrown when an issue prevented the proper importation of a
file.
|
ImporterStatus |
|
ImporterStatus.Status |
|
IOnMatchFilter |
Tells the collector that a filter is of "OnMatch" type.
|
KeepOnlyTagger |
Keep only the metadata fields provided, delete all other ones.
|
LanguageTagger |
Detects a document language based on Apache Tika language detection
capability.
|
MergeTagger |
Merge multiple metadata fields into a single one.
|
MergeTagger.Merge |
|
NoContentTransformer |
Get rid of the content stream and optionally store it as text into a
metadata field instead.
|
NumericCondition |
A condition based on the numeric value(s) of matching
metadata fields, supporting decimals.
|
NumericCondition.ValueMatcher |
|
NumericMetadataFilter |
Accepts or rejects a document based on the numeric value(s) of matching
metadata fields, supporting decimals.
|
NumericMetadataFilter.Condition |
|
NumericMetadataFilter.Operator |
|
OCRConfig |
OCR configuration details.
|
OnMatch |
Constants indicating the action to perform upon matching a condition.
|
ParseHints |
Configuration settings influencing how documents are parsed by various
parsers.
|
ParseState |
Acts as a flag indicating whether a document has been parsed in
a given process flow.
|
PDFPageSplitter |
Splits PDF pages so that each page is treated as an individual document.
|
ReduceConsecutivesTransformer |
Reduces specified consecutive characters or strings to only one
instance (document content only).
|
ReferenceCondition |
A condition based on a text pattern matching a document reference (e.g.
|
ReferenceFilter |
Accepts or rejects a document based on its reference (e.g.
|
RegexContentFilter |
Deprecated.
|
RegexFieldExtractor |
Deprecated.
|
RegexMetadataFilter |
Deprecated.
|
RegexReferenceFilter |
Deprecated.
|
RegexTagger |
Extracts field names and their values with regular expression.
|
RegexUtil |
Deprecated.
|
RejectFilter |
Rejects a document.
|
RenameTagger |
Rename metadata fields to different names.
|
RenameTagger.RenameDetails |
|
ReplaceTagger |
Replaces an existing metadata value with another one.
|
ReplaceTagger.Replacement |
|
ReplaceTransformer |
Replaces every occurrence of the given replacements
(document content only).
|
ReplaceTransformer.Replacement |
|
ScriptCondition |
A condition formulated using a scripting language.
|
ScriptFilter |
Filter incoming documents using a scripting language.
|
ScriptRunner<T> |
Runs scripts written in a programming language supported by the provided
script engine.
|
ScriptTagger |
Tag incoming documents using a scripting language.
|
ScriptTransformer |
Transform incoming documents using a scripting language.
|
SplitTagger |
Splits an existing metadata value into multiple values based on a given
value separator (the separator gets discarded).
|
SplitTagger.SplitDetails |
|
StripAfterTransformer |
Strips any content found after the first match of the given pattern.
|
StripBeforeTransformer |
Strips any content found before the first match of the given pattern.
|
StripBetweenTransformer |
Strips any content found between matching start and end strings.
|
StripBetweenTransformer.StripBetweenDetails |
|
SubstringTransformer |
Keeps a substring of the content between the given begin and end character
indexes.
|
TextBetweenTagger |
Extracts and adds values found between matching start and
end strings to a document metadata field.
|
TextBetweenTagger.TextBetweenDetails |
|
TextCondition |
A condition based on a text pattern matching a document content
(default), or matching specific field(s).
|
TextFilter |
Filters a document based on a text pattern in a document content
(default), or matching fields specified.
|
TextPatternTagger |
Deprecated.
|
TextStatisticsTagger |
Analyzes the content of the supplied document and adds statistical
information about its content or field as metadata fields.
|
TitleGeneratorTagger |
Attempts to generate a title from the document content (default) or
a specified metadata field.
|
TranslatorSplitter |
Translates documents using one of the supported translation APIs.
|
TruncateTagger |
Truncates the fromField value(s) and optionally replaces the truncated
portion with a hash value to help ensure uniqueness (not 100% guaranteed to
be collision-free).
|
URLExtractorTagger |
Extracts unique URLs matching specific patterns in plain text content and
stores them in a given field.
|
UUIDTagger |
Generates a random universally unique identifier (UUID) and stores it
in the specified field.
|
XFDLParser |
Parser for PureEdge Extensible Forms Description Language (XFDL).
|
XMLStreamSplitter |
Splits an XML document on a specific element.
|
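
Two of the descriptions above (DateFormatTagger's "EPOCH" format and HierarchyTagger's segment splitting) may be easier to picture with a small standalone sketch. The code below uses only the JDK and does not call the Importer API; the class and method names (HandlerBehaviorSketch, toEpochMillis, hierarchyNodes) are hypothetical. Interpreting the date in UTC and accumulating each node from the root are assumptions based on the summaries, not confirmed library behavior.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical illustration only; not part of the Importer API. */
final class HandlerBehaviorSketch {

    /**
     * Parses a date with a SimpleDateFormat pattern and returns the number of
     * milliseconds between that date and midnight, January 1, 1970.
     * Assumption: the input is interpreted as UTC.
     */
    static long toEpochMillis(String value, String pattern) throws ParseException {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        format.setLenient(false); // reject impossible dates such as month 13
        return format.parse(value).getTime();
    }

    /**
     * Splits a value on a separator and returns one entry per node of the
     * hierarchical branch, each entry holding the path from the root.
     * Assumption: this cumulative form is one reading of "segments
     * representing each node".
     */
    static List<String> hierarchyNodes(String value, String separator) {
        if (separator.isEmpty()) {
            throw new IllegalArgumentException("separator must not be empty");
        }
        List<String> nodes = new ArrayList<>();
        int from = 0;
        int idx;
        while ((idx = value.indexOf(separator, from)) >= 0) {
            if (idx > 0) {
                // Everything before this separator is a complete ancestor node.
                nodes.add(value.substring(0, idx));
            }
            from = idx + separator.length();
        }
        if (!value.isEmpty()) {
            nodes.add(value); // the full value is the deepest node
        }
        return nodes;
    }

    public static void main(String[] args) throws ParseException {
        // prints 86400000
        System.out.println(toEpochMillis("1970-01-02", "yyyy-MM-dd"));
        // prints [/vegetable, /vegetable/potato, /vegetable/potato/sweet]
        System.out.println(hierarchyNodes("/vegetable/potato/sweet", "/"));
    }
}
```

In an actual Importer setup these behaviors would come from configuring the corresponding tagger rather than from hand-written code; the sketch only shows the kind of values such a tagger would be expected to produce.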