Class | Description |
---|---|
CharacterCaseTagger | Changes the character case of field values according to one of several supported methods. |
CharsetTagger | Converts one or more field values (if needed) from a source character encoding (charset) to a target one. |
ConstantTagger | Defines and adds constant values to documents. |
CopyTagger | Copies metadata fields. |
CountMatchesTagger | Counts the number of matches of a given string (or string pattern) and stores the resulting value in the specified "toField". |
CountMatchesTagger.MatchDetails | |
CurrentDateTagger | Adds the current computer UTC date to the specified field. |
DateFormatTagger | Formats a date from any given format to a format of choice, as per the formatting options found on SimpleDateFormat, with the exception of the string "EPOCH", which represents the difference, measured in milliseconds, between the date and midnight, January 1, 1970 UTC. |
DebugTagger | A utility tagger to help troubleshoot document importing. |
DeleteTagger | Deletes the metadata fields provided. |
DocumentLengthTagger | Adds the document length (i.e., number of bytes) to the specified field. |
DOMTagger | Extracts the value of one or more elements or attributes into a target field, from an HTML, XHTML, or XML document. |
DOMTagger.DOMExtractDetails | DOM extraction details. |
ExternalTagger | Extracts metadata from a document using an external application. |
FieldReportTagger | A utility tagger that reports in a CSV file the fields discovered in a crawl session, captured at the point of your choice in the importing process. |
ForceSingleValueTagger | Forces a metadata field to be single-value. |
HierarchyTagger | Given a separator, splits a field string into multiple segments representing each node of a hierarchical branch. |
HierarchyTagger.HierarchyDetails | |
KeepOnlyTagger | Keeps only the metadata fields provided and deletes all others. |
LanguageTagger | Detects a document's language using Tika's language detection capability. |
MergeTagger | Merges multiple metadata fields into a single one. |
MergeTagger.Merge | |
RenameTagger | Renames metadata fields. |
RenameTagger.RenameDetails | |
ReplaceTagger | Replaces an existing metadata value with another one. |
ReplaceTagger.Replacement | |
ScriptTagger | Tags incoming documents using a scripting language. |
SplitTagger | Splits an existing metadata value into multiple values based on a given value separator. |
SplitTagger.Split | |
TextBetweenTagger | Extracts values found between matching start and end strings and adds them to a document metadata field. |
TextPatternTagger | Extracts all text values matching the provided regular expression and adds them to a field that is either provided explicitly or also matched by a regular expression. |
TextStatisticsTagger | Analyzes the content of the supplied document and adds statistical information about its content or a field as metadata fields. |
TitleGeneratorTagger | Attempts to generate a title from the document content (default) or a specified metadata field. |
TruncateTagger | Truncates the fromField value(s) and optionally replaces the truncated portion with a hash value to help ensure uniqueness (not 100% guaranteed to be collision-free). |
UUIDTagger | Generates a random universally unique identifier (UUID) and stores it in the specified field. |
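
The HierarchyTagger description above (splitting a field string into segments, one per node of a hierarchical branch) can be illustrated with a small plain-Java sketch. It does not use the tagger's own API: the class `HierarchySketch`, the method `expand`, and the sample input are hypothetical names chosen for illustration, and the decision to skip empty segments is an assumption rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class HierarchySketch {

    // Hypothetical helper (not part of the Norconex API): returns one
    // entry per node of the hierarchy, each including its ancestors.
    static List<String> expand(String value, String separator) {
        List<String> nodes = new ArrayList<>();
        StringBuilder path = new StringBuilder();
        // Quote the separator so it is treated literally, not as a regex.
        String[] segments = value.split(Pattern.quote(separator));
        for (int i = 0; i < segments.length; i++) {
            if (i > 0) {
                path.append(separator);
            }
            path.append(segments[i]);
            // Assumption: empty segments (such as the one before a
            // leading separator) do not produce a node of their own.
            if (!segments[i].isEmpty()) {
                nodes.add(path.toString());
            }
        }
        return nodes;
    }

    public static void main(String[] args) {
        expand("/vegetable/potato/sweet", "/").forEach(System.out::println);
    }
}
```

Running `main` prints `/vegetable`, `/vegetable/potato`, and `/vegetable/potato/sweet`, one per line. Entries of this shape are convenient for counting how many documents fall under each node of a hierarchy.
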
Enum | Description |
---|---|
ConstantTagger.OnConflict | |
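
The "EPOCH" convention mentioned in the DateFormatTagger description (milliseconds between a date and midnight, January 1, 1970) can be sketched in plain Java alongside `SimpleDateFormat` patterns. This is an illustration only, not the tagger's implementation: the class `EpochFormatSketch`, the method `reformat`, and the use of UTC with an English locale are all assumptions made for the example.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class EpochFormatSketch {

    // Hypothetical helper: converts a date string between two formats,
    // where the literal "EPOCH" stands for milliseconds since 1970-01-01.
    static String reformat(String value, String fromFormat, String toFormat) {
        Date date;
        try {
            date = "EPOCH".equals(fromFormat)
                    ? new Date(Long.parseLong(value))
                    : newFormat(fromFormat).parse(value);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparsable date: " + value, e);
        }
        return "EPOCH".equals(toFormat)
                ? Long.toString(date.getTime())
                : newFormat(toFormat).format(date);
    }

    // Assumption: dates are interpreted in UTC with an English locale.
    private static SimpleDateFormat newFormat(String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format;
    }

    public static void main(String[] args) {
        System.out.println(reformat("1970-01-02", "yyyy-MM-dd", "EPOCH"));
        System.out.println(reformat("0", "EPOCH", "yyyy-MM-dd"));
    }
}
```

Under these assumptions, one day after the epoch converts to `86400000`, and an epoch value of `0` formats as `1970-01-01`.
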
Copyright © 2009–2021 Norconex Inc. All rights reserved.