Package | Description |
---|---|
com.norconex.importer.handler.tagger | |
com.norconex.importer.handler.tagger.impl | |
Modifier and Type | Class and Description |
---|---|
class |
AbstractCharStreamTagger
Base class for taggers dealing with the body of text documents only.
|
class |
AbstractDocumentTagger
Base class for taggers.
|
class |
AbstractStringTagger
Base class to facilitate creating taggers based on text content, loading
text into a
StringBuilder for in-memory processing. |
Modifier and Type | Class and Description |
---|---|
class |
CharacterCaseTagger
Changes the character case of matching fields and values according to
one of the following methods:
|
class |
CharsetTagger
Converts one or more field values (if needed) from a source character
encoding (charset) to a target one.
|
class |
ConstantTagger
Defines and adds constant values to documents.
|
class |
CopyTagger
Copies metadata fields.
|
class |
CountMatchesTagger
Counts the number of matches of a given string (or string pattern) and
stores the resulting value in the specified "toField".
|
class |
CurrentDateTagger
Adds the current computer UTC date to the specified
field. |
class |
DateFormatTagger
Formats a date from any given format to a format of choice, as per the
formatting options found on
SimpleDateFormat, with the exception
of the string "EPOCH", which represents the difference, measured in
milliseconds, between the date and midnight, January 1, 1970. |
class |
DebugTagger
A utility tagger to help with troubleshooting of document importing.
|
class |
DeleteTagger
Deletes the metadata fields provided.
|
class |
DocumentLengthTagger
Adds the document length (i.e., number of bytes) to
the specified
field. |
class |
DOMTagger
Extracts the value of one or more elements or attributes into
a target field, or deletes matching elements.
|
class |
ExternalTagger
Extracts metadata from a document using an external application to do so.
|
class |
FieldReportTagger
A utility tagger that reports in a CSV file the fields discovered
in a crawl session, captured at the point of your choice in the
importing process.
|
class |
ForceSingleValueTagger
Forces a metadata field to be single-value.
|
class |
HierarchyTagger
Given a separator, splits a field string into multiple segments
representing each node of a hierarchical branch.
|
class |
KeepOnlyTagger
Keeps only the metadata fields provided and deletes all others.
|
class |
LanguageTagger
Detects a document's language using the Apache Tika language detection
capability.
|
class |
MergeTagger
Merges multiple metadata fields into a single one.
|
class |
RegexTagger
Extracts field names and their values with a regular expression.
|
class |
RenameTagger
Renames metadata fields to different names.
|
class |
ReplaceTagger
Replaces an existing metadata value with another one.
|
class |
ScriptTagger
Tags incoming documents using a scripting language.
|
class |
SplitTagger
Splits an existing metadata value into multiple values based on a given
value separator (the separator gets discarded).
|
class |
TextBetweenTagger
Extracts and adds values found between matching start and
end strings to a document metadata field.
|
class |
TextPatternTagger
Deprecated.
Since 3.0.0, use
RegexTagger. |
class |
TextStatisticsTagger
Analyzes the content of the supplied document and adds statistical
information about its content or field as metadata fields.
|
class |
TitleGeneratorTagger
Attempts to generate a title from the document content (default) or
a specified metadata field.
|
class |
TruncateTagger
Truncates the
fromField value(s) and optionally replaces the truncated
portion with a hash value to help ensure uniqueness (not 100% guaranteed to
be collision-free). |
class |
URLExtractorTagger
Extracts unique URLs matching specific patterns in plain text content and
stores them in a given field.
|
class |
UUIDTagger
Generates a random universally unique identifier (UUID) and stores it
in the specified
field. |
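
The HierarchyTagger summary in the table above describes splitting a field value into segments, one per node of a hierarchical branch. The sketch below illustrates that idea in plain Java. It does not use the Norconex API: the class `HierarchySketch` and the method `expand` are hypothetical names, and it assumes that each node is emitted as the cumulative path from the root.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of com.norconex.importer.
class HierarchySketch {

    // Returns one entry per hierarchy node, each being the path from the
    // root down to that node (assumed output shape, see lead-in).
    // Repeated or trailing separators are not normalized in this sketch.
    static List<String> expand(String value, String separator) {
        List<String> nodes = new ArrayList<>();
        if (value == null || separator == null || separator.isEmpty()) {
            return nodes;
        }
        int from = 0;
        while (true) {
            int idx = value.indexOf(separator, from);
            if (idx < 0) {
                break;
            }
            // Skip the empty prefix produced by a leading separator.
            if (idx > 0) {
                nodes.add(value.substring(0, idx));
            }
            from = idx + separator.length();
        }
        // The full value is the deepest node, unless it ends on a separator.
        if (!value.isEmpty() && !value.endsWith(separator)) {
            nodes.add(value);
        }
        return nodes;
    }

    public static void main(String[] args) {
        // prints [/vegetable, /vegetable/potato, /vegetable/potato/sweet]
        System.out.println(expand("/vegetable/potato/sweet", "/"));
    }
}
```

Storing every ancestor path as its own value is what would let a search front end count documents at each level of the hierarchy; consult the HierarchyTagger class documentation for the options it actually supports.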
Copyright © 2009–2023 Norconex Inc. All rights reserved.