Uses of Interface
com.norconex.importer.handler.IImporterHandler
Uses of IImporterHandler in com.norconex.importer
Methods in com.norconex.importer that return types with arguments of type IImporterHandler (Modifier and Type, Method, Description):
List<IImporterHandler> ImporterConfig.getPostParseHandlers()
    Deprecated. Since 3.0.0, use ImporterConfig.getPostParseConsumer() instead.
List<IImporterHandler> ImporterConfig.getPreParseHandlers()
    Deprecated. Since 3.0.0, use ImporterConfig.getPreParseConsumer() instead.
Method parameters in com.norconex.importer with type arguments of type IImporterHandler (Modifier and Type, Method, Description):
void ImporterConfig.setPostParseHandlers(List<IImporterHandler> postParseHandlers)
    Deprecated. Since 3.0.0, use ImporterConfig.setPostParseConsumer(Consumer) instead.
void ImporterConfig.setPreParseHandlers(List<IImporterHandler> preParseHandlers)
    Deprecated. Since 3.0.0, use ImporterConfig.setPreParseConsumer(Consumer) instead.
Uses of IImporterHandler in com.norconex.importer.handler
Methods in com.norconex.importer.handler that return IImporterHandler (Modifier and Type, Method, Description):
IImporterHandler HandlerConsumer.getHandler()
Methods in com.norconex.importer.handler with parameters of type IImporterHandler (Modifier and Type, Method, Description):
static Consumer<HandlerContext> HandlerConsumer.fromHandlers(IImporterHandler... importerHandlers)
void HandlerConsumer.setHandler(IImporterHandler handler)
Method parameters in com.norconex.importer.handler with type arguments of type IImporterHandler (Modifier and Type, Method, Description):
static Consumer<HandlerContext> HandlerConsumer.fromHandlers(List<IImporterHandler> importerHandlers)
Constructors in com.norconex.importer.handler with parameters of type IImporterHandler (Constructor, Description):
HandlerConsumer(IImporterHandler handler)
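The fromHandlers signatures listed for HandlerConsumer suggest that a group of handlers can be folded into a single Consumer<HandlerContext>. As a rough illustration of that folding pattern only, the sketch below uses hypothetical stand-in types: Context and Handler are not the Norconex HandlerContext and IImporterHandler, the handle method is invented for the example, and the real HandlerConsumer implementation may work differently. It assumes handlers run in list order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class HandlerChainSketch {

    // Hypothetical stand-in for HandlerContext; it only records which handlers ran.
    public static class Context {
        public final List<String> log = new ArrayList<>();
    }

    // Hypothetical stand-in for IImporterHandler; the handle method is an
    // assumption made for this sketch, not part of the Norconex interface.
    public interface Handler {
        void handle(Context ctx);
    }

    // Folds the handlers into one Consumer that invokes them in list order,
    // mirroring the shape (not the internals) of
    // HandlerConsumer.fromHandlers(List<IImporterHandler>).
    public static Consumer<Context> fromHandlers(List<Handler> handlers) {
        Consumer<Context> chain = ctx -> { };
        for (Handler h : handlers) {
            chain = chain.andThen(h::handle);
        }
        return chain;
    }

    public static void main(String[] args) {
        Handler first = ctx -> ctx.log.add("first");
        Handler second = ctx -> ctx.log.add("second");

        Consumer<Context> consumer = fromHandlers(Arrays.asList(first, second));

        Context ctx = new Context();
        consumer.accept(ctx);
        System.out.println(ctx.log); // prints [first, second]
    }
}
```

With the real library, the resulting Consumer would presumably be what is passed to the non-deprecated ImporterConfig.setPreParseConsumer(Consumer) or setPostParseConsumer(Consumer) methods; check the HandlerConsumer Javadoc before relying on that.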
Uses of IImporterHandler in com.norconex.importer.handler.filter
Subinterfaces of IImporterHandler in com.norconex.importer.handler.filter (Modifier and Type, Interface, Description):
interface IDocumentFilter
    Filters documents.
Classes in com.norconex.importer.handler.filter that implement IImporterHandler (Modifier and Type, Class, Description):
class AbstractCharStreamFilter
    Base class for filters dealing with the body of text documents only.
class AbstractDocumentFilter
    Base class for document filters.
class AbstractStringFilter
    Base class to facilitate creating filters based on text content, loading text into StringBuilder for memory processing.
Uses of IImporterHandler in com.norconex.importer.handler.filter.impl
Classes in com.norconex.importer.handler.filter.impl that implement IImporterHandler (Modifier and Type, Class, Description):
class DateMetadataFilter
    Accepts or rejects a document based on whether field values correspond to a date matching supplied conditions and format.
class DOMContentFilter
    Deprecated. Since 3.0.0, use DOMFilter.
class DOMFilter
    Uses a Document Object Model (DOM) representation of an HTML, XHTML, or XML document content to perform filtering based on matching an element/attribute or element/attribute value.
class EmptyFilter
    Accepts or rejects a document based on whether its content (default) or any of the specified metadata fields are empty or not.
class EmptyMetadataFilter
    Deprecated. Since 3.0.0, use EmptyFilter.
class NumericMetadataFilter
    Accepts or rejects a document based on the numeric value(s) of matching metadata fields, supporting decimals.
class ReferenceFilter
    Accepts or rejects a document based on its reference.
class RegexContentFilter
    Deprecated. Since 3.0.0, use TextFilter instead.
class RegexMetadataFilter
    Deprecated. Since 3.0.0, use TextFilter instead.
class RegexReferenceFilter
    Deprecated. Since 3.0.0, use ReferenceFilter instead.
class RejectFilter
    Rejects a document.
class ScriptFilter
    Filters incoming documents using a scripting language.
class TextFilter
    Filters a document based on a text pattern in the document content (default) or in the matching fields specified.
Uses of IImporterHandler in com.norconex.importer.handler.splitter
Subinterfaces of IImporterHandler in com.norconex.importer.handler.splitter (Modifier and Type, Interface, Description):
interface IDocumentSplitter
    Responsible for splitting a single document into several ones.
Classes in com.norconex.importer.handler.splitter that implement IImporterHandler (Modifier and Type, Class, Description):
class AbstractDocumentSplitter
    Base class for splitters.
Uses of IImporterHandler in com.norconex.importer.handler.splitter.impl
Classes in com.norconex.importer.handler.splitter.impl that implement IImporterHandler (Modifier and Type, Class, Description):
class CsvSplitter
    Splits files with comma-separated values (or values separated by any other character, such as tab) into one document per line.
class DOMSplitter
    Splits an HTML, XHTML, or XML document on elements matching a given selector.
class PDFPageSplitter
    Splits PDF pages so that each page is treated as an individual document.
class TranslatorSplitter
    Translates documents using one of the supported translation APIs.
class XMLStreamSplitter
    Splits an XML document on a specific element.
Uses of IImporterHandler in com.norconex.importer.handler.tagger
Subinterfaces of IImporterHandler in com.norconex.importer.handler.tagger (Modifier and Type, Interface, Description):
interface IDocumentTagger
    Tags a document with extra metadata information, or manipulates existing metadata information.
Classes in com.norconex.importer.handler.tagger that implement IImporterHandler (Modifier and Type, Class, Description):
class AbstractCharStreamTagger
    Base class for taggers dealing with the body of text documents only.
class AbstractDocumentTagger
    Base class for taggers.
class AbstractStringTagger
    Base class to facilitate creating taggers based on text content, loading text into StringBuilder for memory processing.
Uses of IImporterHandler in com.norconex.importer.handler.tagger.impl
Classes in com.norconex.importer.handler.tagger.impl that implement IImporterHandler (Modifier and Type, Class, Description):
class CharacterCaseTagger
    Changes the character case of matching fields and values according to one of several supported methods.
class CharsetTagger
    Converts one or more field values (if needed) from a source character encoding (charset) to a target one.
class ConstantTagger
    Defines and adds constant values to documents.
class CopyTagger
    Copies metadata fields.
class CountMatchesTagger
    Counts the number of matches of a given string (or string pattern) and stores the resulting value in the specified "toField".
class CurrentDateTagger
    Adds the current computer UTC date to the specified field.
class DateFormatTagger
    Formats a date from any given format to a format of choice, as per the formatting options found on SimpleDateFormat, with the exception of the string "EPOCH", which represents the difference, measured in milliseconds, between the date and midnight, January 1, 1970.
class DebugTagger
    A utility tagger to help with troubleshooting of document importing.
class DeleteTagger
    Deletes the metadata fields provided.
class DocumentLengthTagger
    Adds the document length (i.e., number of bytes) to the specified field.
class DOMTagger
    Extracts the value of one or more elements or attributes into a target field, or deletes matching elements.
class ExternalTagger
    Extracts metadata from a document using an external application.
class FieldReportTagger
    A utility tagger that reports in a CSV file the fields discovered in a crawl session, captured at the point of your choice in the importing process.
class ForceSingleValueTagger
    Forces a metadata field to be single-value.
class HierarchyTagger
    Given a separator, splits a field string into multiple segments representing each node of a hierarchical branch.
class KeepOnlyTagger
    Keeps only the metadata fields provided and deletes all others.
class LanguageTagger
    Detects a document's language using the Apache Tika language detection capability.
class MergeTagger
    Merges multiple metadata fields into a single one.
class RegexTagger
    Extracts field names and their values with a regular expression.
class RenameTagger
    Renames metadata fields to different names.
class ReplaceTagger
    Replaces an existing metadata value with another one.
class ScriptTagger
    Tags incoming documents using a scripting language.
class SplitTagger
    Splits an existing metadata value into multiple values based on a given value separator (the separator gets discarded).
class TextBetweenTagger
    Extracts values found between matching start and end strings and adds them to a document metadata field.
class TextPatternTagger
    Deprecated. Since 3.0.0, use RegexTagger.
class TextStatisticsTagger
    Analyzes the content of the supplied document and adds statistical information about its content or field as metadata fields.
class TitleGeneratorTagger
    Attempts to generate a title from the document content (default) or a specified metadata field.
class TruncateTagger
    Truncates a fromField value(s) and optionally replaces the truncated portion with a hash value to help ensure uniqueness (not 100% guaranteed to be collision-free).
class URLExtractorTagger
    Extracts unique URLs matching specific patterns in plain text content and stores them in a given field.
class UUIDTagger
    Generates a random universally unique identifier (UUID) and stores it in the specified field.
Uses of IImporterHandler in com.norconex.importer.handler.transformer
Subinterfaces of IImporterHandler in com.norconex.importer.handler.transformer (Modifier and Type, Interface, Description):
interface IDocumentTransformer
    Transformers manipulate and modify a document's metadata or content.
Classes in com.norconex.importer.handler.transformer that implement IImporterHandler (Modifier and Type, Class, Description):
class AbstractCharStreamTransformer
    Base class for transformers dealing with text documents only.
class AbstractDocumentTransformer
    Base class for transformers.
class AbstractStringTransformer
    Base class to facilitate creating transformers on text content, loading text into a StringBuilder for memory processing.
Uses of IImporterHandler in com.norconex.importer.handler.transformer.impl
Classes in com.norconex.importer.handler.transformer.impl that implement IImporterHandler (Modifier and Type, Class, Description):
class CharsetTransformer
    Transforms a document content (if needed) from a source character encoding (charset) to a target one.
class DOMDeleteTransformer
    Enables deletion of one or more elements matching a given selector from a document content.
class DOMPreserveTransformer
    Preserves only one or more elements matching a given selector from a document content.
class ExternalTransformer
    Transforms a document using an external application.
class ImageTransformer
    Transforms an image using common image operations.
class NoContentTransformer
    Discards the content stream and optionally stores it as text in a metadata field instead.
class ReduceConsecutivesTransformer
    Reduces specified consecutive characters or strings to only one instance (document content only).
class ReplaceTransformer
    Replaces every occurrence of the given replacements (document content only).
class ScriptTransformer
    Transforms incoming documents using a scripting language.
class StripAfterTransformer
    Strips any content found after the first match for a given pattern.
class StripBeforeTransformer
    Strips any content found before the first match for a given pattern.
class StripBetweenTransformer
    Strips any content found between matching start and end strings.
class SubstringTransformer
    Keeps the substring of the content between the given begin and end character indexes.