Updated |
Updated transitive dependencies with known vulnerabilities.
|
|
Updated |
Updated dependencies to avoid logging library detection conflict.
|
|
Updated |
Maven dependency updates: Apache Tika 1.27 (and its many transitive
dependencies), UCAR jj2000 5.4, Opencsv 5.5.2,
JAI Image-IO jpeg2000 1.4.0, JBIG2 ImageIO 2.0.
|
|
Fixed |
Fixed invalid configuration in POM "maven-dependency-plugin".
|
|
New |
Handlers now support XML "flow", which adds support for
if/ifNot/condition/then/else tags in XML configuration.
|
|
New |
New "condition" classes for XML "flow" configuration: BlankCondition,
DateCondition, DOMCondition, NumericCondition, ReferenceCondition,
ScriptCondition, and TextCondition.
|
|
New |
New RejectFilter.
|
|
New |
New CharsetUtil#firstNonBlankOrUTF8(...) methods.
|
|
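A minimal sketch of the fallback behavior the method name CharsetUtil#firstNonBlankOrUTF8(...) suggests: return the first non-blank charset name among the candidates, else UTF-8. The class and method below are hypothetical and use only the JDK; the actual signatures, overloads, and trimming behavior in the library may differ.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the library's CharsetUtil.
class CharsetFallbackSketch {

    // Returns the first candidate that is neither null nor blank,
    // or "UTF-8" when no such candidate exists.
    // Assumption: candidates are charset names given as strings.
    static String firstNonBlankOrUtf8(String... candidates) {
        if (candidates != null) {
            for (String candidate : candidates) {
                if (candidate != null && !candidate.trim().isEmpty()) {
                    return candidate.trim();
                }
            }
        }
        return StandardCharsets.UTF_8.name();
    }

    public static void main(String[] args) {
        // A detected charset wins over the default.
        System.out.println(firstNonBlankOrUtf8(null, " ", "ISO-8859-1"));
        // Nothing usable: falls back to UTF-8.
        System.out.println(firstNonBlankOrUtf8(null, ""));
    }
}
```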
New |
When not already set, an attempt to detect document character encoding
is now always made before invoking handlers.
|
|
New |
New CommonMatchers class.
|
|
New |
New ImageTransformer class.
|
|
New |
New NoContentTransformer class.
|
|
New |
New -f or "outputMetaFormat" command-line argument for saving
exported metadata fields in alternate formats.
|
|
New |
New TextFilter class.
|
|
New |
New ReferenceFilter class.
|
|
New |
New ExternalHandler class.
|
|
New |
New DOMFilter class.
|
|
New |
New EmptyFilter class.
|
|
New |
New RegexTagger class.
|
|
New |
New URLExtractorTagger class.
|
|
New |
New DOMDeleteTransformer class.
|
|
New |
New XMLStreamSplitter class.
|
|
New |
New HandlerDoc to ease handler implementations.
|
|
New |
Importer now uses an EventManager and triggers several events:
IMPORTER_HANDLER_BEGIN, IMPORTER_HANDLER_END, IMPORTER_HANDLER_ERROR,
IMPORTER_PARSER_BEGIN, IMPORTER_PARSER_END, IMPORTER_PARSER_ERROR.
|
|
New |
New ImporterDocument#getStreamFactory() method.
|
|
New |
ReplaceTagger now has the option to discard values that are unchanged
after replacement.
|
|
New |
New options on CharacterCaseTagger: "wordsFully", "stringFully",
"sentences", and "sentencesFully".
|
|
New |
Most configurable classes adding/setting metadata values now have
an extra "onSet" option for dictating how values are set:
append, prepend, replace, optional.
|
|
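The four "onSet" strategies listed above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the OnSetSketch class, its OnSet enum, and the apply(...) method are hypothetical, and "optional" is assumed to mean "set the value only when the field has no value yet".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the append/prepend/replace/optional semantics.
class OnSetSketch {

    enum OnSet { APPEND, PREPEND, REPLACE, OPTIONAL }

    // Applies one value to a multi-valued metadata field using the strategy.
    static void apply(Map<String, List<String>> metadata,
            OnSet onSet, String field, String value) {
        List<String> values =
                metadata.computeIfAbsent(field, k -> new ArrayList<>());
        switch (onSet) {
            case APPEND:
                values.add(value);       // after existing values
                break;
            case PREPEND:
                values.add(0, value);    // before existing values
                break;
            case REPLACE:
                values.clear();          // discard existing values
                values.add(value);
                break;
            case OPTIONAL:
                if (values.isEmpty()) {  // assumed: only when nothing is set
                    values.add(value);
                }
                break;
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new HashMap<>();
        apply(metadata, OnSet.APPEND, "author", "b");
        apply(metadata, OnSet.PREPEND, "author", "a");
        apply(metadata, OnSet.OPTIONAL, "author", "ignored");
        System.out.println(metadata.get("author")); // [a, b]
        apply(metadata, OnSet.REPLACE, "author", "c");
        System.out.println(metadata.get("author")); // [c]
    }
}
```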
New |
New DocInfo class.
|
|
New |
New ImporterRequest class.
|
|
New |
New option in DOMTagger to delete elements matched by a selector.
|
|
New |
Added time zone support to DateMetadataFilter.
|
|
New |
Added support for the WebP image format.
|
|
Updated |
Now requires Java 8 or higher.
|
|
Updated |
Importer#importDocument(...) now expects an ImporterRequest or a Doc.
|
|
Updated |
Default allocated memory for caching of document content was increased
by a factor of 10 (100MB max per document, 1GB max total).
|
|
Updated |
XML tag names for handlers in the XML configuration were changed from
"filter", "tagger", "transformer", "splitter" to simply "handler".
|
|
Updated |
JBIG2 image support is now included under the Apache license.
|
|
Updated |
Logging now uses SLF4J.
|
|
Updated |
Maven dependency updates: Norconex Commons Lang 2.0.0,
Apache Tika 1.22, Apache Commons CLI 1.4, JUnit 5.
|
|
Updated |
RegexFieldExtractor and RegexUtil have been deprecated in favor
of Norconex Commons Lang FieldValueExtractor and Regex.
|
|
Updated |
RegexContentFilter and RegexMetadataFilter have been deprecated in
favor of TextFilter.
|
|
Updated |
RegexReferenceFilter has been deprecated in favor of ReferenceFilter.
|
|
Updated |
DOMContentFilter has been deprecated in favor of DOMFilter.
|
|
Updated |
EmptyMetadataFilter has been deprecated in favor of EmptyFilter.
|
|
Updated |
TextPatternTagger has been deprecated in favor of RegexTagger.
|
|
Updated |
TextBetweenTagger now has "inclusive" and "caseSensitive" options
configurable for each set of "between" details.
|
|
Updated |
Now using Path instead of File in many cases.
|
|
Updated |
Parsing no longer attempted on zero-length content.
|
|
Updated |
List of PropertyMatcher replaced with PropertyMatchers.
|
|
Updated |
ContentTypeDetector methods are now static.
|
|
Updated |
Eliminated Apache Tika log warnings on startup when specific
optional libraries are missing because they are not packaged due to
licensing (e.g., JPEG 2000, JBIG2).
|
|
Updated |
Occurrences of accessors for overwrite="[false|true]" and
onConflict="..." have been deprecated in favor of
new onSet="...".
|
|
Updated |
Most places where regular expressions could be used now also
support "basic" matching and "wildcard" as well as being able to
ignore diacritical marks (e.g., accents).
|
|
Updated |
Most occurrences of "caseSensitive" or "caseInsensitive" configuration
options are now replaced with "ignoreCase".
|
|
Updated |
Filters extending AbstractStringFilter will now have their
isStringContentMatching(...) method invoked at least once, even
if there is no document content.
|
|
Updated |
"parsed" boolean arguments were replaced by ParseState.PRE and
ParseState.POST.
|
|
Updated |
Many methods taking a combination of reference, input stream, and
metadata were updated to accept a Doc instance instead.
|
|
Removed |
Removed some of the methods deprecated in previous releases.
|
|
Removed |
Removed SplittableDocument.
|
|