public class DateFormatTagger extends AbstractDocumentTagger
Formats a date from any given format to a format of choice, as per the formatting options found on SimpleDateFormat, with the exception of the string "EPOCH", which represents the difference, measured in milliseconds, between the date and midnight, January 1, 1970 UTC. When fromFormat or toFormat is not specified, it defaults to EPOCH.
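To make the EPOCH convention concrete, here is a minimal sketch using java.text.SimpleDateFormat directly. It is not the tagger's implementation: the EpochAwareFormat class and its methods are hypothetical, a null pattern stands in for "not specified" (hence EPOCH), and UTC is assumed so the output does not depend on the JVM's default time zone.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/**
 * Hypothetical helper (not part of the Norconex API) illustrating the
 * "EPOCH" keyword alongside regular SimpleDateFormat patterns.
 */
final class EpochAwareFormat {
    static final String EPOCH = "EPOCH";
    // Assumption: UTC is used so results do not depend on the JVM time zone.
    private static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    private EpochAwareFormat() {
    }

    /** Parses a value; a null pattern defaults to EPOCH. Returns null if unparseable. */
    static Date parse(String value, String pattern) {
        if (pattern == null || EPOCH.equals(pattern)) {
            try {
                // EPOCH: milliseconds since 1970-01-01T00:00:00Z.
                return new Date(Long.parseLong(value.trim()));
            } catch (NumberFormatException e) {
                return null;
            }
        }
        SimpleDateFormat sdf = new SimpleDateFormat(pattern, Locale.US);
        sdf.setTimeZone(UTC);
        ParsePosition pos = new ParsePosition(0);
        Date date = sdf.parse(value, pos);
        // Treat trailing unparsed characters as a failure.
        return pos.getIndex() == value.length() ? date : null;
    }

    /** Formats a date; a null pattern defaults to EPOCH. */
    static String format(Date date, String pattern) {
        if (pattern == null || EPOCH.equals(pattern)) {
            return Long.toString(date.getTime());
        }
        SimpleDateFormat sdf = new SimpleDateFormat(pattern, Locale.US);
        sdf.setTimeZone(UTC);
        return sdf.format(date);
    }
}
```

With this sketch, parsing "2021-03-15" with "yyyy-MM-dd" and formatting the result as EPOCH gives "1615766400000".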
When toField is omitted, the formatted value replaces the original value in the source field (fromField). If toField already exists, the newly formatted date is added to its list of existing values, unless "overwrite" is set to true.
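The rules for toField and overwrite can be sketched as follows. A plain Map<String, List<String>> stands in for ImporterMetadata, and TargetFieldSketch.store(...) is a hypothetical helper written for illustration, not part of the Norconex API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not the Norconex implementation) of where formatted
 * dates are stored. A plain Map stands in for ImporterMetadata.
 */
final class TargetFieldSketch {
    private TargetFieldSketch() {
    }

    static void store(Map<String, List<String>> metadata, String fromField,
            String toField, List<String> formatted, boolean overwrite) {
        if (toField == null || toField.isEmpty()) {
            // No toField: formatted values replace the source field's values.
            metadata.put(fromField, new ArrayList<>(formatted));
        } else if (overwrite || !metadata.containsKey(toField)) {
            // Overwrite requested, or target absent: target holds only the new values.
            metadata.put(toField, new ArrayList<>(formatted));
        } else {
            // Target exists and overwrite is false: append to existing values.
            List<String> merged = new ArrayList<>(metadata.get(toField));
            merged.addAll(formatted);
            metadata.put(toField, merged);
        }
    }
}
```

Copying into a new ArrayList in every branch keeps the sketch safe even when the caller passes immutable lists.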
Can be used either as a pre-parse or post-parse handler.
Since 2.5.2, it is possible to specify a locale used for parsing and formatting dates. The locale is the ISO two-letter language code, with an optional ISO country code, separated with an underscore (e.g., "fr" for French, "fr_CA" for Canadian French). When no locale is specified, the default is "en_US" (US English).
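The locale string described above could be turned into a java.util.Locale roughly as follows. This is a sketch, not the tagger's code: LocaleSketch, toLocale and monthName are hypothetical names, only the "language" and "language_COUNTRY" forms are handled, and UTC is assumed for formatting.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helpers, not part of the Norconex API. */
final class LocaleSketch {
    private LocaleSketch() {
    }

    /** Turns "fr" or "fr_CA" into a Locale; blank input falls back to en_US. */
    static Locale toLocale(String code) {
        if (code == null || code.trim().isEmpty()) {
            return Locale.US; // documented default: en_US
        }
        String[] parts = code.trim().split("_", 2);
        Locale.Builder builder = new Locale.Builder().setLanguage(parts[0]);
        if (parts.length > 1) {
            builder.setRegion(parts[1]); // optional ISO country code
        }
        return builder.build();
    }

    /** Formats the full month name of a date in the given locale (UTC assumed). */
    static String monthName(Date date, String code) {
        SimpleDateFormat sdf = new SimpleDateFormat("MMMM", toLocale(code));
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        return sdf.format(date);
    }
}
```

For example, monthName(new Date(0L), "fr") gives "janvier", while the default locale gives "January".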
Since 2.6.0, it is possible to specify multiple fromFormat values. Each format is tried in the order provided, and the first format that succeeds in parsing a date is used. A date is considered "bad" only if none of the formats can parse it.
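A minimal sketch of this ordered fallback follows. It assumes strict (non-lenient) parsing, a UTC time zone, and that a format only "succeeds" when it consumes the whole value; these are assumptions of the sketch, and MultiFormatParser is a hypothetical name. An empty Optional plays the role of a "bad" date.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.Optional;
import java.util.TimeZone;

/** Hypothetical sketch: tries each source format in order; first success wins. */
final class MultiFormatParser {
    private MultiFormatParser() {
    }

    static Optional<Date> parse(String value, Locale locale, String... formats) {
        for (String format : formats) {
            Date date = tryParse(value, format, locale);
            if (date != null) {
                return Optional.of(date); // first format that succeeds is used
            }
        }
        return Optional.empty(); // no format could parse it: a "bad" date
    }

    private static Date tryParse(String value, String format, Locale locale) {
        if ("EPOCH".equals(format)) {
            try {
                return new Date(Long.parseLong(value.trim()));
            } catch (NumberFormatException e) {
                return null;
            }
        }
        SimpleDateFormat sdf = new SimpleDateFormat(format, locale);
        sdf.setLenient(false);                        // assumption: strict parsing
        sdf.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: UTC
        ParsePosition pos = new ParsePosition(0);
        Date date = sdf.parse(value, pos);
        // Assumption: leftover characters mean the format did not succeed.
        return date != null && pos.getIndex() == value.length() ? date : null;
    }
}
```

Here "2021-03-15" fails "dd/MM/yyyy" but succeeds with "yyyy-MM-dd", so listing both formats lets either spelling through.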
<tagger class="com.norconex.importer.handler.tagger.impl.DateFormatTagger"
    fromField="(from field)"
    toField="(to field)"
    fromLocale="(locale)"
    toLocale="(locale)"
    toFormat="(date format)"
    keepBadDates="(false|true)"
    overwrite="[false|true]" >

  <restrictTo caseSensitive="[false|true]"
      field="(name of header/metadata field name to match)">
    (regular expression of value to match)
  </restrictTo>
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->

  <fromFormat>(date format)</fromFormat>
  <!-- multiple "fromFormat" tags allowed (only one needs to match) -->
</tagger>
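The restrictTo comment in the usage above (only one needs to match) can be read as an "any of" check. The sketch below is hypothetical: RestrictionSketch is not a Norconex class, a Map stands in for the document metadata, and it assumes that caseSensitive only affects the value regex, that the regex must match the whole value, and that a handler without restrictions applies to every document.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch of "restrictTo" matching; not the Norconex implementation. */
final class RestrictionSketch {
    private RestrictionSketch() {
    }

    static final class Restriction {
        final String field;
        final Pattern pattern;

        Restriction(String field, String regex, boolean caseSensitive) {
            this.field = field;
            // Assumption: caseSensitive applies to the value regex only.
            this.pattern = caseSensitive
                    ? Pattern.compile(regex)
                    : Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        }
    }

    /** True when there are no restrictions or when at least one restriction matches. */
    static boolean isApplicable(Map<String, List<String>> metadata, List<Restriction> restrictions) {
        if (restrictions.isEmpty()) {
            return true; // assumption: no restrictTo means "applies to every document"
        }
        for (Restriction r : restrictions) {
            for (String value : metadata.getOrDefault(r.field, List.of())) {
                // Assumption: the whole value must match, not just a substring.
                if (r.pattern.matcher(value).matches()) {
                    return true; // only one restriction needs to match
                }
            }
        }
        return false;
    }
}
```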
The following converts the "Last-Modified" field, whose date sometimes comes from the HTTP header of that name and sometimes is an EPOCH value, into an Apache Solr date format:
<tagger class="com.norconex.importer.handler.tagger.impl.DateFormatTagger"
    fromField="Last-Modified"
    toField="solr_date"
    toFormat="yyyy-MM-dd'T'HH:mm:ss.SSS'Z'" >
  <fromFormat>EEE, dd MMM yyyy HH:mm:ss zzz</fromFormat>
  <fromFormat>EPOCH</fromFormat>
</tagger>
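The same conversion can be reproduced with plain java.text classes to check what the example produces. This is a sketch, not the tagger's code, and SolrDateSketch is a hypothetical name. Note that in the toFormat pattern 'Z' is a quoted literal, so SimpleDateFormat does not convert anything to UTC by itself; the sketch sets the output time zone to UTC explicitly, which is an assumption about the desired behavior.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical sketch of the Last-Modified to Solr date conversion above. */
final class SolrDateSketch {
    private static final String[] FROM_FORMATS = {"EEE, dd MMM yyyy HH:mm:ss zzz", "EPOCH"};
    private static final String TO_FORMAT = "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'";

    private SolrDateSketch() {
    }

    /** Returns the Solr-formatted date, or null when no source format matches. */
    static String toSolrDate(String lastModified) {
        for (String from : FROM_FORMATS) {
            Date date = parse(lastModified, from);
            if (date != null) {
                SimpleDateFormat out = new SimpleDateFormat(TO_FORMAT, Locale.US);
                // 'Z' is a literal, so the zone must really be UTC for the output to be correct.
                out.setTimeZone(TimeZone.getTimeZone("UTC"));
                return out.format(date);
            }
        }
        return null; // a "bad" date
    }

    private static Date parse(String value, String format) {
        if ("EPOCH".equals(format)) {
            try {
                return new Date(Long.parseLong(value.trim()));
            } catch (NumberFormatException e) {
                return null;
            }
        }
        // English day and month names, matching the documented en_US default.
        SimpleDateFormat in = new SimpleDateFormat(format, Locale.US);
        ParsePosition pos = new ParsePosition(0);
        Date date = in.parse(value, pos);
        return date != null && pos.getIndex() == value.length() ? date : null;
    }
}
```

With this sketch, both "Sun, 06 Nov 1994 08:49:37 GMT" and the EPOCH value "784111777000" yield "1994-11-06T08:49:37.000Z".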
Constructor | Description
---|---
DateFormatTagger() | Constructor.
Modifier and Type | Method | Description
---|---|---
boolean | equals(Object other) |
String | getFromField() |
String | getFromFormat() | Deprecated. Since 2.6.0, use getFromFormats().
String[] | getFromFormats() | Gets the source date formats to match.
Locale | getFromLocale() | Gets the locale used for parsing the source date.
String | getToField() |
String | getToFormat() |
Locale | getToLocale() | Gets the locale used for formatting the target date.
int | hashCode() |
boolean | isKeepBadDates() |
boolean | isOverwrite() |
protected void | loadHandlerFromXML(org.apache.commons.configuration.XMLConfiguration xml) | Loads configuration settings specific to the implementing class.
protected void | saveHandlerToXML(EnhancedXMLStreamWriter writer) | Saves configuration settings specific to the implementing class.
void | setFromField(String fromField) |
void | setFromFormat(String fromFormat) | Deprecated. Since 2.6.0, use setFromFormats(String...).
void | setFromFormats(String... fromFormats) | Sets the source date formats to match.
void | setFromLocale(Locale fromLocale) | Sets the locale used for parsing the source date.
void | setKeepBadDates(boolean keepBadDates) |
void | setOverwrite(boolean overwrite) |
void | setToField(String toField) |
void | setToFormat(String toFormat) |
void | setToLocale(Locale toLocale) | Sets the locale used for formatting the target date.
void | tagApplicableDocument(String reference, InputStream document, ImporterMetadata metadata, boolean parsed) |
String | toString() |
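Methods inherited from class AbstractDocumentTagger: tagDocument
Methods inherited from class AbstractImporterHandler: addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML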
public void tagApplicableDocument(String reference, InputStream document, ImporterMetadata metadata, boolean parsed) throws ImporterHandlerException
Specified by: tagApplicableDocument in class AbstractDocumentTagger
Throws: ImporterHandlerException
public String getFromField()
public void setFromField(String fromField)
public String getToField()
public void setToField(String toField)
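@Deprecated public String getFromFormat()
Deprecated. Since 2.6.0, use getFromFormats().
@Deprecated public void setFromFormat(String fromFormat)
Deprecated. Since 2.6.0, use setFromFormats(String...).
Parameters: fromFormat - source date format
public String[] getFromFormats()
Gets the source date formats to match.
public void setFromFormats(String... fromFormats)
Sets the source date formats to match.
Parameters: fromFormats - source date formats
public String getToFormat()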
public void setToFormat(String toFormat)
public boolean isOverwrite()
public void setOverwrite(boolean overwrite)
public boolean isKeepBadDates()
public void setKeepBadDates(boolean keepBadDates)
public Locale getFromLocale()
public void setFromLocale(Locale fromLocale)
Parameters: fromLocale - locale
public Locale getToLocale()
public void setToLocale(Locale toLocale)
Parameters: toLocale - locale
protected void loadHandlerFromXML(org.apache.commons.configuration.XMLConfiguration xml) throws IOException
Description copied from class: AbstractImporterHandler
Loads configuration settings specific to the implementing class.
Specified by: loadHandlerFromXML in class AbstractImporterHandler
Parameters: xml - xml configuration
Throws: IOException - could not load from XML
protected void saveHandlerToXML(EnhancedXMLStreamWriter writer) throws XMLStreamException
Description copied from class: AbstractImporterHandler
Saves configuration settings specific to the implementing class.
Specified by: saveHandlerToXML in class AbstractImporterHandler
Parameters: writer - the xml writer
Throws: XMLStreamException - could not save to XML
public String toString()
Overrides: toString in class AbstractImporterHandler
public boolean equals(Object other)
Overrides: equals in class AbstractImporterHandler
public int hashCode()
Overrides: hashCode in class AbstractImporterHandler
Copyright © 2009–2021 Norconex Inc. All rights reserved.