public class DateFormatTagger extends AbstractDocumentTagger
Formats a date from any given format to a format of choice, as per the formatting options found on SimpleDateFormat, with the exception of the string "EPOCH", which represents the difference, measured in milliseconds, between the date and midnight, January 1, 1970. The default format for fromFormat or toFormat, when not specified, is EPOCH.
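The EPOCH convention described above can be illustrated with plain JDK classes. This is a minimal sketch, not the handler's actual implementation: the class `EpochFormatSketch` and its `convert` helper are hypothetical names, and the UTC time zone is an assumption made only to keep the example deterministic.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class EpochFormatSketch {

    // Hypothetical helper (not part of DateFormatTagger): converts a date
    // string between two formats. A null or "EPOCH" format means
    // milliseconds since midnight, January 1, 1970.
    public static String convert(
            String value, String fromFormat, String toFormat) {
        Date date;
        if (isEpoch(fromFormat)) {
            date = new Date(Long.parseLong(value.trim()));
        } else {
            try {
                date = newFormat(fromFormat).parse(value);
            } catch (ParseException e) {
                throw new IllegalArgumentException("Bad date: " + value, e);
            }
        }
        if (isEpoch(toFormat)) {
            return Long.toString(date.getTime());
        }
        return newFormat(toFormat).format(date);
    }

    // An unspecified format defaults to EPOCH, as described above.
    private static boolean isEpoch(String format) {
        return format == null || "EPOCH".equals(format);
    }

    private static SimpleDateFormat newFormat(String pattern) {
        SimpleDateFormat sdf = new SimpleDateFormat(pattern, Locale.US);
        // Assumption: UTC, so the example does not depend on the host zone.
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        return sdf;
    }

    public static void main(String[] args) {
        // No toFormat given: the result is an EPOCH value.
        System.out.println(convert("2023-01-01", "yyyy-MM-dd", null));
        // No fromFormat given: the input is read as an EPOCH value.
        System.out.println(convert("0", null, "yyyy-MM-dd'T'HH:mm:ss"));
    }
}
```

Leaving either format unset falls back to EPOCH, so the same helper covers both directions of the conversion.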
When toField is omitted, the converted value replaces the existing value in the source field.
If a target field with the same name already exists for a document, values will be added to the end of the existing value list.
This default behavior can be changed with setOnSet(PropertySetter).
Can be used as either a pre-parse or post-parse handler.
It is possible to specify a locale used for parsing and formatting dates. The locale is the ISO two-letter language code, with an optional ISO country code, separated with an underscore (e.g., "fr" for French, "fr_CA" for Canadian French). When no locale is specified, the default is "en_US" (US English).
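As an illustration of the locale notation above, the following sketch turns a string such as "fr" or "fr_CA" into a `java.util.Locale` and uses it to format a month name. The class `LocaleSketch` and its methods are hypothetical and only mirror the described behavior (including the "en_US" default); they are not taken from the handler's source. UTC is assumed for a stable result.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class LocaleSketch {

    // Hypothetical helper: parses "language" or "language_COUNTRY".
    // Falls back to en_US when no locale is given, as described above.
    public static Locale toLocale(String code) {
        if (code == null || code.trim().isEmpty()) {
            return Locale.US;
        }
        String[] parts = code.trim().split("_", 2);
        Locale.Builder builder = new Locale.Builder().setLanguage(parts[0]);
        if (parts.length == 2) {
            builder.setRegion(parts[1]);
        }
        return builder.build();
    }

    // Formats the full month name of an EPOCH value in the given locale.
    public static String formatMonth(long epochMillis, String localeCode) {
        SimpleDateFormat sdf =
                new SimpleDateFormat("MMMM", toLocale(localeCode));
        // Assumption: UTC, so the example does not depend on the host zone.
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        return sdf.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(formatMonth(0L, "fr"));  // French month name
        System.out.println(formatMonth(0L, null));  // default en_US
    }
}
```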
Multiple fromFormat values can be specified. Each format will be tried in the order provided, and the first format that succeeds in parsing a date will be used.
A date will be considered "bad" only if none of the formats could parse the date.
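The ordered fallback over several fromFormat values can be sketched with the JDK alone. The class `FromFormatFallbackSketch` and the method `parseFirstMatch` are hypothetical names; returning `null` for a "bad" date, the en_US locale, and strict (non-lenient) parsing are assumptions of this sketch rather than documented behavior of the handler.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Locale;

public class FromFormatFallbackSketch {

    // Tries each format in the order provided and returns the date from the
    // first one that parses. Returns null when no format matches, which
    // corresponds to a "bad" date in the description above.
    public static Date parseFirstMatch(String value, List<String> fromFormats) {
        for (String format : fromFormats) {
            try {
                if ("EPOCH".equals(format)) {
                    return new Date(Long.parseLong(value.trim()));
                }
                SimpleDateFormat sdf = new SimpleDateFormat(format, Locale.US);
                sdf.setLenient(false); // assumption: strict parsing
                return sdf.parse(value);
            } catch (ParseException | NumberFormatException e) {
                // This format did not match; fall through to the next one.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> formats =
                Arrays.asList("EEE, dd MMM yyyy HH:mm:ss zzz", "EPOCH");
        // Matches the first format.
        System.out.println(
                parseFirstMatch("Sun, 01 Jan 2023 00:00:00 GMT", formats));
        // Fails the first format, matches EPOCH.
        System.out.println(parseFirstMatch("1672531200000", formats));
        // Matches nothing: prints null.
        System.out.println(parseFirstMatch("not a date", formats));
    }
}
```

Listing the most specific format first avoids a looser pattern accepting a value it only partially understands.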
XML configuration usage:
<handler
    class="com.norconex.importer.handler.tagger.impl.DateFormatTagger"
    fromField="(from field)"
    toField="(to field)"
    fromLocale="(locale)"
    toLocale="(locale)"
    toFormat="(date format)"
    keepBadDates="(false|true)"
    onSet="[append|prepend|replace|optional]">
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher
        method="[basic|csv|wildcard|regex]"
        ignoreCase="[false|true]"
        ignoreDiacritic="[false|true]"
        partial="[false|true]">
      (field-matching expression)
    </fieldMatcher>
    <valueMatcher
        method="[basic|csv|wildcard|regex]"
        ignoreCase="[false|true]"
        ignoreDiacritic="[false|true]"
        partial="[false|true]">
      (value-matching expression)
    </valueMatcher>
  </restrictTo>
  <!-- multiple "fromFormat" tags allowed (only one needs to match) -->
  <fromFormat>(date format)</fromFormat>
</handler>
XML usage example:
The following converts a date that is sometimes obtained from the HTTP header "Last-Modified" and is sometimes an EPOCH date into an Apache Solr date format:
<handler
    class="DateFormatTagger"
    fromField="Last-Modified"
    toField="solr_date"
    toFormat="yyyy-MM-dd'T'HH:mm:ss.SSS'Z'">
  <fromFormat>EEE, dd MMM yyyy HH:mm:ss zzz</fromFormat>
  <fromFormat>EPOCH</fromFormat>
</handler>
Constructor and Description |
---|
DateFormatTagger() Constructor. |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object other) |
String | getFromField() |
List<String> | getFromFormats() Gets the source date formats to match. |
Locale | getFromLocale() Gets the locale used for parsing the source date. |
PropertySetter | getOnSet() Gets the property setter to use when a value is set. |
String | getToField() |
String | getToFormat() |
Locale | getToLocale() Gets the locale used for formatting the target date. |
int | hashCode() |
boolean | isKeepBadDates() |
protected void | loadHandlerFromXML(XML xml) Loads configuration settings specific to the implementing class. |
protected void | saveHandlerToXML(XML xml) Saves configuration settings specific to the implementing class. |
void | setFromField(String fromField) |
void | setFromFormats(List<String> fromFormats) Sets the source date formats to match. |
void | setFromFormats(String... fromFormats) Sets the source date formats to match. |
void | setFromLocale(Locale fromLocale) Sets the locale used for parsing the source date. |
void | setKeepBadDates(boolean keepBadDates) |
void | setOnSet(PropertySetter onSet) Sets the property setter to use when a value is set. |
void | setToField(String toField) |
void | setToFormat(String toFormat) |
void | setToLocale(Locale toLocale) Sets the locale used for formatting the target date. |
void | tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) |
String | toString() |
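Methods inherited from class AbstractDocumentTagger: tagDocument
Methods inherited from class AbstractImporterHandler: addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML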
public void tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) throws ImporterHandlerException
Specified by: tagApplicableDocument in class AbstractDocumentTagger
Throws: ImporterHandlerException
public String getFromField()
public void setFromField(String fromField)
public String getToField()
public void setToField(String toField)
public List<String> getFromFormats()
public void setFromFormats(String... fromFormats)
Parameters: fromFormats - source date formats
public void setFromFormats(List<String> fromFormats)
Parameters: fromFormats - source date formats
public String getToFormat()
public void setToFormat(String toFormat)
public PropertySetter getOnSet()
public void setOnSet(PropertySetter onSet)
Parameters: onSet - property setter
public boolean isKeepBadDates()
public void setKeepBadDates(boolean keepBadDates)
public Locale getFromLocale()
public void setFromLocale(Locale fromLocale)
Parameters: fromLocale - locale
public Locale getToLocale()
public void setToLocale(Locale toLocale)
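Parameters: toLocale - locale
protected void loadHandlerFromXML(XML xml)
Description copied from class: AbstractImporterHandler
Loads configuration settings specific to the implementing class.
Specified by: loadHandlerFromXML in class AbstractImporterHandler
Parameters: xml - XML configuration
protected void saveHandlerToXML(XML xml)
Description copied from class: AbstractImporterHandler
Saves configuration settings specific to the implementing class.
Specified by: saveHandlerToXML in class AbstractImporterHandler
Parameters: xml - the XML
public boolean equals(Object other)
Overrides: equals in class AbstractImporterHandler
public int hashCode()
Overrides: hashCode in class AbstractImporterHandler
public String toString()
Overrides: toString in class AbstractImporterHandler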
Copyright © 2009–2023 Norconex Inc. All rights reserved.