Class DateFormatTagger
- java.lang.Object
  - com.norconex.importer.handler.AbstractImporterHandler
    - com.norconex.importer.handler.tagger.AbstractDocumentTagger
      - com.norconex.importer.handler.tagger.impl.DateFormatTagger
All Implemented Interfaces: IXMLConfigurable, IImporterHandler, IDocumentTagger
public class DateFormatTagger extends AbstractDocumentTagger
Formats a date from any given format to a format of choice, as per the formatting options found on SimpleDateFormat, with the exception of the string "EPOCH", which represents the difference, measured in milliseconds, between the date and midnight, January 1, 1970 UTC. The default format for fromFormat or toFormat when not specified is EPOCH.
When omitting the toField, the value will replace the one in the same field.
Storing values in an existing field
If a target field with the same name already exists for a document, values will be added to the end of the existing value list. It is possible to change this default behavior with setOnSet(PropertySetter).
Can be used both as a pre-parse or post-parse handler.
It is possible to specify a locale used for parsing and formatting dates. The locale is the ISO two-letter language code, with an optional ISO country code, separated with an underscore (e.g., "fr" for French, "fr_CA" for Canadian French). When no locale is specified, the default is "en_US" (US English).
Multiple fromFormat values can be specified. Each format will be tried in the order provided, and the first format that succeeds in parsing a date will be used. A date will be considered "bad" only if none of the formats could parse the date.
XML configuration usage:
<handler class="com.norconex.importer.handler.tagger.impl.DateFormatTagger"
    fromField="(from field)"
    toField="(to field)"
    fromLocale="(locale)"
    toLocale="(locale)"
    toFormat="(date format)"
    keepBadDates="(false|true)">
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher>(field-matching expression)</fieldMatcher>
    <valueMatcher>(value-matching expression)</valueMatcher>
  </restrictTo>
  <!-- multiple "fromFormat" tags allowed (only one needs to match) -->
  <fromFormat>(date format)</fromFormat>
</handler>
XML usage example:
The following converts a date that is sometimes obtained from the HTTP header "Last-Modified" and sometimes is an EPOCH date into an Apache Solr date format:
<handler class="DateFormatTagger"
    fromField="Last-Modified"
    toField="solr_date"
    toFormat="yyyy-MM-dd'T'HH:mm:ss.SSS'Z'">
  <fromFormat>EEE, dd MMM yyyy HH:mm:ss zzz</fromFormat>
  <fromFormat>EPOCH</fromFormat>
</handler>
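The fallback described in this class description (try each fromFormat in order, treat "EPOCH" as a millisecond count) can be sketched with plain java.text.SimpleDateFormat. This is a minimal illustration and not the DateFormatTagger implementation: the class DateFormatSketch and its methods are hypothetical names, and the fixed UTC output time zone and the en_US locale are assumptions made so the result is reproducible.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, for illustration only (not part of the Norconex API).
final class DateFormatSketch {

    static final String EPOCH = "EPOCH";

    // Tries each source format in order and returns the first successful
    // parse, or null when no format matches (a "bad" date).
    static Date parse(String value, List<String> fromFormats, Locale locale) {
        for (String format : fromFormats) {
            try {
                if (EPOCH.equalsIgnoreCase(format)) {
                    // EPOCH: milliseconds since midnight, January 1, 1970 UTC.
                    return new Date(Long.parseLong(value.trim()));
                }
                SimpleDateFormat sdf = new SimpleDateFormat(format, locale);
                sdf.setLenient(false);
                return sdf.parse(value);
            } catch (ParseException | IllegalArgumentException e) {
                // This format did not match; fall through to the next one.
            }
        }
        return null;
    }

    // Formats the date with the target format, or as milliseconds for EPOCH.
    static String format(Date date, String toFormat, Locale locale) {
        if (EPOCH.equalsIgnoreCase(toFormat)) {
            return Long.toString(date.getTime());
        }
        SimpleDateFormat sdf = new SimpleDateFormat(toFormat, locale);
        // Assumption: output in UTC, so the literal 'Z' suffix is accurate.
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        return sdf.format(date);
    }

    // Returns the converted value, or null when the date is "bad".
    static String convert(String value, List<String> fromFormats, String toFormat) {
        Date date = parse(value, fromFormats, Locale.US);
        return date == null ? null : format(date, toFormat, Locale.US);
    }

    public static void main(String[] args) {
        List<String> from = Arrays.asList("EEE, dd MMM yyyy HH:mm:ss zzz", EPOCH);
        String to = "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'";
        // Both lines print 1994-11-06T08:49:37.000Z
        System.out.println(convert("Sun, 06 Nov 1994 08:49:37 GMT", from, to));
        System.out.println(convert("784111777000", from, to));
    }
}
```

A value that matches none of the formats yields null in this sketch; what the handler does with such a value is governed by its keepBadDates setting.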
- Since: 2.0.0
- Author: Pascal Essiembre
Constructor Summary
Constructors
Constructor           Description
DateFormatTagger()    Constructor.
-
Method Summary
Modifier and Type   Method                                      Description
boolean             equals(Object other)
String              getFromField()
List<String>        getFromFormats()                            Gets the source date formats to match.
Locale              getFromLocale()                             Gets the locale used for parsing the source date.
PropertySetter      getOnSet()                                  Gets the property setter to use when a value is set.
String              getToField()
String              getToFormat()
Locale              getToLocale()                               Gets the locale used for formatting the target date.
int                 hashCode()
boolean             isKeepBadDates()
protected void      loadHandlerFromXML(XML xml)                 Loads configuration settings specific to the implementing class.
protected void      saveHandlerToXML(XML xml)                   Saves configuration settings specific to the implementing class.
void                setFromField(String fromField)
void                setFromFormats(String... fromFormats)       Sets the source date formats to match.
void                setFromFormats(List<String> fromFormats)    Sets the source date formats to match.
void                setFromLocale(Locale fromLocale)            Sets the locale used for parsing the source date.
void                setKeepBadDates(boolean keepBadDates)
void                setOnSet(PropertySetter onSet)              Sets the property setter to use when a value is set.
void                setToField(String toField)
void                setToFormat(String toFormat)
void                setToLocale(Locale toLocale)                Sets the locale used for formatting the target date.
void                tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState)
String              toString()
-
Methods inherited from class com.norconex.importer.handler.tagger.AbstractDocumentTagger
tagDocument
-
Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
-
Method Detail
-
tagApplicableDocument
public void tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) throws ImporterHandlerException
- Specified by: tagApplicableDocument in class AbstractDocumentTagger
- Throws: ImporterHandlerException
-
getFromField
public String getFromField()
-
setFromField
public void setFromField(String fromField)
-
getToField
public String getToField()
-
setToField
public void setToField(String toField)
-
getFromFormats
public List<String> getFromFormats()
Gets the source date formats to match.
- Returns: source date formats
- Since: 2.6.0
-
setFromFormats
public void setFromFormats(String... fromFormats)
Sets the source date formats to match.
- Parameters: fromFormats - source date formats
- Since: 2.6.0
-
setFromFormats
public void setFromFormats(List<String> fromFormats)
Sets the source date formats to match.
- Parameters: fromFormats - source date formats
- Since: 3.0.0
-
getToFormat
public String getToFormat()
-
setToFormat
public void setToFormat(String toFormat)
-
getOnSet
public PropertySetter getOnSet()
Gets the property setter to use when a value is set.
- Returns: property setter
- Since: 3.0.0
-
setOnSet
public void setOnSet(PropertySetter onSet)
Sets the property setter to use when a value is set.
- Parameters: onSet - property setter
- Since: 3.0.0
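The difference between the default behavior (values added to the end of an existing field) and an alternative such as replacing them can be sketched as follows. This is only an illustration under stated assumptions: the Mode enum and the OnSetSketch class are hypothetical stand-ins and do not reproduce the actual PropertySetter type, which may offer other strategies.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the Norconex PropertySetter.
final class OnSetSketch {

    // Assumed strategies, named here for illustration only.
    enum Mode { APPEND, REPLACE }

    // Stores a value in a multi-valued field according to the chosen mode.
    static void set(Map<String, List<String>> fields,
            String field, String value, Mode mode) {
        List<String> values =
                fields.computeIfAbsent(field, k -> new ArrayList<>());
        if (mode == Mode.REPLACE) {
            values.clear(); // drop the existing values first
        }
        values.add(value); // APPEND adds to the end of the existing list
    }

    public static void main(String[] args) {
        Map<String, List<String>> fields = new HashMap<>();
        set(fields, "solr_date", "1994-11-06T08:49:37.000Z", Mode.APPEND);
        set(fields, "solr_date", "1995-01-01T00:00:00.000Z", Mode.APPEND);
        System.out.println(fields.get("solr_date").size()); // prints 2
        set(fields, "solr_date", "2000-01-01T00:00:00.000Z", Mode.REPLACE);
        System.out.println(fields.get("solr_date").size()); // prints 1
    }
}
```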
-
isKeepBadDates
public boolean isKeepBadDates()
-
setKeepBadDates
public void setKeepBadDates(boolean keepBadDates)
-
getFromLocale
public Locale getFromLocale()
Gets the locale used for parsing the source date.
- Returns: locale
- Since: 2.5.2
-
setFromLocale
public void setFromLocale(Locale fromLocale)
Sets the locale used for parsing the source date.
- Parameters: fromLocale - locale
- Since: 2.5.2
-
getToLocale
public Locale getToLocale()
Gets the locale used for formatting the target date.
- Returns: locale
- Since: 2.5.2
-
setToLocale
public void setToLocale(Locale toLocale)
Sets the locale used for formatting the target date.
- Parameters: toLocale - locale
- Since: 2.5.2
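To show how a source locale and a target locale interact, the sketch below turns a locale string such as "fr" or "fr_CA" into a java.util.Locale, parses a date with one locale and formats it with another. It is a hedged illustration using only the JDK: LocaleSketch, toLocale and reformat are hypothetical names, and the UTC time zone is an assumption made to keep the result stable.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, for illustration only (not part of the Norconex API).
final class LocaleSketch {

    // Converts "fr" or "fr_CA" to a Locale; blank input falls back to en_US.
    static Locale toLocale(String code) {
        if (code == null || code.trim().isEmpty()) {
            return Locale.US;
        }
        String[] parts = code.trim().split("_", 2);
        Locale.Builder builder = new Locale.Builder().setLanguage(parts[0]);
        if (parts.length == 2) {
            builder.setRegion(parts[1]);
        }
        return builder.build();
    }

    // Parses the value with the source locale, formats it with the target one.
    static String reformat(String value, String pattern, Locale from, Locale to) {
        SimpleDateFormat in = new SimpleDateFormat(pattern, from);
        SimpleDateFormat out = new SimpleDateFormat(pattern, to);
        // Assumption: a fixed time zone on both sides keeps the day unchanged.
        in.setTimeZone(TimeZone.getTimeZone("UTC"));
        out.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            Date date = in.parse(value);
            return out.format(date);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + value, e);
        }
    }

    public static void main(String[] args) {
        Locale from = toLocale("fr_CA");
        Locale to = toLocale("en_US");
        System.out.println(reformat("6 novembre 1994", "d MMMM yyyy", from, to));
    }
}
```

Month and day names are locale-dependent, so a pattern containing MMMM or EEE generally needs the matching source locale to parse successfully.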
-
loadHandlerFromXML
protected void loadHandlerFromXML(XML xml)
Description copied from class: AbstractImporterHandler
Loads configuration settings specific to the implementing class.
- Specified by: loadHandlerFromXML in class AbstractImporterHandler
- Parameters: xml - XML configuration
-
saveHandlerToXML
protected void saveHandlerToXML(XML xml)
Description copied from class: AbstractImporterHandler
Saves configuration settings specific to the implementing class.
- Specified by: saveHandlerToXML in class AbstractImporterHandler
- Parameters: xml - the XML
-
equals
public boolean equals(Object other)
- Overrides: equals in class AbstractImporterHandler
-
hashCode
public int hashCode()
- Overrides: hashCode in class AbstractImporterHandler
-
toString
public String toString()
- Overrides: toString in class AbstractImporterHandler
-
-