Class DateFormatTagger

  • All Implemented Interfaces:
    IXMLConfigurable, IImporterHandler, IDocumentTagger

    public class DateFormatTagger
    extends AbstractDocumentTagger

    Converts a date from one or more source formats to a target format of choice. Formats follow the pattern syntax of SimpleDateFormat, with the exception of the special string "EPOCH", which represents the difference, measured in milliseconds, between the date and midnight, January 1, 1970 UTC. When fromFormat or toFormat is not specified, it defaults to EPOCH.
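
    The conversion described above can be approximated with the JDK alone. The following is a minimal sketch, not the tagger's actual implementation: the class EpochFormatSketch and its convert method are hypothetical names, and pinning the time zone to UTC and the locale to US English are assumptions made so the output is reproducible.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper for illustration; not part of DateFormatTagger.
final class EpochFormatSketch {

    // Converts a date string between two formats, where the special
    // format "EPOCH" means milliseconds since January 1, 1970 UTC.
    static String convert(String value, String fromFormat, String toFormat) {
        Date date;
        if ("EPOCH".equals(fromFormat)) {
            date = new Date(Long.parseLong(value));
        } else {
            try {
                date = newFormat(fromFormat).parse(value);
            } catch (ParseException e) {
                throw new IllegalArgumentException("Bad date: " + value, e);
            }
        }
        if ("EPOCH".equals(toFormat)) {
            return Long.toString(date.getTime());
        }
        return newFormat(toFormat).format(date);
    }

    // Assumption: UTC and Locale.US are fixed here for reproducibility.
    private static SimpleDateFormat newFormat(String pattern) {
        SimpleDateFormat sdf = new SimpleDateFormat(pattern, Locale.US);
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        return sdf;
    }

    public static void main(String[] args) {
        // EPOCH to a pattern-based format
        System.out.println(convert("0", "EPOCH", "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"));
        // Pattern-based format to EPOCH
        System.out.println(convert("2001-09-09 01:46:40", "yyyy-MM-dd HH:mm:ss", "EPOCH"));
    }
}
```

    Under these assumptions, the first call yields the first instant of 1970 in the target pattern, and the second yields the millisecond count for the given UTC timestamp.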

    When toField is omitted, the formatted value replaces the original value in the source field.

    Storing values in an existing field

    If a target field with the same name already exists for a document, values will be added to the end of the existing value list. It is possible to change this default behavior with setOnSet(PropertySetter).

    Can be used as either a pre-parse or a post-parse handler.

    It is possible to specify a locale used for parsing and formatting dates. The locale is the ISO two-letter language code, with an optional ISO country code, separated with an underscore (e.g., "fr" for French, "fr_CA" for Canadian French). When no locale is specified, the default is "en_US" (US English).
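
    To illustrate the locale notation, the sketch below turns a string such as "fr" or "fr_CA" into a java.util.Locale and uses it to format a month name. The class LocaleCodeSketch and its toLocale method are hypothetical and only show one plausible way to interpret the notation; how the tagger itself resolves locale strings is not specified here.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper for illustration; not part of DateFormatTagger.
final class LocaleCodeSketch {

    // Parses "language" or "language_COUNTRY"; falls back to en_US,
    // mirroring the documented default when no locale is given.
    static Locale toLocale(String code) {
        if (code == null || code.isEmpty()) {
            return Locale.US;
        }
        String[] parts = code.split("_", 2);
        Locale.Builder builder = new Locale.Builder().setLanguage(parts[0]);
        if (parts.length == 2) {
            builder.setRegion(parts[1]);
        }
        return builder.build();
    }

    public static void main(String[] args) {
        Date epochStart = new Date(0L);
        for (String code : new String[] {"", "fr", "fr_CA"}) {
            Locale locale = toLocale(code);
            SimpleDateFormat sdf = new SimpleDateFormat("d MMMM yyyy", locale);
            sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
            // Month names are rendered in the language of the locale.
            System.out.println(locale + " -> " + sdf.format(epochStart));
        }
    }
}
```

    The same date pattern can therefore produce different text depending on the locale, which is why the source and target locales are configured separately.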

    Multiple fromFormat values can be specified. Each format will be tried in the order provided, and the first format that succeeds in parsing a date will be used. A date will be considered "bad" only if none of the formats can parse it.
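
    The try-in-order behavior can be sketched with plain JDK classes as follows. This is an illustration under stated assumptions, not the tagger's source code: FromFormatFallbackSketch and parseFirstMatch are hypothetical names, and the UTC time zone and US English locale are fixed only to keep the result deterministic.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.TimeZone;

// Hypothetical helper for illustration; not part of DateFormatTagger.
final class FromFormatFallbackSketch {

    // Tries each format in order and returns the first successful parse.
    // An empty result corresponds to what the text calls a "bad" date.
    static Optional<Date> parseFirstMatch(String value, List<String> fromFormats) {
        for (String format : fromFormats) {
            try {
                if ("EPOCH".equals(format)) {
                    return Optional.of(new Date(Long.parseLong(value)));
                }
                // Assumption: fixed locale and time zone for reproducibility.
                SimpleDateFormat sdf = new SimpleDateFormat(format, Locale.US);
                sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
                return Optional.of(sdf.parse(value));
            } catch (ParseException | NumberFormatException e) {
                // This format did not match; fall through to the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> formats =
                Arrays.asList("EEE, dd MMM yyyy HH:mm:ss zzz", "EPOCH");
        String[] values = {"Thu, 01 Jan 1970 00:00:01 GMT", "1000", "not a date"};
        for (String value : values) {
            System.out.println(value + " -> "
                    + parseFirstMatch(value, formats)
                            .map(d -> Long.toString(d.getTime()))
                            .orElse("bad date"));
        }
    }
}
```

    In this sketch the HTTP-style value is handled by the first format, the numeric value falls through to EPOCH, and the last value matches neither format. Whether such a value is kept or dropped by the tagger is governed by keepBadDates.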

    XML configuration usage:

    
    <handler
        class="com.norconex.importer.handler.tagger.impl.DateFormatTagger"
        fromField="(from field)"
        toField="(to field)"
        fromLocale="(locale)"
        toLocale="(locale)"
        toFormat="(date format)"
        keepBadDates="(false|true)">
      <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
      <restrictTo>
        <fieldMatcher>(field-matching expression)</fieldMatcher>
        <valueMatcher>(value-matching expression)</valueMatcher>
      </restrictTo>
      <!-- multiple "fromFormat" tags allowed (only one needs to match) -->
      <fromFormat>(date format)</fromFormat>
    </handler>

    XML usage example:

    
    <handler
        class="DateFormatTagger"
        fromField="Last-Modified"
        toField="solr_date"
        toFormat="yyyy-MM-dd'T'HH:mm:ss.SSS'Z'">
      <fromFormat>EEE, dd MMM yyyy HH:mm:ss zzz</fromFormat>
      <fromFormat>EPOCH</fromFormat>
    </handler>

    The example above converts a date that is sometimes obtained from the HTTP header "Last-Modified", and is sometimes an EPOCH value, into an Apache Solr date format.

    Since:
    2.0.0
    Author:
    Pascal Essiembre
    • Constructor Detail

      • DateFormatTagger

        public DateFormatTagger()
        Constructor.
    • Method Detail

      • getFromField

        public String getFromField()
      • setFromField

        public void setFromField(String fromField)
      • getToField

        public String getToField()
      • setToField

        public void setToField(String toField)
      • getFromFormats

        public List<String> getFromFormats()
        Gets the source date formats to match.
        Returns:
        source date formats
        Since:
        2.6.0
      • setFromFormats

        public void setFromFormats(String... fromFormats)
        Sets the source date formats to match.
        Parameters:
        fromFormats - source date formats
        Since:
        2.6.0
      • setFromFormats

        public void setFromFormats(List<String> fromFormats)
        Sets the source date formats to match.
        Parameters:
        fromFormats - source date formats
        Since:
        3.0.0
      • getToFormat

        public String getToFormat()
      • setToFormat

        public void setToFormat(String toFormat)
      • getOnSet

        public PropertySetter getOnSet()
        Gets the property setter to use when a value is set.
        Returns:
        property setter
        Since:
        3.0.0
      • setOnSet

        public void setOnSet(PropertySetter onSet)
        Sets the property setter to use when a value is set.
        Parameters:
        onSet - property setter
        Since:
        3.0.0
      • isKeepBadDates

        public boolean isKeepBadDates()
      • setKeepBadDates

        public void setKeepBadDates(boolean keepBadDates)
      • getFromLocale

        public Locale getFromLocale()
        Gets the locale used for parsing the source date.
        Returns:
        locale
        Since:
        2.5.2
      • setFromLocale

        public void setFromLocale(Locale fromLocale)
        Sets the locale used for parsing the source date.
        Parameters:
        fromLocale - locale
        Since:
        2.5.2
      • getToLocale

        public Locale getToLocale()
        Gets the locale used for formatting the target date.
        Returns:
        locale
        Since:
        2.5.2
      • setToLocale

        public void setToLocale(Locale toLocale)
        Sets the locale used for formatting the target date.
        Parameters:
        toLocale - locale
        Since:
        2.5.2