Class DateMetadataFilter

  • All Implemented Interfaces:
    IXMLConfigurable, IDocumentFilter, IOnMatchFilter, IImporterHandler

    public class DateMetadataFilter
    extends AbstractDocumentFilter

    Accepts or rejects a document based on whether field values correspond to a date matching supplied conditions and format. If multiple values are found for a field, only one of them needs to match for this filter to take effect. If the value cannot be parsed to a valid date, it is considered not to be matching (no exception is thrown).

    Metadata date field format:

    To successfully parse a date, you can specify a date format, as per the formatting options found on DateTimeFormatter. The default format when not specified is EPOCH (the difference, measured in milliseconds, between the date and midnight, January 1, 1970).
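
    As an illustration of the two parsing modes described above, here is a minimal sketch using only java.time. The FieldDateParsing class and its parse method are hypothetical helpers, not part of the Importer API. A null or empty pattern stands in for the default EPOCH format, and any supplied pattern is assumed to include a zone offset. Unparsable values yield an empty result rather than an exception, mirroring the behavior described for this filter.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

public class FieldDateParsing {

    // Hypothetical helper (not the Importer API): parses a field value
    // either as EPOCH milliseconds or with a DateTimeFormatter pattern.
    static Optional<Instant> parse(String value, String pattern) {
        if (value == null) {
            return Optional.empty();
        }
        try {
            if (pattern == null || pattern.isEmpty()) {
                // Assumed stand-in for the default EPOCH format.
                long millis = Long.parseLong(value.trim());
                return Optional.of(Instant.ofEpochMilli(millis));
            }
            // Assumes the pattern carries an offset (e.g. "Z").
            DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern);
            return Optional.of(OffsetDateTime.parse(value, formatter).toInstant());
        } catch (DateTimeParseException | NumberFormatException e) {
            // An invalid date is treated as "not matching", no exception.
            return Optional.empty();
        }
    }
}
```

    For example, parse("2015-05-31T22:44:15-0400", "yyyy-MM-dd'T'HH:mm:ssZ") and parse("1433126655000", null) both resolve to the same instant.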

    Absolute date conditions:

    When defining a filter condition, you can specify an absolute date (i.e. a constant date value) to be used for comparison. Supported formats for specifying a condition date are:

       yyyy-MM-dd                -> date (e.g. 2015-05-31)
       yyyy-MM-ddTHH:mm:ss[.SSS] -> date and time with optional
                                    milliseconds (e.g. 2015-05-31T22:44:15)
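
    The two absolute formats above line up with the ISO-8601 local date and local date-time formats, so they can be read with java.time as sketched below. AbsoluteConditionDate is a hypothetical helper, not the filter's own parser. Treating a date-only value as midnight of that day is an assumption of this sketch, and LocalDateTime.parse is more lenient than the documented format (it accepts up to nine fractional digits).

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

public class AbsoluteConditionDate {

    // Hypothetical helper: reads an absolute condition date in either
    // of the two documented shapes.
    static LocalDateTime parse(String text) {
        if (text.contains("T")) {
            // Date and time, with optional fractional seconds.
            return LocalDateTime.parse(text);
        }
        // Date only: assumed here to mean the start of that day.
        return LocalDate.parse(text).atStartOfDay();
    }
}
```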
     

    Relative date conditions:

    Filter conditions can also specify a moment in time relative to the current date using the TODAY or NOW keyword, optionally followed by a number of time units to add or subtract. TODAY is the current day without the hours, minutes, and seconds, whereas NOW is the current day with the hours, minutes, and seconds. You can also decide whether the current date should be fixed for the lifetime of this filter (it does not change after being set for the first time) or refreshed on every invocation to reflect the passing of time.
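
    To make the TODAY and NOW semantics concrete, the sketch below resolves a relative expression against a supplied "current" moment, following the keyword[-+]9[YMDhms][*] syntax given under XML configuration usage. RelativeConditionDate, its regular expression, and its methods are hypothetical and written for illustration only; they are not the filter's implementation. The trailing * is accepted but not acted upon here: in this sketch the caller decides whether to pass a cached or a freshly obtained current moment.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RelativeConditionDate {

    // Illustrative pattern: keyword, optional signed amount with unit,
    // optional "*" suffix.
    private static final Pattern SYNTAX =
            Pattern.compile("(TODAY|NOW)(?:([-+])(\\d+)([YMDhms]))?(\\*?)");

    // Hypothetical helper: resolves an expression relative to "current".
    static LocalDateTime resolve(String expr, LocalDateTime current) {
        Matcher m = SYNTAX.matcher(expr);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a relative date: " + expr);
        }
        // TODAY drops hours, minutes, and seconds; NOW keeps them.
        LocalDateTime base = "TODAY".equals(m.group(1))
                ? current.truncatedTo(ChronoUnit.DAYS)
                : current;
        if (m.group(2) == null) {
            return base;
        }
        long amount = Long.parseLong(m.group(3));
        if ("-".equals(m.group(2))) {
            amount = -amount;
        }
        return base.plus(amount, unitOf(m.group(4).charAt(0)));
    }

    // Maps a unit letter to its ChronoUnit (uppercase M = months,
    // lowercase m = minutes).
    static ChronoUnit unitOf(char c) {
        switch (c) {
            case 'Y': return ChronoUnit.YEARS;
            case 'M': return ChronoUnit.MONTHS;
            case 'D': return ChronoUnit.DAYS;
            case 'h': return ChronoUnit.HOURS;
            case 'm': return ChronoUnit.MINUTES;
            default:  return ChronoUnit.SECONDS;
        }
    }
}
```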

    Time zones:

    When comparing dates at a more granular level (e.g., hours, minutes, seconds), it may be important to take time zones into account. If the time zone (id or offset) is part of a document field date value and this filter's configured format supports time zones, the value will be interpreted as a date in the encountered time zone.

    In cases where you want to override the value's existing time zone or specify one for field dates without time zones, you can do so with the setDocZoneId(ZoneId) method. Explicitly setting a time zone will not "convert" a date to that time zone, but will rather assume it was created in the supplied time zone.
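
    The difference between assuming a time zone and converting to one can be shown with plain java.time calls. The AssumedZoneDemo class below is a hypothetical illustration and does not reflect how the filter is implemented internally: assumeZone and overrideZone keep the wall-clock time and attach a zone to it, while convertZone is included only as a contrast.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class AssumedZoneDemo {

    // A zone-less field date is taken as created in the given zone.
    static ZonedDateTime assumeZone(LocalDateTime fieldDate, ZoneId zone) {
        return fieldDate.atZone(zone);
    }

    // An existing zone is replaced; the wall-clock time is unchanged,
    // so the underlying instant may differ.
    static ZonedDateTime overrideZone(ZonedDateTime fieldDate, ZoneId zone) {
        return fieldDate.withZoneSameLocal(zone);
    }

    // Contrast: a true conversion keeps the instant and shifts the
    // wall-clock time.
    static ZonedDateTime convertZone(ZonedDateTime fieldDate, ZoneId zone) {
        return fieldDate.withZoneSameInstant(zone);
    }
}
```

    With a field value of 2015-05-31T22:44:15 in UTC and a supplied zone of America/New_York, overriding keeps 22:44:15 as the local time in New York, whereas a conversion would yield 18:44:15.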

    When using XML configuration to define the condition dates, you can specify the time zone using the conditionZoneId option.

    XML configuration usage:

    
    <handler
        class="com.norconex.importer.handler.filter.impl.DateMetadataFilter"
        onMatch="[include|exclude]"
        format="(document field date format)"
        docZoneId="(force a time zone on evaluated fields.)"
        conditionZoneId="(time zone of condition dates.)">
      <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
      <restrictTo>
        <fieldMatcher>(field-matching expression)</fieldMatcher>
        <valueMatcher>(value-matching expression)</valueMatcher>
      </restrictTo>
      <fieldMatcher>(expression matching date fields to filter)</fieldMatcher>
      <!--
        Use one or two (for ranges) conditions where:
    
               Possible operators are:
    
                 gt -> greater than
                 ge -> greater than or equal to
                 lt -> less than
                 le -> less than or equal to
                 eq -> equal to
    
               Condition date values use one of these formats:
    
                 yyyy-MM-dd                -> date (e.g. 2015-05-31)
                 yyyy-MM-ddTHH:mm:ss[.SSS] -> date and time with optional
                                              milliseconds (e.g. 2015-05-31T22:44:15)
                 TODAY[-+]9[YMDhms][*]     -> the string "TODAY" (at 0:00:00) minus
                                              or plus a number of years, months, days,
                                              hours, minutes, or seconds
                                              (e.g. 1 week ago: TODAY-7D).
                                              * means TODAY can change from one
                                              invocation to another to adjust to a
                                              change of current day
                 NOW[-+]9[YMDhms][*]       -> the string "NOW" (at current time)
                                              minus or plus a number of years, months,
                                              days, hours, minutes, or seconds
                                              (e.g. 1 week ago: NOW-7D).
                                              * means NOW changes from one invocation
                                              to another to adjust to the current
                                              time.
        -->
      <condition
          operator="[gt|ge|lt|le|eq]"
          date="(a date)"/>
    </handler>
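
    The five operators compare the document field date (left-hand side) with the condition date (right-hand side). A minimal sketch of that mapping follows. The ConditionOperators class and its Operator enum are hypothetical names chosen for this illustration and are not the filter's own types; when two conditions are configured for a range, this sketch assumes both must hold.

```java
import java.time.Instant;

public class ConditionOperators {

    // Hypothetical enum mirroring the operator attribute values.
    enum Operator {
        GT, GE, LT, LE, EQ;

        // True when "fieldDate <operator> conditionDate" holds.
        boolean evaluate(Instant fieldDate, Instant conditionDate) {
            int cmp = fieldDate.compareTo(conditionDate);
            switch (this) {
                case GT: return cmp > 0;
                case GE: return cmp >= 0;
                case LT: return cmp < 0;
                case LE: return cmp <= 0;
                default: return cmp == 0;
            }
        }
    }

    // Range check: assumed here to require both conditions to hold.
    static boolean inRange(Instant fieldDate,
            Operator op1, Instant date1, Operator op2, Instant date2) {
        return op1.evaluate(fieldDate, date1) && op2.evaluate(fieldDate, date2);
    }
}
```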

    XML usage example:

    
    <handler
        class="DateMetadataFilter"
        format="yyyy-MM-dd'T'HH:mm:ssZ"
        conditionZoneId="America/New_York"
        onMatch="include">
      <fieldMatcher>publish_date</fieldMatcher>
      <condition
          operator="ge"
          date="TODAY-7D"/>
      <condition
          operator="lt"
          date="TODAY"/>
    </handler>

    The above example will only keep documents from the last seven days, not including today.
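
    The window selected by this example is half-open: it starts at midnight seven days ago (inclusive) and ends at midnight today (exclusive). The sketch below expresses the same check directly in Java. LastSevenDays and its accept method are hypothetical and ignore time zones for brevity; they are not part of the Importer API.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

public class LastSevenDays {

    // Hypothetical check equivalent to "ge TODAY minus 7 days" combined
    // with "lt TODAY", evaluated against a supplied current day.
    static boolean accept(LocalDateTime publishDate, LocalDate today) {
        LocalDateTime from = today.minusDays(7).atStartOfDay(); // inclusive
        LocalDateTime to = today.atStartOfDay();                // exclusive
        return !publishDate.isBefore(from) && publishDate.isBefore(to);
    }
}
```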

    Since:
    2.2.0
    Author:
    Pascal Essiembre