public class DateMetadataFilter extends AbstractDocumentFilter
Accepts or rejects a document based on whether field values correspond to a date matching supplied conditions and format. If multiple values are found for a field, only one of them needs to match for this filter to take effect. If the value cannot be parsed to a valid date, it is considered not to be matching (no exception is thrown).
To successfully parse a date, you can specify a date format, as per the formatting options found on DateTimeFormatter. The default format when not specified is EPOCH (the difference, measured in milliseconds, between the date and midnight, January 1, 1970 UTC).
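The parsing behavior described above can be sketched with plain java.time. This is an illustration only, not the filter's actual implementation: the class name FieldDateParsing, the use of a null or empty format to stand for EPOCH, and the choice of UTC for values without a zone are all assumptions made for this example. An unparsable value yields an empty result rather than an exception, mirroring the "not matching" rule.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.TemporalAccessor;
import java.util.Optional;

// Illustrative sketch only; not the filter's actual implementation.
final class FieldDateParsing {

    // A null or empty format stands in for the EPOCH default (an assumption
    // of this sketch). Values without a zone are assumed to be in UTC.
    static Optional<ZonedDateTime> parse(String value, String format) {
        try {
            if (format == null || format.isEmpty()) {
                // EPOCH: milliseconds since 1970-01-01T00:00:00Z.
                long millis = Long.parseLong(value.trim());
                return Optional.of(
                        Instant.ofEpochMilli(millis).atZone(ZoneOffset.UTC));
            }
            DateTimeFormatter fmt = DateTimeFormatter.ofPattern(format);
            // Try the richest type first, then fall back to simpler ones.
            TemporalAccessor parsed = fmt.parseBest(value,
                    ZonedDateTime::from, LocalDateTime::from, LocalDate::from);
            if (parsed instanceof ZonedDateTime) {
                return Optional.of((ZonedDateTime) parsed);
            }
            if (parsed instanceof LocalDateTime) {
                return Optional.of(
                        ((LocalDateTime) parsed).atZone(ZoneOffset.UTC));
            }
            return Optional.of(
                    ((LocalDate) parsed).atStartOfDay(ZoneOffset.UTC));
        } catch (DateTimeParseException | NumberFormatException e) {
            // Invalid dates are treated as "no match"; nothing is thrown.
            return Optional.empty();
        }
    }
}
```

For instance, FieldDateParsing.parse("not a date", "yyyy-MM-dd") returns an empty Optional, which a caller would treat as a non-matching value.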
When defining a filter condition, you can specify an absolute date (i.e. a constant date value) to be used for comparison. Supported formats for specifying a condition date are:
yyyy-MM-dd -> date (e.g. 2015-05-31)
yyyy-MM-ddThh:mm:ss[.SSS] -> date and time with optional milliseconds (e.g. 2015-05-31T22:44:15)
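As a rough illustration of the two absolute formats, the following sketch parses a condition date with the standard ISO formatters from java.time. The class name ConditionDateParsing is hypothetical, and the ISO formatters are slightly more lenient than the formats listed above (for example, they accept more than three fractional digits), so this is an approximation rather than the library's own parser.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Illustrative sketch only; the class name is hypothetical.
final class ConditionDateParsing {

    // Parses "yyyy-MM-dd" or "yyyy-MM-ddThh:mm:ss[.SSS]" in the given zone.
    // Throws DateTimeParseException when the text matches neither form.
    static ZonedDateTime parse(String text, ZoneId zone) {
        if (text.contains("T")) {
            // Date and time, fractional seconds optional.
            return LocalDateTime
                    .parse(text, DateTimeFormatter.ISO_LOCAL_DATE_TIME)
                    .atZone(zone);
        }
        // Date only: taken to mean the start of that day in the zone.
        return LocalDate
                .parse(text, DateTimeFormatter.ISO_LOCAL_DATE)
                .atStartOfDay(zone);
    }
}
```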
Filter conditions can also specify a moment in time relative to the current date using the TODAY or NOW keyword, optionally followed by a number of time units to add or remove. TODAY is the current day without the hours, minutes, and seconds, whereas NOW is the current day with the hours, minutes, and seconds. You can also decide whether the current date stays fixed for the lifetime of this filter (it does not change after being set for the first time) or is refreshed on every invocation to reflect the passing of time.
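The difference between a fixed and a refreshed relative date can be sketched as two kinds of Supplier<ZonedDateTime>. The code below is a minimal illustration under stated assumptions, not the library's implementation: the class RelativeDates, its regular expression, and the injected Clock are hypothetical, and the sketch only accepts the trailing * when an offset such as -7D is present.

```java
import java.time.Clock;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; names and regex are assumptions of this example.
final class RelativeDates {

    // Keyword, then optionally: sign, amount, unit, and "*" for floating.
    private static final Pattern EXPR = Pattern.compile(
            "^(NOW|TODAY)(?:([-+])(\\d+)([YMDhms])(\\*?))?$");

    static Supplier<ZonedDateTime> parse(String expr, Clock clock) {
        Matcher m = EXPR.matcher(expr.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a relative date: " + expr);
        }
        final boolean today = "TODAY".equals(m.group(1));
        final boolean hasOffset = m.group(2) != null;
        final long sign = "-".equals(m.group(2)) ? -1 : 1;
        final long amount = hasOffset ? sign * Long.parseLong(m.group(3)) : 0;
        final ChronoUnit unit =
                hasOffset ? toUnit(m.group(4).charAt(0)) : ChronoUnit.DAYS;
        final boolean floating = "*".equals(m.group(5));

        // Recomputes the date from the clock each time it is called.
        final Supplier<ZonedDateTime> compute = () -> {
            ZonedDateTime base = ZonedDateTime.now(clock);
            if (today) {
                base = base.truncatedTo(ChronoUnit.DAYS); // drop time of day
            }
            return base.plus(amount, unit);
        };
        if (floating) {
            return compute; // refreshed on every invocation
        }
        // Fixed: computed on first use, then never changes.
        return new Supplier<ZonedDateTime>() {
            private ZonedDateTime cached;

            @Override
            public synchronized ZonedDateTime get() {
                if (cached == null) {
                    cached = compute.get();
                }
                return cached;
            }
        };
    }

    private static ChronoUnit toUnit(char c) {
        switch (c) {
            case 'Y': return ChronoUnit.YEARS;
            case 'M': return ChronoUnit.MONTHS;
            case 'D': return ChronoUnit.DAYS;
            case 'h': return ChronoUnit.HOURS;
            case 'm': return ChronoUnit.MINUTES;
            case 's': return ChronoUnit.SECONDS;
            default: throw new IllegalArgumentException("Unknown unit: " + c);
        }
    }
}
```

Passing a Clock makes the behavior easy to observe: a supplier built from "TODAY-7D" keeps returning its first value even as the clock advances, while one built from "NOW-1h*" follows the clock.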
When comparing dates at a more granular level (e.g., hours, minutes, seconds), it may be important to take time zones into account. If the time zone (id or offset) is part of a document field date value and this filter's configured format supports time zones, the value will be interpreted as a date in the encountered time zone.
In cases where you want to overwrite the value's existing time zone or specify one for field dates without time zones, you can do so with the setDocZoneId(ZoneId) method. Explicitly setting a time zone will not "convert" a date to that time zone, but will rather assume it was created in the supplied time zone.
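The distinction between assuming and converting a time zone maps onto two standard ZonedDateTime methods. The helper class ZoneAssumption below is hypothetical and only illustrates the semantics described above; how the filter applies the zone internally is not shown in this document.

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Illustrative sketch only; the class name is hypothetical.
final class ZoneAssumption {

    // "Assume": keep the wall-clock reading and relabel its zone.
    // The underlying instant changes. This is the behavior described
    // for an explicitly set document time zone.
    static ZonedDateTime assumeZone(ZonedDateTime value, ZoneId zone) {
        return value.withZoneSameLocal(zone);
    }

    // "Convert": keep the instant and adjust the wall-clock reading.
    // Shown for contrast; the text says this is NOT what happens.
    static ZonedDateTime convertZone(ZonedDateTime value, ZoneId zone) {
        return value.withZoneSameInstant(zone);
    }
}
```

With 2015-05-31T22:44:15Z and America/New_York, assuming the zone keeps the local time 22:44:15 but moves the instant four hours later, whereas converting keeps the instant and shows 18:44:15 local time.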
When using XML configuration to define the condition dates, you can specify the time zone using the conditionZoneId option.
<handler
    class="com.norconex.importer.handler.filter.impl.DateMetadataFilter"
    onMatch="[include|exclude]"
    format="(document field date format)"
    docZoneId="(force a time zone on evaluated fields.)"
    conditionZoneId="(time zone of condition dates.)">
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher
        method="[basic|csv|wildcard|regex]"
        ignoreCase="[false|true]"
        ignoreDiacritic="[false|true]"
        partial="[false|true]">
      (field-matching expression)
    </fieldMatcher>
    <valueMatcher
        method="[basic|csv|wildcard|regex]"
        ignoreCase="[false|true]"
        ignoreDiacritic="[false|true]"
        partial="[false|true]">
      (value-matching expression)
    </valueMatcher>
  </restrictTo>
  <fieldMatcher
      method="[basic|csv|wildcard|regex]"
      ignoreCase="[false|true]"
      ignoreDiacritic="[false|true]"
      partial="[false|true]">
    (expression matching date fields to filter)
  </fieldMatcher>
  <!--
    Use one or two (for ranges) conditions, where:
    Possible operators are:
      gt -> greater than
      ge -> greater than or equal
      lt -> lower than
      le -> lower than or equal
      eq -> equals
    Condition date value format is one of:
      yyyy-MM-dd                -> date (e.g. 2015-05-31)
      yyyy-MM-ddThh:mm:ss[.SSS] -> date and time with optional
                                   milliseconds (e.g. 2015-05-31T22:44:15)
      TODAY[-+]9[YMDhms][*]     -> the string "TODAY" (at 0:00:00) minus
                                   or plus a number of years, months, days,
                                   hours, minutes, or seconds
                                   (e.g. 1 week ago: TODAY-7D).
                                   * means TODAY can change from one
                                   invocation to another to adjust to a
                                   change of current day.
      NOW[-+]9[YMDhms][*]       -> the string "NOW" (at current time) minus
                                   or plus a number of years, months, days,
                                   hours, minutes, or seconds
                                   (e.g. 1 week ago: NOW-7D).
                                   * means NOW changes from one invocation
                                   to another to adjust to the current
                                   time.
  -->
  <condition
      operator="[gt|ge|lt|le|eq]"
      date="(a date)"/>
</handler>
<handler
    class="DateMetadataFilter"
    format="yyyy-MM-dd'T'HH:mm:ssZ"
    conditionZoneId="America/New_York"
    onMatch="include">
  <fieldMatcher>publish_date</fieldMatcher>
  <condition
      operator="ge"
      date="TODAY-7D"/>
  <condition
      operator="lt"
      date="TODAY"/>
</handler>
The above example will only keep documents from the last seven days, not including today.
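The range logic in that example can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the filter's code: the RangeCheck, Op, and Cond types are hypothetical stand-ins, and the sketch assumes that a value must satisfy every condition (so that two conditions form a range) while a single matching value among several is enough.

```java
import java.time.ZonedDateTime;
import java.util.List;

// Illustrative sketch only; all type names here are hypothetical.
final class RangeCheck {

    enum Op {
        GT, GE, LT, LE, EQ;

        // Compares on the instant, so differing zones compare correctly.
        boolean test(ZonedDateTime value, ZonedDateTime bound) {
            int c = value.toInstant().compareTo(bound.toInstant());
            switch (this) {
                case GT: return c > 0;
                case GE: return c >= 0;
                case LT: return c < 0;
                case LE: return c <= 0;
                default: return c == 0; // EQ
            }
        }
    }

    static final class Cond {
        final Op op;
        final ZonedDateTime date;

        Cond(Op op, ZonedDateTime date) {
            this.op = op;
            this.date = date;
        }
    }

    // True if at least one value satisfies all conditions.
    static boolean anyValueMatches(List<ZonedDateTime> values,
                                   List<Cond> conditions) {
        for (ZonedDateTime value : values) {
            boolean all = true;
            for (Cond cond : conditions) {
                if (!cond.op.test(value, cond.date)) {
                    all = false;
                    break;
                }
            }
            if (all) {
                return true;
            }
        }
        return false;
    }
}
```

With a lower bound of "seven days ago at midnight" (GE) and an upper bound of "today at midnight" (LT), a document whose field holds one value inside that window matches even if its other values fall outside it.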
Modifier and Type | Class and Description |
---|---|
static class | DateMetadataFilter.Condition |
static class | DateMetadataFilter.DynamicFixedDateTimeSupplier |
static class | DateMetadataFilter.DynamicFloatingDateTimeSupplier |
static class | DateMetadataFilter.Operator |
static class | DateMetadataFilter.StaticDateTimeSupplier |
static class | DateMetadataFilter.TimeUnit |
Constructor and Description |
---|
DateMetadataFilter() |
DateMetadataFilter(String field) Deprecated. Since 3.0.0, use DateMetadataFilter(TextMatcher) |
DateMetadataFilter(String field, OnMatch onMatch) Deprecated. Since 3.0.0, use DateMetadataFilter(TextMatcher, OnMatch) |
DateMetadataFilter(TextMatcher fieldMatcher) Constructor. |
DateMetadataFilter(TextMatcher fieldMatcher, OnMatch onMatch) |
Modifier and Type | Method and Description |
---|---|
void | addCondition(DateMetadataFilter.Condition condition) |
void | addCondition(DateMetadataFilter.Operator operator, Date date) Deprecated. |
void | addCondition(DateMetadataFilter.Operator operator, Supplier<ZonedDateTime> dateTimeSupplier) |
void | addCondition(DateMetadataFilter.Operator operator, ZonedDateTime dateTime) |
void | addConditions(List<DateMetadataFilter.Condition> conditions) Adds a list of conditions, appending them to the list of already defined conditions in this filter (if any). |
boolean | equals(Object other) |
List<DateMetadataFilter.Condition> | getConditions() Gets the list of date filter conditions for this filter. |
ZoneId | getDocZoneId() Gets the time zone id documents are considered to be in. |
String | getField() Deprecated. Since 3.0.0, use getFieldMatcher(). |
TextMatcher | getFieldMatcher() |
String | getFormat() |
int | hashCode() |
protected boolean | isDocumentMatched(HandlerDoc doc, InputStream input, ParseState parseState) |
protected void | loadFilterFromXML(XML xml) |
void | removeAllConditions() Removes all conditions from this filter. |
boolean | removeCondition(DateMetadataFilter.Condition condition) Removes a condition, if it is part of the already defined conditions. |
protected void | saveFilterToXML(XML xml) |
void | setConditions(List<DateMetadataFilter.Condition> conditions) Sets a list of conditions, overwriting any existing ones in this filter. |
void | setDocZoneId(ZoneId docZoneId) Sets the time zone id documents are considered to be in. |
void | setField(String field) Deprecated. Since 3.0.0, use setFieldMatcher(TextMatcher) |
void | setFieldMatcher(TextMatcher fieldMatcher) |
void | setFormat(String format) |
static DateMetadataFilter.Condition | toCondition(DateMetadataFilter.Operator operator, String dateString, ZoneId zoneId) |
String | toString() |
acceptDocument, getOnMatch, loadHandlerFromXML, saveHandlerToXML, setOnMatch
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
public DateMetadataFilter()

@Deprecated public DateMetadataFilter(String field)
Deprecated. Since 3.0.0, use DateMetadataFilter(TextMatcher)
field - field to apply date filtering

@Deprecated public DateMetadataFilter(String field, OnMatch onMatch)
Deprecated. Since 3.0.0, use DateMetadataFilter(TextMatcher, OnMatch)
field - field to apply date filtering
onMatch - include or exclude on match

public DateMetadataFilter(TextMatcher fieldMatcher)
fieldMatcher - matcher for fields on which to apply date filtering

public DateMetadataFilter(TextMatcher fieldMatcher, OnMatch onMatch)
fieldMatcher - matcher for fields on which to apply date filtering
onMatch - include or exclude on match

@Deprecated public String getField()
Deprecated. Since 3.0.0, use getFieldMatcher().

@Deprecated public void setField(String field)
Deprecated. Since 3.0.0, use setFieldMatcher(TextMatcher)
field - field name

public ZoneId getDocZoneId()

public void setDocZoneId(ZoneId docZoneId)
docZoneId - zone id

public TextMatcher getFieldMatcher()

public void setFieldMatcher(TextMatcher fieldMatcher)

public String getFormat()

public void setFormat(String format)

@Deprecated public void addCondition(DateMetadataFilter.Operator operator, Date date)

public void addCondition(DateMetadataFilter.Operator operator, ZonedDateTime dateTime)

public void addCondition(DateMetadataFilter.Operator operator, Supplier<ZonedDateTime> dateTimeSupplier)

public void addCondition(DateMetadataFilter.Condition condition)

public void addConditions(List<DateMetadataFilter.Condition> conditions)
conditions - list of conditions

public void setConditions(List<DateMetadataFilter.Condition> conditions)
conditions - list of conditions

public List<DateMetadataFilter.Condition> getConditions()

public boolean removeCondition(DateMetadataFilter.Condition condition)
condition - the condition to remove
Returns: true if the filter contained the condition

public void removeAllConditions()

protected boolean isDocumentMatched(HandlerDoc doc, InputStream input, ParseState parseState) throws ImporterHandlerException
isDocumentMatched in class AbstractDocumentFilter
Throws: ImporterHandlerException

protected void loadFilterFromXML(XML xml)
loadFilterFromXML in class AbstractDocumentFilter

protected void saveFilterToXML(XML xml)
saveFilterToXML in class AbstractDocumentFilter

public boolean equals(Object other)
equals in class AbstractDocumentFilter

public int hashCode()
hashCode in class AbstractDocumentFilter

public String toString()
toString in class AbstractDocumentFilter

public static DateMetadataFilter.Condition toCondition(DateMetadataFilter.Operator operator, String dateString, ZoneId zoneId)
Copyright © 2009–2023 Norconex Inc. All rights reserved.