Class DateMetadataFilter
- java.lang.Object
-
- com.norconex.importer.handler.AbstractImporterHandler
-
- com.norconex.importer.handler.filter.AbstractDocumentFilter
-
- com.norconex.importer.handler.filter.impl.DateMetadataFilter
-
- All Implemented Interfaces:
IXMLConfigurable, IDocumentFilter, IOnMatchFilter, IImporterHandler
public class DateMetadataFilter extends AbstractDocumentFilter
Accepts or rejects a document based on whether field values correspond to a date matching supplied conditions and format. If multiple values are found for a field, only one of them needs to match for this filter to take effect. If the value cannot be parsed to a valid date, it is considered not to be matching (no exception is thrown).
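The matching rule above can be sketched with plain java.time, without any Norconex classes. This is an illustration only, not the filter's actual code: the class and method names (FieldDateMatchSketch, parse, anyMatch) are hypothetical, and treating a missing pattern as epoch milliseconds in UTC is an assumption based on the documented default format.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical sketch; not the Norconex implementation.
final class FieldDateMatchSketch {

    // Parses one field value. A null pattern means "epoch milliseconds" (assumed UTC).
    // Returns an empty Optional instead of throwing when the value is not a valid date.
    static Optional<ZonedDateTime> parse(String value, String pattern) {
        try {
            if (pattern == null) {
                long millis = Long.parseLong(value.trim());
                return Optional.of(Instant.ofEpochMilli(millis).atZone(ZoneOffset.UTC));
            }
            DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern);
            return Optional.of(ZonedDateTime.parse(value.trim(), formatter));
        } catch (NumberFormatException | DateTimeParseException e) {
            return Optional.empty();
        }
    }

    // True when at least one value parses to a date and satisfies the condition.
    static boolean anyMatch(List<String> values, String pattern,
            Predicate<ZonedDateTime> condition) {
        return values.stream()
                .map(v -> parse(v, pattern))
                .filter(Optional::isPresent)
                .map(Optional::get)
                .anyMatch(condition);
    }

    public static void main(String[] args) {
        ZonedDateTime cutoff = ZonedDateTime.parse("2015-01-01T00:00:00Z");
        List<String> values = Arrays.asList("not a date", "1433112255000");
        // The first value is skipped silently; the second one is after the cutoff.
        System.out.println(anyMatch(values, null, d -> d.isAfter(cutoff)));
    }
}
```

Returning an empty Optional keeps unparsable values out of the comparison, so they count as non-matching rather than as errors.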
Metadata date field format:
To successfully parse a date, you can specify a date format, as per the formatting options found on DateTimeFormatter. The default format when not specified is EPOCH (the difference, measured in milliseconds, between the date and midnight, January 1, 1970).
Absolute date conditions:
When defining a filter condition, you can specify an absolute date (i.e. a constant date value) to be used for comparison. Supported formats for specifying a condition date are:
yyyy-MM-dd -> date (e.g. 2015-05-31)
yyyy-MM-ddThh:mm:ss[.SSS] -> date and time with optional milliseconds (e.g. 2015-05-31T22:44:15)
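As a rough illustration of the two absolute formats, the sketch below parses either form with the standard ISO parsers in java.time. The class name AbsoluteConditionDateSketch and the choice to resolve a date-only value to the start of that day are assumptions made for the example, not documented behavior of this filter.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical sketch; not the Norconex implementation.
final class AbsoluteConditionDateSketch {

    // Accepts "yyyy-MM-dd" or "yyyy-MM-ddTHH:mm:ss[.SSS]" and attaches the given zone.
    // Throws java.time.format.DateTimeParseException for any other input.
    static ZonedDateTime parse(String text, ZoneId zone) {
        String s = text.trim();
        if (s.contains("T")) {
            // ISO local date-time; the fraction of a second is optional.
            return LocalDateTime.parse(s).atZone(zone);
        }
        // Date only: assume midnight at the start of that day (an assumption).
        return LocalDate.parse(s).atStartOfDay(zone);
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("America/New_York");
        System.out.println(parse("2015-05-31", zone));
        System.out.println(parse("2015-05-31T22:44:15", zone));
    }
}
```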
Relative date conditions:
Filter conditions can also specify a moment in time relative to the current date using the TODAY or NOW keyword, optionally followed by a number of time units to add or remove. TODAY is the current day without the hours, minutes, and seconds, whereas NOW is the current day with the hours, minutes, and seconds. You can also decide whether the current date is fixed for the lifetime of this filter (it does not change after being set for the first time) or refreshed on every invocation to reflect the passing of time.
Time zones:
When comparing dates at a more granular level (e.g., hours, minutes, seconds), it may be important to take time zones into account. If the time zone (ID or offset) is part of a document field date value and the format configured on this filter supports time zones, the value is interpreted as a date in the encountered time zone.
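When working with field dates and zones, it helps to keep in mind the difference between reading a date as if it were written in a zone and converting it to that zone. The sketch below shows both with java.time only. The class and method names are hypothetical, and the mapping of "forcing" a zone to withZoneSameLocal is an assumption about intent, not a statement of how this filter is implemented.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical sketch; not the Norconex implementation.
final class ZoneAssignmentSketch {

    // A value without a zone: keep the wall-clock time and attach the zone.
    static ZonedDateTime assumeZone(LocalDateTime fieldDate, ZoneId zone) {
        return fieldDate.atZone(zone);
    }

    // A value that already has a zone: replace the zone, keep the wall-clock time.
    static ZonedDateTime overrideZone(ZonedDateTime fieldDate, ZoneId zone) {
        return fieldDate.withZoneSameLocal(zone);
    }

    // A true conversion: same instant, different wall-clock time.
    static ZonedDateTime convertZone(ZonedDateTime fieldDate, ZoneId zone) {
        return fieldDate.withZoneSameInstant(zone);
    }

    public static void main(String[] args) {
        LocalDateTime raw = LocalDateTime.parse("2015-05-31T22:44:15");
        ZonedDateTime newYork = assumeZone(raw, ZoneId.of("America/New_York"));
        System.out.println(newYork);                                  // still 22:44:15 local
        System.out.println(overrideZone(newYork, ZoneId.of("UTC")));  // still 22:44:15 local
        System.out.println(convertZone(newYork, ZoneId.of("UTC")));   // local time shifts
    }
}
```

Only the conversion changes the hour shown; the first two keep the wall-clock time and therefore describe a different instant when the zone changes.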
In cases where you want to override the value's existing time zone, or specify one for field dates without time zones, you can do so with the setDocZoneId(ZoneId) method. Explicitly setting a time zone will not "convert" a date to that time zone, but will rather assume it was created in the supplied time zone.
When using XML configuration to define the condition dates, you can specify the time zone using the conditionZoneId option.
XML configuration usage:
<handler class="com.norconex.importer.handler.filter.impl.DateMetadataFilter"
    onMatch="[include|exclude]"
    format="(document field date format)"
    docZoneId="(force a time zone on evaluated fields.)"
    conditionZoneId="(time zone of condition dates.)">

  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher>(field-matching expression)</fieldMatcher>
    <valueMatcher>(value-matching expression)</valueMatcher>
  </restrictTo>

  <fieldMatcher>(expression matching date fields to filter)</fieldMatcher>

  <!--
    Use one or two (for ranges) conditions where:

    Possible operators are:
      gt -> greater than
      ge -> greater than or equal to
      lt -> lower than
      le -> lower than or equal to
      eq -> equals

    Condition date value formats are one of:
      yyyy-MM-dd                -> date (e.g. 2015-05-31)
      yyyy-MM-ddThh:mm:ss[.SSS] -> date and time with optional
                                   milliseconds (e.g. 2015-05-31T22:44:15)
      TODAY[-+]9[YMDhms][*]     -> the string "TODAY" (at 0:00:00) minus
                                   or plus a number of years, months, days,
                                   hours, minutes, or seconds
                                   (e.g. 1 week ago: TODAY-7D).
                                   * means TODAY can change from one
                                   invocation to another to adjust to a
                                   change of current day
      NOW[-+]9[YMDhms][*]       -> the string "NOW" (at current time) minus
                                   or plus a number of years, months, days,
                                   hours, minutes, or seconds
                                   (e.g. 1 week ago: NOW-7D).
                                   * means NOW changes from one invocation
                                   to another to adjust to the current time.
  -->
  <condition operator="[gt|ge|lt|le|eq]" date="(a date)"/>

</handler>
XML usage example:
<handler class="DateMetadataFilter"
    format="yyyy-MM-dd'T'HH:mm:ssZ"
    conditionZoneId="America/New_York"
    onMatch="include">
  <fieldMatcher>publish_date</fieldMatcher>
  <condition operator="ge" date="TODAY-7D"/>
  <condition operator="lt" date="TODAY"/>
</handler>
The above example will only keep documents from the last seven days, not including today.
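To make the relative syntax concrete, here is a small sketch that resolves expressions of the documented form TODAY[-+]9[YMDhms] or NOW[-+]9[YMDhms] against a java.time.Clock and then checks a "ge lower bound, lt upper bound" range like the one in the example. It is not the filter's own parser: the class name, the regular expression, and the decision to ignore the trailing * flag are assumptions made for illustration.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the Norconex implementation.
final class RelativeDateSketch {

    // Keyword, then an optional sign, amount, and unit. A trailing "*" is
    // accepted but ignored here (it only controls when the value is refreshed).
    private static final Pattern RELATIVE =
            Pattern.compile("^(TODAY|NOW)(?:([-+])(\\d+)([YMDhms]))?\\*?$");

    static ZonedDateTime resolve(String expr, Clock clock) {
        Matcher m = RELATIVE.matcher(expr.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a relative date: " + expr);
        }
        ZonedDateTime base = ZonedDateTime.now(clock);
        if ("TODAY".equals(m.group(1))) {
            base = base.truncatedTo(ChronoUnit.DAYS); // drop hours, minutes, seconds
        }
        if (m.group(2) == null) {
            return base; // bare TODAY or NOW
        }
        long amount = Long.parseLong(m.group(3));
        if ("-".equals(m.group(2))) {
            amount = -amount;
        }
        return base.plus(amount, unit(m.group(4)));
    }

    private static ChronoUnit unit(String abbr) {
        switch (abbr) {
            case "Y": return ChronoUnit.YEARS;
            case "M": return ChronoUnit.MONTHS;
            case "D": return ChronoUnit.DAYS;
            case "h": return ChronoUnit.HOURS;
            case "m": return ChronoUnit.MINUTES;
            default:  return ChronoUnit.SECONDS; // "s" is the only remaining match
        }
    }

    // Equivalent of two conditions: ge fromInclusive and lt toExclusive.
    static boolean inRange(ZonedDateTime date, ZonedDateTime fromInclusive,
            ZonedDateTime toExclusive) {
        return !date.isBefore(fromInclusive) && date.isBefore(toExclusive);
    }

    public static void main(String[] args) {
        Clock clock = Clock.fixed(Instant.parse("2015-05-31T22:44:15Z"), ZoneOffset.UTC);
        ZonedDateTime from = resolve("TODAY-7D", clock);
        ZonedDateTime to = resolve("TODAY", clock);
        ZonedDateTime published = ZonedDateTime.parse("2015-05-28T10:00:00Z");
        System.out.println(inRange(published, from, to));
    }
}
```

Passing a Clock rather than calling the system time directly keeps the sketch deterministic, which makes the range boundaries easy to verify.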
- Since:
- 2.2.0
- Author:
- Pascal Essiembre
-
-
Nested Class Summary
Nested Classes
Modifier and Type    Class
static class    DateMetadataFilter.Condition
static class    DateMetadataFilter.DynamicFixedDateTimeSupplier
static class    DateMetadataFilter.DynamicFloatingDateTimeSupplier
static class    DateMetadataFilter.Operator
static class    DateMetadataFilter.StaticDateTimeSupplier
static class    DateMetadataFilter.TimeUnit
-
Constructor Summary
Constructors
DateMetadataFilter()
DateMetadataFilter(TextMatcher fieldMatcher)
    Constructor.
DateMetadataFilter(TextMatcher fieldMatcher, OnMatch onMatch)
DateMetadataFilter(String field)
    Deprecated. Since 3.0.0, use DateMetadataFilter(TextMatcher)
DateMetadataFilter(String field, OnMatch onMatch)
    Deprecated. Since 3.0.0, use DateMetadataFilter(TextMatcher, OnMatch)
-
Method Summary
Modifier and Type, Method, Description
void addCondition(DateMetadataFilter.Condition condition)
void addCondition(DateMetadataFilter.Operator operator, ZonedDateTime dateTime)
void addCondition(DateMetadataFilter.Operator operator, Date date)
    Deprecated.
void addCondition(DateMetadataFilter.Operator operator, Supplier<ZonedDateTime> dateTimeSupplier)
void addConditions(List<DateMetadataFilter.Condition> conditions)
    Adds a list of conditions, appending them to the list of already defined conditions in this filter (if any).
boolean equals(Object other)
List<DateMetadataFilter.Condition> getConditions()
    Gets the list of date filter conditions for this filter.
ZoneId getDocZoneId()
    Gets the time zone ID that document dates are assumed to be in.
String getField()
    Deprecated. Since 3.0.0, use getFieldMatcher().
TextMatcher getFieldMatcher()
String getFormat()
int hashCode()
protected boolean isDocumentMatched(HandlerDoc doc, InputStream input, ParseState parseState)
protected void loadFilterFromXML(XML xml)
void removeAllConditions()
    Removes all conditions from this filter.
boolean removeCondition(DateMetadataFilter.Condition condition)
    Removes a condition, if it is part of the already defined conditions.
protected void saveFilterToXML(XML xml)
void setConditions(List<DateMetadataFilter.Condition> conditions)
    Sets a list of conditions, overwriting any existing ones in this filter.
void setDocZoneId(ZoneId docZoneId)
    Sets the time zone ID that document dates are assumed to be in.
void setField(String field)
    Deprecated. Since 3.0.0, use setFieldMatcher(TextMatcher)
void setFieldMatcher(TextMatcher fieldMatcher)
void setFormat(String format)
static DateMetadataFilter.Condition toCondition(DateMetadataFilter.Operator operator, String dateString, ZoneId zoneId)
String toString()
-
Methods inherited from class com.norconex.importer.handler.filter.AbstractDocumentFilter
acceptDocument, getOnMatch, loadHandlerFromXML, saveHandlerToXML, setOnMatch
-
Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
-
-
-
-
Constructor Detail
-
DateMetadataFilter
public DateMetadataFilter()
-
DateMetadataFilter
@Deprecated public DateMetadataFilter(String field)
Deprecated. Since 3.0.0, use DateMetadataFilter(TextMatcher)
Constructor.
- Parameters:
field - field to apply date filtering
-
DateMetadataFilter
@Deprecated public DateMetadataFilter(String field, OnMatch onMatch)
Deprecated. Since 3.0.0, use DateMetadataFilter(TextMatcher, OnMatch)
Constructor.
- Parameters:
field - field to apply date filtering
onMatch - include or exclude on match
-
DateMetadataFilter
public DateMetadataFilter(TextMatcher fieldMatcher)
Constructor.
- Parameters:
fieldMatcher - matcher for fields on which to apply date filtering
- Since:
- 3.0.0
-
DateMetadataFilter
public DateMetadataFilter(TextMatcher fieldMatcher, OnMatch onMatch)
- Parameters:
fieldMatcher - matcher for fields on which to apply date filtering
onMatch - include or exclude on match
- Since:
- 3.0.0
-
-
Method Detail
-
getField
@Deprecated public String getField()
Deprecated. Since 3.0.0, use getFieldMatcher().
- Returns:
- field name
-
setField
@Deprecated public void setField(String field)
Deprecated. Since 3.0.0, use setFieldMatcher(TextMatcher)
- Parameters:
field - field name
-
getDocZoneId
public ZoneId getDocZoneId()
Gets the time zone ID that document dates are assumed to be in.
- Returns:
- zone id
- Since:
- 3.0.0
-
setDocZoneId
public void setDocZoneId(ZoneId docZoneId)
Sets the time zone ID that document dates are assumed to be in.
- Parameters:
docZoneId - zone id
- Since:
- 3.0.0
-
getFieldMatcher
public TextMatcher getFieldMatcher()
-
setFieldMatcher
public void setFieldMatcher(TextMatcher fieldMatcher)
-
getFormat
public String getFormat()
-
setFormat
public void setFormat(String format)
-
addCondition
@Deprecated public void addCondition(DateMetadataFilter.Operator operator, Date date)
Deprecated.
-
addCondition
public void addCondition(DateMetadataFilter.Operator operator, ZonedDateTime dateTime)
-
addCondition
public void addCondition(DateMetadataFilter.Operator operator, Supplier<ZonedDateTime> dateTimeSupplier)
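Accepting a Supplier<ZonedDateTime> presumably lets a condition date be computed when it is needed rather than when the condition is created, which fits the choice described earlier between a date fixed for the lifetime of the filter and one refreshed on every invocation. The following sketch shows those two behaviors with plain java.util.function.Supplier. It does not reproduce the nested supplier classes of this filter; the class and method names are hypothetical.

```java
import java.time.Clock;
import java.time.ZonedDateTime;
import java.util.function.Supplier;

// Hypothetical sketch; not the Norconex implementation.
final class DateTimeSupplierSketch {

    // "Floating": recomputed from the clock on every call.
    static Supplier<ZonedDateTime> floating(Clock clock, long daysAgo) {
        return () -> ZonedDateTime.now(clock).minusDays(daysAgo);
    }

    // "Fixed": computed on the first call, then reused for every later call.
    static Supplier<ZonedDateTime> fixed(Clock clock, long daysAgo) {
        return new Supplier<ZonedDateTime>() {
            private ZonedDateTime cached;

            @Override
            public synchronized ZonedDateTime get() {
                if (cached == null) {
                    cached = ZonedDateTime.now(clock).minusDays(daysAgo);
                }
                return cached;
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        Supplier<ZonedDateTime> fixed = fixed(Clock.systemUTC(), 7);
        Supplier<ZonedDateTime> floating = floating(Clock.systemUTC(), 7);
        ZonedDateTime firstFixed = fixed.get();
        ZonedDateTime firstFloating = floating.get();
        Thread.sleep(50);
        System.out.println(fixed.get().equals(firstFixed));         // same value is reused
        System.out.println(floating.get().isAfter(firstFloating));  // value follows the clock
    }
}
```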
-
addCondition
public void addCondition(DateMetadataFilter.Condition condition)
-
addConditions
public void addConditions(List<DateMetadataFilter.Condition> conditions)
Adds a list of conditions, appending them to the list of already defined conditions in this filter (if any).
- Parameters:
conditions - list of conditions
- Since:
- 3.0.0
-
setConditions
public void setConditions(List<DateMetadataFilter.Condition> conditions)
Sets a list of conditions, overwriting any existing ones in this filter.
- Parameters:
conditions - list of conditions
- Since:
- 3.0.0
-
getConditions
public List<DateMetadataFilter.Condition> getConditions()
Gets the list of date filter conditions for this filter.
- Returns:
- conditions
- Since:
- 3.0.0
-
removeCondition
public boolean removeCondition(DateMetadataFilter.Condition condition)
Removes a condition, if it is part of the already defined conditions.
- Parameters:
condition - the condition to remove
- Returns:
true if the filter contained the condition
- Since:
- 3.0.0
-
removeAllConditions
public void removeAllConditions()
Removes all conditions from this filter.
- Since:
- 3.0.0
-
isDocumentMatched
protected boolean isDocumentMatched(HandlerDoc doc, InputStream input, ParseState parseState) throws ImporterHandlerException
- Specified by:
isDocumentMatched in class AbstractDocumentFilter
- Throws:
ImporterHandlerException
-
loadFilterFromXML
protected void loadFilterFromXML(XML xml)
- Specified by:
loadFilterFromXML in class AbstractDocumentFilter
-
saveFilterToXML
protected void saveFilterToXML(XML xml)
- Specified by:
saveFilterToXML in class AbstractDocumentFilter
-
equals
public boolean equals(Object other)
- Overrides:
equals in class AbstractDocumentFilter
-
hashCode
public int hashCode()
- Overrides:
hashCode in class AbstractDocumentFilter
-
toString
public String toString()
- Overrides:
toString in class AbstractDocumentFilter
-
toCondition
public static DateMetadataFilter.Condition toCondition(DateMetadataFilter.Operator operator, String dateString, ZoneId zoneId)
-
-