public class CountMatchesTagger extends AbstractCharStreamTagger
Counts the number of matches of a given string (or string pattern) and stores the resulting value in the field specified by "toField".
If no "fieldMatcher" expression is specified, the document content will be used. If the "fieldMatcher" matches more than one field, the sum of all matches will be stored as a single value. In most cases, you will want to set "partial" to "true" on your "countMatcher".
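To illustrate why partial matching usually matters when counting, here is a minimal sketch using only java.util.regex. It assumes that a non-partial matcher must match the entire text while a partial matcher may match anywhere in it; the class and method names are hypothetical and are not part of the Importer API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the tagger's actual implementation.
class PartialVersusFullMatch {

    // "Partial" style: counts non-overlapping occurrences anywhere in the text.
    static int countPartial(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // "Full" style: the whole text must match, so the count is 0 or 1.
    static int countFull(String regex, String text) {
        return Pattern.matches(regex, text) ? 1 : 0;
    }

    public static void main(String[] args) {
        String text = "apple pie, apple tart, apple juice";
        System.out.println(countPartial("apple", text)); // prints 3
        System.out.println(countFull("apple", text));    // prints 0
    }
}
```

Under that assumption, a full match can never report more than one hit per value, which is rarely what a count is meant to capture.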
If a target field with the same name already exists for a document, the count value will be added to the end of the existing value list. It is possible to change this default behavior with setOnSet(PropertySetter).
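The summing and appending behavior described above can be sketched with plain Java collections. This is an assumption-laden illustration, not the tagger's code: document metadata is modeled as a Map of field names to value lists, and the class name FieldCountSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of the default behavior; names are not from the Importer API.
class FieldCountSketch {

    // Counts non-overlapping occurrences of the pattern in the text.
    static int count(Pattern countPattern, String text) {
        Matcher m = countPattern.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Sums the matches over every value of every field whose name matches
    // fieldPattern, then appends the total to the target field's value list.
    static void tag(Map<String, List<String>> metadata, Pattern fieldPattern,
            Pattern countPattern, String toField) {
        int total = 0;
        for (Map.Entry<String, List<String>> entry : metadata.entrySet()) {
            if (fieldPattern.matcher(entry.getKey()).matches()) {
                for (String value : entry.getValue()) {
                    total += count(countPattern, value);
                }
            }
        }
        // Append rather than overwrite: existing values are kept.
        metadata.computeIfAbsent(toField, k -> new ArrayList<>())
                .add(String.valueOf(total));
    }
}
```

For example, with a "title" field containing "cat and cat", a "body" field containing "cat", and an existing "catCount" value of "7", tagging with the field pattern `title|body` and the count pattern `cat` leaves "catCount" holding "7" followed by "3".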
Can be used as a pre-parse tagger on text documents only when matching strings on document content, or as either a pre-parse or post-parse handler when the "fieldMatcher" is used.
<handler
class="com.norconex.importer.handler.tagger.impl.CountMatchesTagger"
toField="(target field)"
maxReadSize="(max characters to read at once)"
sourceCharset="(character encoding)">
<!-- multiple "restrictTo" tags allowed (only one needs to match) -->
<restrictTo>
<fieldMatcher
method="[basic|csv|wildcard|regex]"
ignoreCase="[false|true]"
ignoreDiacritic="[false|true]"
partial="[false|true]">
(field-matching expression)
</fieldMatcher>
<valueMatcher
method="[basic|csv|wildcard|regex]"
ignoreCase="[false|true]"
ignoreDiacritic="[false|true]"
partial="[false|true]">
(value-matching expression)
</valueMatcher>
</restrictTo>
<fieldMatcher
method="[basic|csv|wildcard|regex]"
ignoreCase="[false|true]"
ignoreDiacritic="[false|true]"
partial="[false|true]">
(optional expression for fields used to count matches)
</fieldMatcher>
<countMatcher
method="[basic|csv|wildcard|regex]"
ignoreCase="[false|true]"
ignoreDiacritic="[false|true]"
partial="[false|true]">
(expression used to count matches)
</countMatcher>
</handler>
<handler
class="CountMatchesTagger"
toField="urlSegmentCount">
<fieldMatcher>document.reference</fieldMatcher>
<countMatcher
method="regex">
/[^/]+
</countMatcher>
</handler>
The above will count the number of segments in a URL.
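As a rough check of what the `/[^/]+` expression counts, the following stand-alone sketch applies the same regular expression with java.util.regex. The class name UrlSegmentCount and the sample URLs are hypothetical; the sketch assumes matches are searched for anywhere in the reference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; it only mimics the regular expression, not the tagger.
class UrlSegmentCount {

    // A slash followed by one or more non-slash characters.
    private static final Pattern SEGMENT = Pattern.compile("/[^/]+");

    static int countSegments(String url) {
        Matcher m = SEGMENT.matcher(url);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Matches "/example.com", "/a", "/b" and "/c".
        System.out.println(countSegments("https://example.com/a/b/c")); // prints 4
    }
}
```

Note that with this expression the host name counts as a segment, while the first slash of `//` and a trailing slash do not.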
See Also: Pattern
Modifier and Type | Class and Description |
---|---|
static class | CountMatchesTagger.MatchDetails Deprecated. |
Constructor and Description |
---|
CountMatchesTagger() |
Modifier and Type | Method and Description |
---|---|
void | addMatchDetails(CountMatchesTagger.MatchDetails matchDetails) Deprecated. Since 3.0.0, use setToField(String), setFieldMatcher(TextMatcher), and setCountMatcher(TextMatcher). |
boolean | equals(Object other) |
TextMatcher | getCountMatcher() Gets the count matcher. |
TextMatcher | getFieldMatcher() Gets the field matcher. |
List<CountMatchesTagger.MatchDetails> | getMatchesDetails() Deprecated. Since 3.0.0, use getToField(), getFieldMatcher(), and getCountMatcher(). |
int | getMaxReadSize() Gets the maximum number of characters to read from content for tagging at once. |
PropertySetter | getOnSet() Gets the property setter to use when a value is set. |
String | getToField() Gets the target field. |
int | hashCode() |
protected void | loadCharStreamTaggerFromXML(XML xml) Loads configuration settings specific to the implementing class. |
void | removeMatchDetails(CountMatchesTagger.MatchDetails matchDetails) Deprecated. Since 3.0.0, this method does nothing. |
protected void | saveCharStreamTaggerToXML(XML xml) Saves configuration settings specific to the implementing class. |
void | setCountMatcher(TextMatcher countMatcher) Sets the count matcher. |
void | setFieldMatcher(TextMatcher fieldMatcher) Sets the field matcher. |
void | setMaxReadSize(int maxReadSize) Sets the maximum number of characters to read from content for tagging at once. |
void | setOnSet(PropertySetter onSet) Sets the property setter to use when a value is set. |
void | setToField(String toField) Sets the target field. |
protected void | tagTextDocument(HandlerDoc doc, Reader input, ParseState parseState) |
String | toString() |
getSourceCharset, loadHandlerFromXML, saveHandlerToXML, setSourceCharset, tagApplicableDocument
tagDocument
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
protected void tagTextDocument(HandlerDoc doc, Reader input, ParseState parseState) throws ImporterHandlerException
tagTextDocument in class AbstractCharStreamTagger
Throws: ImporterHandlerException

public int getMaxReadSize()
Gets the maximum number of characters to read from content for tagging at once (see TextReader.DEFAULT_MAX_READ_SIZE).

public void setMaxReadSize(int maxReadSize)
Sets the maximum number of characters to read from content for tagging at once.
maxReadSize - maximum read size

public TextMatcher getFieldMatcher()
Gets the field matcher.

public void setFieldMatcher(TextMatcher fieldMatcher)
Sets the field matcher.
fieldMatcher - field matcher

public TextMatcher getCountMatcher()
Gets the count matcher.

public void setCountMatcher(TextMatcher countMatcher)
Sets the count matcher.
countMatcher - count matcher

public String getToField()
Gets the target field.

public void setToField(String toField)
Sets the target field.
toField - target field

public PropertySetter getOnSet()
Gets the property setter to use when a value is set.

public void setOnSet(PropertySetter onSet)
Sets the property setter to use when a value is set.
onSet - property setter

@Deprecated public List<CountMatchesTagger.MatchDetails> getMatchesDetails()
Deprecated. Since 3.0.0, use getToField(), getFieldMatcher(), and getCountMatcher().

@Deprecated public void removeMatchDetails(CountMatchesTagger.MatchDetails matchDetails)
Deprecated. Since 3.0.0, this method does nothing.
matchDetails - match details

@Deprecated public void addMatchDetails(CountMatchesTagger.MatchDetails matchDetails)
Deprecated. Since 3.0.0, use setToField(String), setFieldMatcher(TextMatcher), and setCountMatcher(TextMatcher).
matchDetails - the match details

protected void loadCharStreamTaggerFromXML(XML xml)
Loads configuration settings specific to the implementing class.
loadCharStreamTaggerFromXML in class AbstractCharStreamTagger
xml - xml configuration

protected void saveCharStreamTaggerToXML(XML xml)
Saves configuration settings specific to the implementing class.
saveCharStreamTaggerToXML in class AbstractCharStreamTagger
xml - the XML

public boolean equals(Object other)
equals in class AbstractCharStreamTagger

public int hashCode()
hashCode in class AbstractCharStreamTagger

public String toString()
toString in class AbstractCharStreamTagger
Copyright © 2009–2023 Norconex Inc. All rights reserved.