Class CountMatchesTagger
- java.lang.Object
  - com.norconex.importer.handler.AbstractImporterHandler
    - com.norconex.importer.handler.tagger.AbstractDocumentTagger
      - com.norconex.importer.handler.tagger.AbstractCharStreamTagger
        - com.norconex.importer.handler.tagger.impl.CountMatchesTagger
- All Implemented Interfaces: IXMLConfigurable, IImporterHandler, IDocumentTagger
public class CountMatchesTagger extends AbstractCharStreamTagger
Counts the number of matches of a given string (or string pattern) and stores the resulting value in the field specified by "toField".
If no "fieldMatcher" expression is specified, the document content will be used. If the "fieldMatcher" matches more than one field, the sum of all matches will be stored as a single value. In most cases, you will want to set your "countMatcher" to "partial" so that it counts occurrences within the text rather than requiring the whole value to match.
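The summing behavior described above can be sketched with plain java.util.regex. This is only an illustration under stated assumptions, not the tagger's actual implementation: the class FieldMatchSumSketch, its methods, and the sample field names are hypothetical, and regular expressions stand in for the "fieldMatcher" and "countMatcher" expressions.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration only; not part of the Norconex API.
class FieldMatchSumSketch {

    // Counts non-overlapping occurrences of countRegex anywhere in the text
    // (comparable to a "partial" match).
    static int countMatches(String text, Pattern countRegex) {
        Matcher m = countRegex.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Sums the counts over every value of every field whose name fully
    // matches fieldRegex, producing a single total.
    static int sumAcrossFields(Map<String, List<String>> fields,
            Pattern fieldRegex, Pattern countRegex) {
        int total = 0;
        for (Map.Entry<String, List<String>> e : fields.entrySet()) {
            if (fieldRegex.matcher(e.getKey()).matches()) {
                for (String value : e.getValue()) {
                    total += countMatches(value, countRegex);
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, List<String>> fields = Map.of(
                "title", List.of("apple pie and apple juice"),
                "subtitle", List.of("an apple a day"),
                "author", List.of("apple"));
        // "title" contributes 2 and "subtitle" contributes 1;
        // "author" is not selected by the field expression.
        int total = sumAcrossFields(fields,
                Pattern.compile("title|subtitle"), Pattern.compile("apple"));
        System.out.println(total);
    }
}
```

Here two fields are selected and their counts are added together, so a single value of 3 would be stored rather than one value per field.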
Storing values in an existing field
If a target field with the same name already exists for a document, the count value will be added to the end of the existing value list. This default behavior can be changed with setOnSet(PropertySetter).
Can be used as a pre-parse tagger on text documents only when matching strings on document content, or as either a pre-parse or post-parse handler when the "fieldMatcher" is used.
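To make the difference between the default behavior and an alternative concrete, here is a minimal sketch. It assumes a simple metadata map and a hypothetical two-value Mode enum; neither is the real PropertySetter type, which may offer different or additional strategies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; not part of the Norconex API.
class OnSetSketch {

    // Assumed strategies: APPEND mirrors the documented default,
    // REPLACE is one possible alternative.
    enum Mode { APPEND, REPLACE }

    // Stores the count in toField according to the chosen mode.
    static void store(Map<String, List<String>> metadata,
            String toField, int count, Mode mode) {
        List<String> values =
                metadata.computeIfAbsent(toField, k -> new ArrayList<>());
        if (mode == Mode.REPLACE) {
            values.clear(); // discard existing values first
        }
        values.add(Integer.toString(count)); // add to end of the value list
    }
}
```

With an existing value list of ["2"], storing 5 in APPEND mode leaves ["2", "5"], while storing it in REPLACE mode leaves only ["5"].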
XML configuration usage:
<handler class="com.norconex.importer.handler.tagger.impl.CountMatchesTagger"
    toField="(target field)"
    maxReadSize="(max characters to read at once)"
    sourceCharset="(character encoding)">

  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher>(field-matching expression)</fieldMatcher>
    <valueMatcher>(value-matching expression)</valueMatcher>
  </restrictTo>

  <fieldMatcher>(optional expression for fields used to count matches)</fieldMatcher>
  <countMatcher>(expression used to count matches)</countMatcher>
</handler>
XML usage example:
<handler class="CountMatchesTagger" toField="urlSegmentCount">
  <fieldMatcher>document.reference</fieldMatcher>
  <countMatcher method="regex">/[^/]+</countMatcher>
</handler>
The above will count the number of segments in a URL.
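To see what the regular expression in this example matches, the counting can be reproduced with plain java.util.regex. The class UrlSegmentCountSketch and the sample URL are hypothetical and only illustrate the pattern; the tagger itself applies the expression through its "countMatcher".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration only; not part of the Norconex API.
class UrlSegmentCountSketch {

    // Counts non-overlapping matches of the regex anywhere in the text.
    static int countMatches(String text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Sample reference chosen for illustration.
        String reference = "https://example.com/a/b/c";
        // Matches "/example.com", "/a", "/b" and "/c".
        System.out.println(countMatches(reference, "/[^/]+"));
    }
}
```

Note that with this pattern the host name is matched as well (as "/example.com"), so the sample reference yields a count of 4, not 3.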
- Since: 2.6.0
- Author: Pascal Essiembre
- See Also: Pattern
-
-
Nested Class Summary
- static class CountMatchesTagger.MatchDetails: Deprecated.
-
Constructor Summary
- CountMatchesTagger()
-
Method Summary
- void addMatchDetails(CountMatchesTagger.MatchDetails matchDetails): Deprecated. Since 3.0.0, use setToField(String), setFieldMatcher(TextMatcher), and setCountMatcher(TextMatcher).
- boolean equals(Object other)
- TextMatcher getCountMatcher(): Gets the count matcher.
- TextMatcher getFieldMatcher(): Gets the field matcher.
- List<CountMatchesTagger.MatchDetails> getMatchesDetails(): Deprecated.
- int getMaxReadSize(): Gets the maximum number of characters to read from content for tagging at once.
- PropertySetter getOnSet(): Gets the property setter to use when a value is set.
- String getToField(): Gets the target field.
- int hashCode()
- protected void loadCharStreamTaggerFromXML(XML xml): Loads configuration settings specific to the implementing class.
- void removeMatchDetails(CountMatchesTagger.MatchDetails matchDetails): Deprecated. Since 3.0.0, this method does nothing.
- protected void saveCharStreamTaggerToXML(XML xml): Saves configuration settings specific to the implementing class.
- void setCountMatcher(TextMatcher countMatcher): Sets the count matcher.
- void setFieldMatcher(TextMatcher fieldMatcher): Sets the field matcher.
- void setMaxReadSize(int maxReadSize): Sets the maximum number of characters to read from content for tagging at once.
- void setOnSet(PropertySetter onSet): Sets the property setter to use when a value is set.
- void setToField(String toField): Sets the target field.
- protected void tagTextDocument(HandlerDoc doc, Reader input, ParseState parseState)
- String toString()
-
Methods inherited from class com.norconex.importer.handler.tagger.AbstractCharStreamTagger
getSourceCharset, loadHandlerFromXML, saveHandlerToXML, setSourceCharset, tagApplicableDocument
-
Methods inherited from class com.norconex.importer.handler.tagger.AbstractDocumentTagger
tagDocument
-
Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
-
-
-
-
Method Detail
-
tagTextDocument
protected void tagTextDocument(HandlerDoc doc, Reader input, ParseState parseState) throws ImporterHandlerException
- Specified by: tagTextDocument in class AbstractCharStreamTagger
- Throws: ImporterHandlerException
-
getMaxReadSize
public int getMaxReadSize()
Gets the maximum number of characters to read from content for tagging at once. Default is TextReader.DEFAULT_MAX_READ_SIZE.
- Returns: maximum read size
-
setMaxReadSize
public void setMaxReadSize(int maxReadSize)
Sets the maximum number of characters to read from content for tagging at once.
- Parameters: maxReadSize - maximum read size
-
getFieldMatcher
public TextMatcher getFieldMatcher()
Gets the field matcher.
- Returns: field matcher
- Since: 3.0.0
-
setFieldMatcher
public void setFieldMatcher(TextMatcher fieldMatcher)
Sets the field matcher.
- Parameters: fieldMatcher - field matcher
- Since: 3.0.0
-
getCountMatcher
public TextMatcher getCountMatcher()
Gets the count matcher.
- Returns: count matcher
- Since: 3.0.0
-
setCountMatcher
public void setCountMatcher(TextMatcher countMatcher)
Sets the count matcher.
- Parameters: countMatcher - count matcher
- Since: 3.0.0
-
getToField
public String getToField()
Gets the target field.
- Returns: target field
- Since: 3.0.0
-
setToField
public void setToField(String toField)
Sets the target field.
- Parameters: toField - target field
- Since: 3.0.0
-
getOnSet
public PropertySetter getOnSet()
Gets the property setter to use when a value is set.
- Returns: property setter
- Since: 3.0.0
-
setOnSet
public void setOnSet(PropertySetter onSet)
Sets the property setter to use when a value is set.
- Parameters: onSet - property setter
- Since: 3.0.0
-
getMatchesDetails
@Deprecated public List<CountMatchesTagger.MatchDetails> getMatchesDetails()
Deprecated. Gets matches details.
- Returns: matches details
-
removeMatchDetails
@Deprecated public void removeMatchDetails(CountMatchesTagger.MatchDetails matchDetails)
Deprecated. Since 3.0.0, this method does nothing. Removes match details.
- Parameters: matchDetails - match details
-
addMatchDetails
@Deprecated public void addMatchDetails(CountMatchesTagger.MatchDetails matchDetails)
Deprecated. Since 3.0.0, use setToField(String), setFieldMatcher(TextMatcher), and setCountMatcher(TextMatcher). Adds match details.
- Parameters: matchDetails - the match details
-
loadCharStreamTaggerFromXML
protected void loadCharStreamTaggerFromXML(XML xml)
Description copied from class: AbstractCharStreamTagger
Loads configuration settings specific to the implementing class.
- Specified by: loadCharStreamTaggerFromXML in class AbstractCharStreamTagger
- Parameters: xml - XML configuration
-
saveCharStreamTaggerToXML
protected void saveCharStreamTaggerToXML(XML xml)
Description copied from class: AbstractCharStreamTagger
Saves configuration settings specific to the implementing class. The parent tag along with the "class" attribute are already written. Implementors must not close the writer.
- Specified by: saveCharStreamTaggerToXML in class AbstractCharStreamTagger
- Parameters: xml - the XML
-
equals
public boolean equals(Object other)
- Overrides: equals in class AbstractCharStreamTagger
-
hashCode
public int hashCode()
- Overrides: hashCode in class AbstractCharStreamTagger
-
toString
public String toString()
- Overrides: toString in class AbstractCharStreamTagger
-
-