Class CountMatchesTagger
- java.lang.Object
  - com.norconex.importer.handler.AbstractImporterHandler
    - com.norconex.importer.handler.tagger.AbstractDocumentTagger
      - com.norconex.importer.handler.tagger.AbstractCharStreamTagger
        - com.norconex.importer.handler.tagger.impl.CountMatchesTagger
- All Implemented Interfaces:
IXMLConfigurable, IImporterHandler, IDocumentTagger
public class CountMatchesTagger extends AbstractCharStreamTagger
Counts the number of matches of a given string (or string pattern) and stores the resulting value in the field specified by "toField".
If no "fieldMatcher" expression is specified, the document content will be used. If the "fieldMatcher" matches more than one field, the sum of all matches will be stored as a single value. In most cases you will want to set your "countMatcher" to "partial", so that occurrences inside a larger value are counted rather than requiring the whole value to match.
Storing values in an existing field
If a target field with the same name already exists for a document, the count value will be added to the end of the existing value list. This default behavior can be changed with setOnSet(PropertySetter).
Can be used as a pre-parse tagger on text documents only when matching strings on document content, or as either a pre-parse or post-parse handler when the "fieldMatcher" is used.
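The summing and appending behavior described above can be illustrated with a minimal sketch. It uses only the JDK: a plain Map stands in for document metadata, and the names CountIntoFieldSketch, count, and tag are hypothetical. It is not the CountMatchesTagger implementation, only an approximation of the default (append) behavior under those assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a Map of field name -> values stands in for
// document metadata. This is NOT the Norconex implementation.
final class CountIntoFieldSketch {

    // Counts non-overlapping partial matches of the pattern in the text.
    static int count(Pattern countPattern, String text) {
        Matcher m = countPattern.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Sums the matches found in every value of every field whose name
    // fully matches fieldPattern, then appends the sum (as one value)
    // to the end of the target field's value list.
    static void tag(Map<String, List<String>> fields,
            Pattern fieldPattern, Pattern countPattern, String toField) {
        int total = 0;
        for (Map.Entry<String, List<String>> e : fields.entrySet()) {
            if (fieldPattern.matcher(e.getKey()).matches()) {
                for (String value : e.getValue()) {
                    total += count(countPattern, value);
                }
            }
        }
        fields.computeIfAbsent(toField, k -> new ArrayList<>())
                .add(String.valueOf(total));
    }

    public static void main(String[] args) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("title",
                new ArrayList<>(Arrays.asList("apple pie, apple tart")));
        fields.put("description",
                new ArrayList<>(Arrays.asList("green apple")));
        fields.put("appleCount", new ArrayList<>(Arrays.asList("7")));

        tag(fields, Pattern.compile("title|description"),
                Pattern.compile("apple"), "appleCount");
        System.out.println(fields.get("appleCount"));
    }
}
```

In this sketch the two matched fields contribute a single summed value, which lands after the value already present in the target field. A different PropertySetter would change only that last step.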
XML configuration usage:
<handler class="com.norconex.importer.handler.tagger.impl.CountMatchesTagger"
    toField="(target field)"
    maxReadSize="(max characters to read at once)"
    sourceCharset="(character encoding)">

  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher>(field-matching expression)</fieldMatcher>
    <valueMatcher>(value-matching expression)</valueMatcher>
  </restrictTo>

  <fieldMatcher>
    (optional expression for fields used to count matches)
  </fieldMatcher>
  <countMatcher>(expression used to count matches)</countMatcher>
</handler>
XML usage example:
<handler class="CountMatchesTagger" toField="urlSegmentCount">
  <fieldMatcher>document.reference</fieldMatcher>
  <countMatcher method="regex">
    /[^/]+
  </countMatcher>
</handler>
The above will count the number of segments in a URL.
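To see what the regular expression in the example counts, here is a small sketch using only java.util.regex (the Pattern class referenced under "See Also"). The class name UrlSegmentCountSketch and the sample references are assumptions made for illustration. The sketch mimics a partial regex count and does not call the tagger itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: counts partial regex matches the way the
// "/[^/]+" example suggests, using only the JDK.
final class UrlSegmentCountSketch {

    // Counts non-overlapping matches of the regex found in the text.
    static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Sample references, assumed for illustration only.
        System.out.println(countMatches("/[^/]+", "/docs/api/index.html"));
        System.out.println(countMatches(
                "/[^/]+", "https://example.com/docs/api/index.html"));
    }
}
```

Note that with a full URL the host portion ("/example.com" after the scheme's first slash) also matches the expression, so it is counted as a segment along with the path segments.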
- Since:
- 2.6.0
- Author:
- Pascal Essiembre
- See Also:
Pattern
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | CountMatchesTagger.MatchDetails | Deprecated.
-
Constructor Summary
Constructors
Constructor | Description
CountMatchesTagger() |
-
Method Summary
All Methods, Instance Methods, Concrete Methods, Deprecated Methods
Modifier and Type | Method | Description
void | addMatchDetails(CountMatchesTagger.MatchDetails matchDetails) | Deprecated. Since 3.0.0, use setToField(String), setFieldMatcher(TextMatcher), and setCountMatcher(TextMatcher).
boolean | equals(Object other) |
TextMatcher | getCountMatcher() | Gets the count matcher.
TextMatcher | getFieldMatcher() | Gets the field matcher.
List<CountMatchesTagger.MatchDetails> | getMatchesDetails() | Deprecated.
int | getMaxReadSize() | Gets the maximum number of characters to read from content for tagging at once.
PropertySetter | getOnSet() | Gets the property setter to use when a value is set.
String | getToField() | Gets the target field.
int | hashCode() |
protected void | loadCharStreamTaggerFromXML(XML xml) | Loads configuration settings specific to the implementing class.
void | removeMatchDetails(CountMatchesTagger.MatchDetails matchDetails) | Deprecated. Since 3.0.0, this method does nothing.
protected void | saveCharStreamTaggerToXML(XML xml) | Saves configuration settings specific to the implementing class.
void | setCountMatcher(TextMatcher countMatcher) | Sets the count matcher.
void | setFieldMatcher(TextMatcher fieldMatcher) | Sets the field matcher.
void | setMaxReadSize(int maxReadSize) | Sets the maximum number of characters to read from content for tagging at once.
void | setOnSet(PropertySetter onSet) | Sets the property setter to use when a value is set.
void | setToField(String toField) | Sets the target field.
protected void | tagTextDocument(HandlerDoc doc, Reader input, ParseState parseState) |
String | toString() |
Methods inherited from class com.norconex.importer.handler.tagger.AbstractCharStreamTagger
getSourceCharset, loadHandlerFromXML, saveHandlerToXML, setSourceCharset, tagApplicableDocument
-
Methods inherited from class com.norconex.importer.handler.tagger.AbstractDocumentTagger
tagDocument
-
Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
-
Method Detail
-
tagTextDocument
protected void tagTextDocument(HandlerDoc doc, Reader input, ParseState parseState) throws ImporterHandlerException
- Specified by:
tagTextDocument in class AbstractCharStreamTagger
- Throws:
ImporterHandlerException
-
getMaxReadSize
public int getMaxReadSize()
Gets the maximum number of characters to read from content for tagging at once. Default is TextReader.DEFAULT_MAX_READ_SIZE.
- Returns:
- maximum read size
-
setMaxReadSize
public void setMaxReadSize(int maxReadSize)
Sets the maximum number of characters to read from content for tagging at once.
- Parameters:
maxReadSize - maximum read size
-
getFieldMatcher
public TextMatcher getFieldMatcher()
Gets the field matcher.
- Returns:
- field matcher
- Since:
- 3.0.0
-
setFieldMatcher
public void setFieldMatcher(TextMatcher fieldMatcher)
Sets the field matcher.
- Parameters:
fieldMatcher - field matcher
- Since:
- 3.0.0
-
getCountMatcher
public TextMatcher getCountMatcher()
Gets the count matcher.
- Returns:
- count matcher
- Since:
- 3.0.0
-
setCountMatcher
public void setCountMatcher(TextMatcher countMatcher)
Sets the count matcher.
- Parameters:
countMatcher - count matcher
- Since:
- 3.0.0
-
getToField
public String getToField()
Gets the target field.
- Returns:
- target field
- Since:
- 3.0.0
-
setToField
public void setToField(String toField)
Sets the target field.
- Parameters:
toField - target field
- Since:
- 3.0.0
-
getOnSet
public PropertySetter getOnSet()
Gets the property setter to use when a value is set.
- Returns:
- property setter
- Since:
- 3.0.0
-
setOnSet
public void setOnSet(PropertySetter onSet)
Sets the property setter to use when a value is set.
- Parameters:
onSet - property setter
- Since:
- 3.0.0
-
getMatchesDetails
@Deprecated public List<CountMatchesTagger.MatchDetails> getMatchesDetails()
Deprecated.
Gets matches details.
- Returns:
- matches details
-
removeMatchDetails
@Deprecated public void removeMatchDetails(CountMatchesTagger.MatchDetails matchDetails)
Deprecated. Since 3.0.0, this method does nothing.
Removes match details.
- Parameters:
matchDetails - match details
-
addMatchDetails
@Deprecated public void addMatchDetails(CountMatchesTagger.MatchDetails matchDetails)
Deprecated. Since 3.0.0, use setToField(String), setFieldMatcher(TextMatcher), and setCountMatcher(TextMatcher).
Adds match details.
- Parameters:
matchDetails - the match details
-
loadCharStreamTaggerFromXML
protected void loadCharStreamTaggerFromXML(XML xml)
Description copied from class: AbstractCharStreamTagger
Loads configuration settings specific to the implementing class.
- Specified by:
loadCharStreamTaggerFromXML in class AbstractCharStreamTagger
- Parameters:
xml - XML configuration
-
saveCharStreamTaggerToXML
protected void saveCharStreamTaggerToXML(XML xml)
Description copied from class: AbstractCharStreamTagger
Saves configuration settings specific to the implementing class. The parent tag along with the "class" attribute are already written. Implementors must not close the writer.
- Specified by:
saveCharStreamTaggerToXML in class AbstractCharStreamTagger
- Parameters:
xml - the XML
-
equals
public boolean equals(Object other)
- Overrides:
equals in class AbstractCharStreamTagger
-
hashCode
public int hashCode()
- Overrides:
hashCode in class AbstractCharStreamTagger
-
toString
public String toString()
- Overrides:
toString in class AbstractCharStreamTagger
-
-