public class DocumentLengthTagger extends AbstractDocumentTagger
Adds the document length (i.e., the number of bytes) to the specified field.
The length is the content length of the document at its current processing
stage. For instance, if you place this tagger after a transformer that
modifies the content, the length obtained is that of the modified content,
not the original. To obtain a document's length before any modification is
made to it, use this tagger as one of the first handlers among your
pre-parse handlers.
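The effect of handler placement on the reported length can be illustrated with a small, self-contained sketch. It does not use the Importer API: the countBytes helper and the tag-stripping step are hypothetical stand-ins for this tagger and for an earlier transformer, and the sketch assumes the length is simply the number of bytes readable from the content stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class LengthAtStageSketch {

    // Hypothetical helper: counts the bytes left in a stream by reading
    // it to the end. Stands in for what a length tagger would measure.
    static long countBytes(InputStream in) {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        try {
            while ((read = in.read(buffer)) != -1) {
                total += read;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        String original = "<p>Hello, world</p>";
        // Hypothetical earlier handler that strips markup from the content.
        String transformed = original.replaceAll("<[^>]+>", "");

        long before = countBytes(new ByteArrayInputStream(
                original.getBytes(StandardCharsets.UTF_8)));
        long after = countBytes(new ByteArrayInputStream(
                transformed.getBytes(StandardCharsets.UTF_8)));

        // Prints "before=19 after=12": the value depends on the stage
        // at which the content is measured.
        System.out.println("before=" + before + " after=" + after);
    }
}
```

Because the count is in bytes, not characters, a multi-byte character such as "é" encoded as UTF-8 contributes two to the total.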
If a target field with the same name already exists for a document,
values will be added to the end of the existing value list.
It is possible to change this default behavior by supplying a
PropertySetter.
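The four onSet options listed in the XML configuration (append, prepend, replace, optional) can be pictured with the following sketch. It is a model only: the OnSet enum and the apply method are hypothetical stand-ins, not the library's PropertySetter, and the sketch assumes that "optional" means the value is set only when the field has no value yet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OnSetSketch {

    // Hypothetical stand-in for the four onSet options.
    enum OnSet { APPEND, PREPEND, REPLACE, OPTIONAL }

    // Applies a value to a multi-valued field according to the strategy.
    static void apply(Map<String, List<String>> fields,
            String field, String value, OnSet onSet) {
        List<String> values =
                fields.computeIfAbsent(field, k -> new ArrayList<>());
        switch (onSet) {
            case APPEND:
                values.add(value);      // end of the existing list
                break;
            case PREPEND:
                values.add(0, value);   // start of the existing list
                break;
            case REPLACE:
                values.clear();         // discard existing values
                values.add(value);
                break;
            case OPTIONAL:
                // Assumed meaning: only set when no value exists yet.
                if (values.isEmpty()) {
                    values.add(value);
                }
                break;
        }
    }

    public static void main(String[] args) {
        for (OnSet onSet : OnSet.values()) {
            Map<String, List<String>> fields = new HashMap<>();
            fields.put("docSize", new ArrayList<>(Arrays.asList("100")));
            apply(fields, "docSize", "250", onSet);
            System.out.println(onSet + " -> " + fields.get("docSize"));
        }
    }
}
```

Starting from an existing "docSize" value of 100, the sketch yields [100, 250] for append, [250, 100] for prepend, [250] for replace, and [100] for optional.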
Can be used as either a pre-parse or a post-parse handler.
<handler
    class="com.norconex.importer.handler.tagger.impl.DocumentLengthTagger"
    toField="(mandatory target field)"
    onSet="[append|prepend|replace|optional]">
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher
        method="[basic|csv|wildcard|regex]"
        ignoreCase="[false|true]"
        ignoreDiacritic="[false|true]"
        partial="[false|true]">
      (field-matching expression)
    </fieldMatcher>
    <valueMatcher
        method="[basic|csv|wildcard|regex]"
        ignoreCase="[false|true]"
        ignoreDiacritic="[false|true]"
        partial="[false|true]">
      (value-matching expression)
    </valueMatcher>
  </restrictTo>
</handler>
The following stores the document length into a "docSize" field.
<handler
    class="DocumentLengthTagger"
    toField="docSize"/>
Constructor and Description |
---|
DocumentLengthTagger() |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object other) |
String | getField() Deprecated. Since 3.0.0, use getToField(). |
PropertySetter | getOnSet() Gets the property setter to use when a value is set. |
String | getToField() Gets the target field. |
int | hashCode() |
boolean | isOverwrite() Deprecated. Since 3.0.0, use getOnSet(). |
protected void | loadHandlerFromXML(XML xml) Loads configuration settings specific to the implementing class. |
protected void | saveHandlerToXML(XML xml) Saves configuration settings specific to the implementing class. |
void | setField(String toField) Deprecated. Since 3.0.0, use setToField(String). |
void | setOnSet(PropertySetter onSet) Sets the property setter to use when a value is set. |
void | setOverwrite(boolean overwrite) Deprecated. Since 3.0.0, use setOnSet(PropertySetter). |
void | setToField(String toField) Sets the target field. |
void | tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) |
String | toString() |
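Methods inherited from class AbstractDocumentTagger: tagDocument
Methods inherited from class AbstractImporterHandler: addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML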
public void tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) throws ImporterHandlerException
tagApplicableDocument in class AbstractDocumentTagger
Throws: ImporterHandlerException

public String getToField()
Gets the target field.

public void setToField(String toField)
Sets the target field.
Parameters: toField - target field

@Deprecated public String getField()
Deprecated. Since 3.0.0, use getToField().

@Deprecated public void setField(String toField)
Deprecated. Since 3.0.0, use setToField(String).
Parameters: toField - target field

@Deprecated public boolean isOverwrite()
Deprecated. Since 3.0.0, use getOnSet().
Returns: true if overwriting existing value.

@Deprecated public void setOverwrite(boolean overwrite)
Deprecated. Since 3.0.0, use setOnSet(PropertySetter).
Parameters: overwrite - true if overwriting existing value.

public PropertySetter getOnSet()
Gets the property setter to use when a value is set.

public void setOnSet(PropertySetter onSet)
Sets the property setter to use when a value is set.
Parameters: onSet - property setter

protected void loadHandlerFromXML(XML xml)
Loads configuration settings specific to the implementing class.
loadHandlerFromXML in class AbstractImporterHandler
Parameters: xml - XML configuration

protected void saveHandlerToXML(XML xml)
Saves configuration settings specific to the implementing class.
saveHandlerToXML in class AbstractImporterHandler
Parameters: xml - the XML

public boolean equals(Object other)
equals in class AbstractImporterHandler

public int hashCode()
hashCode in class AbstractImporterHandler

public String toString()
toString in class AbstractImporterHandler
Copyright © 2009–2023 Norconex Inc. All rights reserved.