Class DocumentLengthTagger

  • All Implemented Interfaces:
    IXMLConfigurable, IImporterHandler, IDocumentTagger

    public class DocumentLengthTagger
    extends AbstractDocumentTagger

    Adds the document length (i.e., number of bytes) to the specified field. The length is that of the document content at its current processing stage. If, for instance, you place this tagger after a transformer that modifies the content, the obtained length will be that of the modified content, not the original. To obtain a document's length before any modification is made to it, use this tagger as one of the first handlers in your pre-parse handlers.
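
    To illustrate why handler order matters, here is a minimal plain-Java sketch that counts the bytes of a content stream before and after a modification. It is not the tagger's actual implementation: the class name ByteLengthSketch, the byteLength helper, the sample content, and the UTF-8 encoding are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of the Norconex Importer API.
public class ByteLengthSketch {

    // Counts the bytes remaining in the stream by reading it to the end.
    static long byteLength(InputStream in) throws IOException {
        long total = 0;
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            total += read;
        }
        return total;
    }

    // Convenience wrapper: byte length of a string encoded as UTF-8 (assumed encoding).
    static long utf8ByteLength(String text) throws IOException {
        return byteLength(new ByteArrayInputStream(
                text.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws IOException {
        // Sample content containing one non-ASCII character (e with acute accent).
        String original = "<p>caf\u00e9</p>";
        // Stand-in for a transformer that strips markup from the content.
        String stripped = original.replaceAll("<[^>]+>", "");

        // Length measured before the modification.
        System.out.println(utf8ByteLength(original)); // prints 12
        // Length measured after the modification.
        System.out.println(utf8ByteLength(stripped)); // prints 5
        // Byte length is not the same as character count.
        System.out.println(stripped.length());        // prints 4
    }
}
```

    The same document yields a different length depending on where in the chain it is measured, which is why placing the tagger first among the pre-parse handlers captures the unmodified size.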

    Storing values in an existing field

    If a target field with the same name already exists for a document, values will be added to the end of the existing value list. It is possible to change this default behavior by supplying a PropertySetter.
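
    The difference between the default appending behavior and a replacing one can be sketched in plain Java as follows. This is a simplified model and does not use the real PropertySetter class: the FieldStoreSketch class, its Mode enum, and the store method are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; does not use the Norconex PropertySetter class.
public class FieldStoreSketch {

    // Simplified stand-in for the strategy used when the target field already exists.
    enum Mode { APPEND, REPLACE }

    // Stores a value in a multi-valued field according to the given mode.
    static void store(Map<String, List<String>> fields,
            String field, String value, Mode mode) {
        List<String> values =
                fields.computeIfAbsent(field, k -> new ArrayList<>());
        if (mode == Mode.REPLACE) {
            values.clear(); // discard existing values before adding the new one
        }
        values.add(value);  // new value always goes at the end of the list
    }

    public static void main(String[] args) {
        Map<String, List<String>> fields = new HashMap<>();

        // Default-like behavior: values accumulate at the end of the list.
        store(fields, "docSize", "1024", Mode.APPEND);
        store(fields, "docSize", "512", Mode.APPEND);
        System.out.println(fields.get("docSize")); // prints [1024, 512]

        // Alternative behavior: the existing values are overwritten.
        store(fields, "docSize", "256", Mode.REPLACE);
        System.out.println(fields.get("docSize")); // prints [256]
    }
}
```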

    Can be used as either a pre-parse or post-parse handler.

    XML configuration usage:

    
    <handler
        class="com.norconex.importer.handler.tagger.impl.DocumentLengthTagger"
        toField="(mandatory target field)">
      <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
      <restrictTo>
        <fieldMatcher>(field-matching expression)</fieldMatcher>
        <valueMatcher>(value-matching expression)</valueMatcher>
      </restrictTo>
    </handler>

    XML usage example:

    
    <handler
        class="DocumentLengthTagger"
        toField="docSize"/>

    The above example stores the document length in a "docSize" field.

    Since:
    2.2.0
    Author:
    Pascal Essiembre