Class FieldReportTagger

  • All Implemented Interfaces:
    IXMLConfigurable, IImporterHandler, IDocumentTagger

    public class FieldReportTagger
    extends AbstractDocumentTagger

    A utility tagger that reports, in a CSV file, the fields discovered in a crawl session, captured at the point of your choice in the importing process. To report on all discovered fields, use this class as a post-parse handler, placed before any handler that limits which fields are kept.

    The report will list one field per row, along with a few sample values (3 by default). The samples will be the first ones encountered.
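
    The "first ones encountered" behaviour described above can be pictured with the minimal sketch below. It is not the FieldReportTagger source: the class FieldSampleSketch and its methods record, samplesFor and toRows are hypothetical names used for illustration only. It also assumes repeated values are kept as they arrive and skips CSV escaping, neither of which is confirmed by this documentation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not the actual FieldReportTagger code. */
final class FieldSampleSketch {

    private final int maxSamples;

    // LinkedHashMap keeps fields in the order they were first seen.
    private final Map<String, List<String>> samples = new LinkedHashMap<>();

    FieldSampleSketch(int maxSamples) {
        this.maxSamples = maxSamples;
    }

    /** Records a value, keeping only the first maxSamples values per field. */
    void record(String field, String value) {
        List<String> kept = samples.computeIfAbsent(field, k -> new ArrayList<>());
        if (kept.size() < maxSamples) {
            kept.add(value);
        }
    }

    /** Returns the samples kept for a field, or an empty list if none. */
    List<String> samplesFor(String field) {
        return samples.getOrDefault(field, List.of());
    }

    /** Builds one row per field: the field name followed by its samples. */
    List<String> toRows() {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : samples.entrySet()) {
            // Naive join: a real CSV writer would quote and escape values.
            rows.add(e.getKey() + "," + String.join(",", e.getValue()));
        }
        return rows;
    }

    public static void main(String[] args) {
        FieldSampleSketch sketch = new FieldSampleSketch(3); // 3 mirrors the documented default
        sketch.record("title", "A");
        sketch.record("title", "B");
        sketch.record("author", "X");
        sketch.record("title", "C");
        sketch.record("title", "D"); // ignored: three samples already kept
        sketch.toRows().forEach(System.out::println);
    }
}
```

    In this sketch, a fourth value for the same field is simply dropped, so the report stays bounded no matter how many documents are crawled.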

    This handler does not alter the data being imported (it only reads it). It also does not store the "content" as a field.

    When no file is specified with setFile(Path), a file called "field-report.csv" will be created in the working directory.

    Can be used as either a pre-parse or a post-parse handler.

    XML configuration usage:

    
    <handler
        class="com.norconex.importer.handler.tagger.impl.FieldReportTagger"
        maxSamples="(max number of sample values)"
        withHeaders="[false|true]"
        withOccurences="[false|true]"
        truncateSamplesAt="(number of characters to truncate long samples)"
        file="(path to a local file)">
      <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
      <restrictTo>
        <fieldMatcher>(field-matching expression)</fieldMatcher>
        <valueMatcher>(value-matching expression)</valueMatcher>
      </restrictTo>
    </handler>

    XML usage example:

    
    <handler
        class="FieldReportTagger"
        maxSamples="1"
        file="C:\reports\field-report.csv"/>

    The above example writes all discovered fields to a "field-report.csv" file, with only one sample value per field.

    Since:
    2.10.0
    Author:
    Pascal Essiembre