public class FieldReportTagger extends AbstractDocumentTagger
A utility tagger that writes to a CSV file the fields discovered in a crawl session, captured at the point of your choice in the importing process. To report on all discovered fields, use it as a post-parse handler, before any handler that limits which fields are kept.
The report will list one field per row, along with a few sample values (3 by default). The samples will be the first ones encountered.
This handler does not modify the data being imported (it only reads it). It also does not store the "content" as a field.
Can be used as either a pre-parse or post-parse handler.
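The report format described above (one row per field, the first few sample values encountered, optional truncation and occurrence counts) can be illustrated with a small standalone sketch. This is not the Norconex implementation: the class `FieldReportSketch`, its methods, the column order, the meaning of "occurrences" (number of documents containing the field) and the treatment of a non-positive truncation length as "no truncation" are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a field report; not part of the Norconex API.
final class FieldReportSketch {
    private final int maxSamples;   // how many sample values to keep per field
    private final int truncateAt;   // assumption: <= 0 means "do not truncate"
    private final Map<String, List<String>> samples = new LinkedHashMap<>();
    private final Map<String, Integer> occurrences = new LinkedHashMap<>();

    FieldReportSketch(int maxSamples, int truncateAt) {
        this.maxSamples = maxSamples;
        this.truncateAt = truncateAt;
    }

    // Reads one document's metadata without modifying it.
    void record(Map<String, List<String>> metadata) {
        for (Map.Entry<String, List<String>> entry : metadata.entrySet()) {
            // Assumption: count one occurrence per document containing the field.
            occurrences.merge(entry.getKey(), 1, Integer::sum);
            List<String> kept =
                    samples.computeIfAbsent(entry.getKey(), k -> new ArrayList<>());
            for (String value : entry.getValue()) {
                if (kept.size() >= maxSamples) {
                    break; // only the first values encountered are kept
                }
                kept.add(truncate(value));
            }
        }
    }

    // Builds one CSV row per field: name, occurrence count, then samples.
    List<String> toCsvRows() {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : samples.entrySet()) {
            StringBuilder row = new StringBuilder(quote(entry.getKey()));
            row.append(',').append(occurrences.get(entry.getKey()));
            for (String sample : entry.getValue()) {
                row.append(',').append(quote(sample));
            }
            rows.add(row.toString());
        }
        return rows;
    }

    private String truncate(String value) {
        if (truncateAt > 0 && value.length() > truncateAt) {
            return value.substring(0, truncateAt);
        }
        return value;
    }

    // Minimal CSV quoting: wrap in quotes and double any embedded quote.
    private static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }
}
```

Calling `record` once per document and then `toCsvRows` at the end of the session yields the rows to write to the report file. The real tagger may differ in column layout, quoting and how it counts occurrences.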
<tagger class="com.norconex.importer.handler.tagger.impl.FieldReportTagger"
    maxSamples="(max number of sample values)"
    withHeaders="[false|true]"
    withOccurences="[false|true]"
    truncateSamplesAt="(number of characters to truncate long samples)"
    file="(path to a local file)" >

  <restrictTo caseSensitive="[false|true]"
      field="(name of header/metadata field name to match)">
    (regular expression of value to match)
  </restrictTo>
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
</tagger>
The following logs all discovered fields into a "field-report.csv" file, along with only one sample value:
<tagger class="com.norconex.importer.handler.tagger.impl.FieldReportTagger" maxSamples="1" file="C:\reports\field-report.csv" />
Modifier and Type | Field and Description |
---|---|
static int | DEFAULT_MAX_SAMPLES |
Constructor and Description |
---|
FieldReportTagger() |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object other) |
File | getFile() |
int | getMaxSamples() |
int | getTruncateSamplesAt() |
int | hashCode() |
boolean | isWithHeaders() |
boolean | isWithOccurences() |
protected void | loadHandlerFromXML(org.apache.commons.configuration.XMLConfiguration xml) Loads configuration settings specific to the implementing class. |
protected void | saveHandlerToXML(EnhancedXMLStreamWriter writer) Saves configuration settings specific to the implementing class. |
void | setFile(File file) |
void | setMaxSamples(int maxSamples) |
void | setTruncateSamplesAt(int truncateSamplesAt) |
void | setWithHeaders(boolean withHeaders) |
void | setWithOccurences(boolean withOccurences) |
void | tagApplicableDocument(String reference, InputStream document, ImporterMetadata metadata, boolean parsed) |
String | toString() |
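Methods inherited from class AbstractDocumentTagger: tagDocument
Methods inherited from class AbstractImporterHandler: addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML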
public static final int DEFAULT_MAX_SAMPLES
public File getFile()
public void setFile(File file)
public int getMaxSamples()
public void setMaxSamples(int maxSamples)
public boolean isWithHeaders()
public void setWithHeaders(boolean withHeaders)
public boolean isWithOccurences()
public void setWithOccurences(boolean withOccurences)
public int getTruncateSamplesAt()
public void setTruncateSamplesAt(int truncateSamplesAt)
public void tagApplicableDocument(String reference, InputStream document, ImporterMetadata metadata, boolean parsed) throws ImporterHandlerException
Specified by: tagApplicableDocument in class AbstractDocumentTagger
Throws: ImporterHandlerException
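protected void loadHandlerFromXML(org.apache.commons.configuration.XMLConfiguration xml) throws IOException
Description copied from class: AbstractImporterHandler
Loads configuration settings specific to the implementing class.
Specified by: loadHandlerFromXML in class AbstractImporterHandler
Parameters: xml - xml configuration
Throws: IOException - could not load from XML
protected void saveHandlerToXML(EnhancedXMLStreamWriter writer) throws XMLStreamException
Description copied from class: AbstractImporterHandler
Saves configuration settings specific to the implementing class.
Specified by: saveHandlerToXML in class AbstractImporterHandler
Parameters: writer - the xml writer
Throws: XMLStreamException - could not save to XML
public String toString()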
Overrides: toString in class AbstractImporterHandler
public boolean equals(Object other)
Overrides: equals in class AbstractImporterHandler
public int hashCode()
Overrides: hashCode in class AbstractImporterHandler
Copyright © 2009–2021 Norconex Inc. All rights reserved.