Class FieldReportTagger
- java.lang.Object
  - com.norconex.importer.handler.AbstractImporterHandler
    - com.norconex.importer.handler.tagger.AbstractDocumentTagger
      - com.norconex.importer.handler.tagger.impl.FieldReportTagger
- All Implemented Interfaces:
- IXMLConfigurable, IImporterHandler, IDocumentTagger
public class FieldReportTagger extends AbstractDocumentTagger
A utility tagger that reports, in a CSV file, the fields discovered in a crawl session, captured at the point of your choice in the importing process. To report on all discovered fields, use it as a post-parse handler, placed before any handler that limits which fields are kept.
The report will list one field per row, along with a few sample values (3 by default). The samples will be the first ones encountered.
This handler does not modify the data being imported (it only reads it). It also does not store the "content" as a field.
When no file is specified with setFile(Path), a file called "field-report.csv" is created in the working directory.
Can be used as either a pre-parse or post-parse handler.
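The sampling behaviour described above (one entry per field, the first few values kept, long values truncated) can be illustrated with a minimal, self-contained sketch. This is not the actual FieldReportTagger implementation: the class name FieldSampleCollector and its methods are hypothetical, and it assumes that duplicate values are kept as encountered, that a truncation length of 0 or less means "do not truncate", and that occurrences are counted once per recorded value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in, not part of the Norconex Importer API. */
class FieldSampleCollector {
    private final int maxSamples;
    private final int truncateAt; // assumption: <= 0 disables truncation
    // LinkedHashMap keeps fields in the order they were first seen.
    private final Map<String, List<String>> samples = new LinkedHashMap<>();
    private final Map<String, Integer> occurrences = new LinkedHashMap<>();

    FieldSampleCollector(int maxSamples, int truncateAt) {
        this.maxSamples = maxSamples;
        this.truncateAt = truncateAt;
    }

    /** Records one field value; only the first maxSamples values are kept. */
    void record(String field, String value) {
        // Assumption: every recorded value counts as one occurrence.
        occurrences.merge(field, 1, Integer::sum);
        List<String> kept = samples.computeIfAbsent(field, k -> new ArrayList<>());
        if (kept.size() < maxSamples) {
            kept.add(truncate(value));
        }
    }

    /** Cuts a value down to truncateAt characters when truncation is enabled. */
    private String truncate(String value) {
        if (truncateAt > 0 && value.length() > truncateAt) {
            return value.substring(0, truncateAt);
        }
        return value;
    }

    /** Returns the samples kept for a field, or an empty list if never seen. */
    List<String> samples(String field) {
        return Collections.unmodifiableList(
                samples.getOrDefault(field, Collections.emptyList()));
    }

    /** Returns how many values were recorded for a field. */
    int occurrences(String field) {
        return occurrences.getOrDefault(field, 0);
    }
}
```

With maxSamples set to 2, a third value recorded for the same field still increases the occurrence count but is not added to the samples, which mirrors the "first ones encountered" rule stated above.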
XML configuration usage:
<handler class="com.norconex.importer.handler.tagger.impl.FieldReportTagger"
    maxSamples="(max number of sample values)"
    withHeaders="[false|true]"
    withOccurences="[false|true]"
    truncateSamplesAt="(number of characters to truncate long samples)"
    file="(path to a local file)">
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher>(field-matching expression)</fieldMatcher>
    <valueMatcher>(value-matching expression)</valueMatcher>
  </restrictTo>
</handler>
XML usage example:
<handler class="FieldReportTagger" maxSamples="1" file="C:\reports\field-report.csv"/>
The above example writes all discovered fields to a "field-report.csv" file, with only 1 sample value per field.
- Since:
- 2.10.0
- Author:
- Pascal Essiembre
-
Field Summary
Fields
Modifier and Type   Field                  Description
static Path         DEFAULT_FILE
static int          DEFAULT_MAX_SAMPLES
-
Constructor Summary
Constructors
Constructor            Description
FieldReportTagger()
-
Method Summary
Modifier and Type   Method                                      Description
boolean             equals(Object other)
Path                getFile()
int                 getMaxSamples()
int                 getTruncateSamplesAt()
int                 hashCode()
boolean             isWithHeaders()
boolean             isWithOccurences()
protected void      loadHandlerFromXML(XML xml)                 Loads configuration settings specific to the implementing class.
protected void      saveHandlerToXML(XML xml)                   Saves configuration settings specific to the implementing class.
void                setFile(Path file)
void                setMaxSamples(int maxSamples)
void                setTruncateSamplesAt(int truncateSamplesAt)
void                setWithHeaders(boolean withHeaders)
void                setWithOccurences(boolean withOccurences)
void                tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState)
String              toString()
-
Methods inherited from class com.norconex.importer.handler.tagger.AbstractDocumentTagger
tagDocument
-
Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
-
Field Detail
-
DEFAULT_MAX_SAMPLES
public static final int DEFAULT_MAX_SAMPLES
- See Also:
- Constant Field Values
-
DEFAULT_FILE
public static final Path DEFAULT_FILE
-
Method Detail
-
getFile
public Path getFile()
-
setFile
public void setFile(Path file)
-
getMaxSamples
public int getMaxSamples()
-
setMaxSamples
public void setMaxSamples(int maxSamples)
-
isWithHeaders
public boolean isWithHeaders()
-
setWithHeaders
public void setWithHeaders(boolean withHeaders)
-
isWithOccurences
public boolean isWithOccurences()
-
setWithOccurences
public void setWithOccurences(boolean withOccurences)
-
getTruncateSamplesAt
public int getTruncateSamplesAt()
-
setTruncateSamplesAt
public void setTruncateSamplesAt(int truncateSamplesAt)
-
tagApplicableDocument
public void tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) throws ImporterHandlerException
- Specified by:
- tagApplicableDocument in class AbstractDocumentTagger
- Throws:
- ImporterHandlerException
-
loadHandlerFromXML
protected void loadHandlerFromXML(XML xml)
Description copied from class: AbstractImporterHandler
Loads configuration settings specific to the implementing class.
- Specified by:
- loadHandlerFromXML in class AbstractImporterHandler
- Parameters:
- xml - XML configuration
-
saveHandlerToXML
protected void saveHandlerToXML(XML xml)
Description copied from class: AbstractImporterHandler
Saves configuration settings specific to the implementing class.
- Specified by:
- saveHandlerToXML in class AbstractImporterHandler
- Parameters:
- xml - the XML
-
equals
public boolean equals(Object other)
- Overrides:
- equals in class AbstractImporterHandler
-
hashCode
public int hashCode()
- Overrides:
- hashCode in class AbstractImporterHandler
-
toString
public String toString()
- Overrides:
- toString in class AbstractImporterHandler
-