public class FieldReportTagger extends AbstractDocumentTagger
A utility tagger that writes to a CSV file the fields discovered in a crawl session, captured at the point of your choice in the importing process. To report on all discovered fields, use this class as a post-parse handler, before any handler that limits which fields are kept.
The report will list one field per row, along with a few sample values (3 by default). The samples will be the first ones encountered.
This handler does not alter the data being imported (it only reads it). It also does not store the "content" as a field.
When no file is specified with setFile(Path), a file called "field-report.csv" is created in the working directory.
Can be used as either a pre-parse or post-parse handler.
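The sampling behavior described above (one row per field, keeping only the first few values encountered, optionally truncated) can be illustrated with a minimal, self-contained sketch. It does not use the Norconex API: the class FieldReportSketch, its methods, and its constructor parameters are hypothetical names chosen for illustration. Row ordering, CSV quoting, and treating a non-positive truncation length as "no truncation" are assumptions, not documented behavior of FieldReportTagger.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the Norconex Importer API.
public class FieldReportSketch {
    private final int maxSamples;
    // Assumption: a value <= 0 disables truncation.
    private final int truncateSamplesAt;
    // Assumption: fields are reported in the order they were first seen.
    private final Map<String, List<String>> samples = new LinkedHashMap<>();

    public FieldReportSketch(int maxSamples, int truncateSamplesAt) {
        this.maxSamples = maxSamples;
        this.truncateSamplesAt = truncateSamplesAt;
    }

    // Records a field value, keeping only the first maxSamples values per field.
    public void record(String field, String value) {
        List<String> values = samples.computeIfAbsent(field, k -> new ArrayList<>());
        if (values.size() < maxSamples) {
            values.add(truncate(value));
        }
    }

    private String truncate(String value) {
        if (truncateSamplesAt > 0 && value.length() > truncateSamplesAt) {
            return value.substring(0, truncateSamplesAt);
        }
        return value;
    }

    // Wraps a cell in double quotes and doubles any embedded quotes.
    private static String quote(String cell) {
        return "\"" + cell.replace("\"", "\"\"") + "\"";
    }

    // Renders one CSV row per field: the field name followed by its samples.
    public String toCsv() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, List<String>> entry : samples.entrySet()) {
            sb.append(quote(entry.getKey()));
            for (String sample : entry.getValue()) {
                sb.append(',').append(quote(sample));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        FieldReportSketch report = new FieldReportSketch(3, 10);
        report.record("title", "First page");
        report.record("title", "Second page with a long title");
        report.record("author", "Jane");
        System.out.print(report.toCsv());
    }
}
```

The sketch keeps the report separate from the data it reads, mirroring the read-only nature of the handler: values are copied into the report and the source is never modified.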
<handler
class="com.norconex.importer.handler.tagger.impl.FieldReportTagger"
maxSamples="(max number of sample values)"
withHeaders="[false|true]"
withOccurences="[false|true]"
truncateSamplesAt="(number of characters to truncate long samples)"
file="(path to a local file)">
<!-- multiple "restrictTo" tags allowed (only one needs to match) -->
<restrictTo>
<fieldMatcher
method="[basic|csv|wildcard|regex]"
ignoreCase="[false|true]"
ignoreDiacritic="[false|true]"
partial="[false|true]">
(field-matching expression)
</fieldMatcher>
<valueMatcher
method="[basic|csv|wildcard|regex]"
ignoreCase="[false|true]"
ignoreDiacritic="[false|true]"
partial="[false|true]">
(value-matching expression)
</valueMatcher>
</restrictTo>
</handler>
<handler
class="FieldReportTagger"
maxSamples="1"
file="C:\reports\field-report.csv"/>
The above example writes all discovered fields to a "field-report.csv" file, with only one sample value per field.
Modifier and Type | Field and Description |
---|---|
static Path | DEFAULT_FILE |
static int | DEFAULT_MAX_SAMPLES |
Constructor and Description |
---|
FieldReportTagger() |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object other) |
Path | getFile() |
int | getMaxSamples() |
int | getTruncateSamplesAt() |
int | hashCode() |
boolean | isWithHeaders() |
boolean | isWithOccurences() |
protected void | loadHandlerFromXML(XML xml) Loads configuration settings specific to the implementing class. |
protected void | saveHandlerToXML(XML xml) Saves configuration settings specific to the implementing class. |
void | setFile(Path file) |
void | setMaxSamples(int maxSamples) |
void | setTruncateSamplesAt(int truncateSamplesAt) |
void | setWithHeaders(boolean withHeaders) |
void | setWithOccurences(boolean withOccurences) |
void | tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) |
String | toString() |
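Methods inherited from class AbstractDocumentTagger: tagDocument
Methods inherited from class AbstractImporterHandler: addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML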
public static final int DEFAULT_MAX_SAMPLES
public static final Path DEFAULT_FILE
public Path getFile()
public void setFile(Path file)
public int getMaxSamples()
public void setMaxSamples(int maxSamples)
public boolean isWithHeaders()
public void setWithHeaders(boolean withHeaders)
public boolean isWithOccurences()
public void setWithOccurences(boolean withOccurences)
public int getTruncateSamplesAt()
public void setTruncateSamplesAt(int truncateSamplesAt)
public void tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) throws ImporterHandlerException
Specified by: tagApplicableDocument in class AbstractDocumentTagger
Throws: ImporterHandlerException
protected void loadHandlerFromXML(XML xml)
Description copied from class: AbstractImporterHandler
Loads configuration settings specific to the implementing class.
Specified by: loadHandlerFromXML in class AbstractImporterHandler
Parameters: xml - XML configuration
protected void saveHandlerToXML(XML xml)
Description copied from class: AbstractImporterHandler
Saves configuration settings specific to the implementing class.
Specified by: saveHandlerToXML in class AbstractImporterHandler
Parameters: xml - the XML
public boolean equals(Object other)
Overrides: equals in class AbstractImporterHandler
public int hashCode()
Overrides: hashCode in class AbstractImporterHandler
public String toString()
Overrides: toString in class AbstractImporterHandler
Copyright © 2009–2023 Norconex Inc.. All rights reserved.