public class CsvSplitter extends AbstractDocumentSplitter implements IXMLConfigurable
Splits files with comma-separated values (or values separated by any other character, such as a tab) into one document per line.
Can be used as either a pre-parse handler (text documents) or a post-parse handler.
<splitter class="com.norconex.importer.handler.splitter.impl.CsvSplitter"
    separatorCharacter=""
    quoteCharacter=""
    escapeCharacter=""
    useFirstRowAsFields="(false|true)"
    linesToSkip="(integer)"
    referenceColumn="(column name or position from 1)"
    contentColumns="(csv list of column/position to use as content)" >
  <restrictTo caseSensitive="[false|true]"
      field="(name of header/metadata field name to match)">
    (regular expression of value to match)
  </restrictTo>
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
</splitter>
Given this sample CSV file content...
'clientId','clientName','clientOrg','orgDesc'
'123','Joe Dalton','ACME Inc.','Organization\'s description'
'345','Avrel Dalton','Daisy Town','Another one'
... this example will split the file into two documents (one for each row after the header row):
<splitter class="com.norconex.importer.handler.splitter.impl.CsvSplitter"
    separatorCharacter=","
    quoteCharacter="'"
    escapeCharacter="\"
    useFirstRowAsFields="true"
    linesToSkip="0"
    referenceColumn="clientId"
    contentColumns="orgDesc" />
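To make the effect of the sample configuration more concrete, here is a minimal, self-contained sketch in plain Java. It does not use or reproduce the CsvSplitter implementation: the class name CsvSplitSketch and its methods are hypothetical, and it assumes that the escape character makes the following character literal and that the first row supplies the field names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not the Norconex implementation. */
public class CsvSplitSketch {

    /** Splits one CSV line into values, honoring the quote and escape characters. */
    public static List<String> parseLine(String line, char sep, char quote, char esc) {
        List<String> values = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == esc && i + 1 < line.length()) {
                // Assumption: the character after an escape is taken literally.
                current.append(line.charAt(++i));
            } else if (c == quote) {
                inQuotes = !inQuotes;
            } else if (c == sep && !inQuotes) {
                values.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        values.add(current.toString());
        return values;
    }

    /** Treats the first line as field names and maps every other line to name/value pairs. */
    public static List<Map<String, String>> split(
            List<String> lines, char sep, char quote, char esc) {
        List<String> header = parseLine(lines.get(0), sep, quote, esc);
        List<Map<String, String>> docs = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            List<String> values = parseLine(line, sep, quote, esc);
            Map<String, String> doc = new LinkedHashMap<>();
            for (int i = 0; i < header.size() && i < values.size(); i++) {
                doc.put(header.get(i), values.get(i));
            }
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "'clientId','clientName','clientOrg','orgDesc'",
            "'123','Joe Dalton','ACME Inc.','Organization\\'s description'",
            "'345','Avrel Dalton','Daisy Town','Another one'");
        for (Map<String, String> doc : split(lines, ',', '\'', '\\')) {
            // clientId plays the role of referenceColumn, orgDesc of contentColumns.
            System.out.println(
                "reference=" + doc.get("clientId") + " content=" + doc.get("orgDesc"));
        }
    }
}
```

Run as written, the sketch prints one line per data row, pairing the clientId value with the orgDesc value, which mirrors the two documents described above. How the real handler builds the final document reference and metadata is not shown here.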
Modifier and Type | Field and Description |
---|---|
static char | DEFAULT_ESCAPE_CHARACTER |
static char | DEFAULT_QUOTE_CHARACTER |
static char | DEFAULT_SEPARATOR_CHARACTER |
Constructor and Description |
---|
CsvSplitter() |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object other) |
String[] | getContentColumns() |
char | getEscapeCharacter(): Gets the escape character. |
int | getLinesToSkip(): Gets how many lines to skip before starting to parse lines. |
char | getQuoteCharacter(): Gets the value's surrounding quote character. |
String | getReferenceColumn() |
char | getSeparatorCharacter(): Gets the value-separator character. |
int | hashCode() |
boolean | isUseFirstRowAsFields(): Whether to use the first row as field names for values. |
protected void | loadHandlerFromXML(org.apache.commons.configuration.XMLConfiguration xml): Loads configuration settings specific to the implementing class. |
protected void | saveHandlerToXML(EnhancedXMLStreamWriter writer): Saves configuration settings specific to the implementing class. |
void | setContentColumns(String... contentColumns) |
void | setEscapeCharacter(char escapeCharacter): Sets the escape character. |
void | setLinesToSkip(int linesToSkip): Sets how many lines to skip before starting to parse lines. |
void | setQuoteCharacter(char quoteCharacter): Sets the value's surrounding quote character. |
void | setReferenceColumn(String referenceColumn) |
void | setSeparatorCharacter(char separatorCharacter): Sets the value-separator character. |
void | setUseFirstRowAsFields(boolean useFirstRowAsFields): Sets whether to use the first row as field names for values. |
protected List<ImporterDocument> | splitApplicableDocument(SplittableDocument doc, OutputStream output, CachedStreamFactory streamFactory, boolean parsed) |
String | toString() |
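Methods inherited from class AbstractDocumentSplitter: splitDocument
Methods inherited from class AbstractImporterHandler: addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
Methods inherited from class Object: clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface IXMLConfigurable: loadFromXML, saveToXML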
public static final char DEFAULT_SEPARATOR_CHARACTER
public static final char DEFAULT_QUOTE_CHARACTER
public static final char DEFAULT_ESCAPE_CHARACTER
protected List<ImporterDocument> splitApplicableDocument(SplittableDocument doc, OutputStream output, CachedStreamFactory streamFactory, boolean parsed) throws ImporterHandlerException
Specified by: splitApplicableDocument in class AbstractDocumentSplitter
Throws: ImporterHandlerException
public char getSeparatorCharacter()
public void setSeparatorCharacter(char separatorCharacter)
Parameters: separatorCharacter - value-separator character
public char getQuoteCharacter()
public void setQuoteCharacter(char quoteCharacter)
Parameters: quoteCharacter - value's surrounding quote character
public char getEscapeCharacter()
public void setEscapeCharacter(char escapeCharacter)
Parameters: escapeCharacter - escape character
public boolean isUseFirstRowAsFields()
Returns: true if using first row as field names.
public void setUseFirstRowAsFields(boolean useFirstRowAsFields)
Default is false.
Parameters: useFirstRowAsFields - true if using first row as field names
public int getLinesToSkip()
public void setLinesToSkip(int linesToSkip)
Parameters: linesToSkip - how many lines to skip
public String getReferenceColumn()
public void setReferenceColumn(String referenceColumn)
public String[] getContentColumns()
public void setContentColumns(String... contentColumns)
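
The XML usage describes referenceColumn as a "column name or position from 1", and contentColumns as a list of columns or positions. As a rough illustration of how such a value could be resolved to a column index, here is a self-contained sketch. ColumnSpecSketch and resolveColumn are hypothetical names, and the rule that a matching field name wins over a numeric position is an assumption, not documented behavior of CsvSplitter.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not the Norconex implementation. */
public class ColumnSpecSketch {

    /**
     * Resolves a column spec to a zero-based index.
     * The spec is either a field name or a position counted from 1.
     * Returns -1 when the spec matches neither.
     */
    public static int resolveColumn(String spec, List<String> fieldNames) {
        String trimmed = spec.trim();
        // Assumption: a matching field name takes precedence over a position.
        int byName = fieldNames.indexOf(trimmed);
        if (byName >= 0) {
            return byName;
        }
        try {
            int position = Integer.parseInt(trimmed);
            // Positions start at 1, so convert to a zero-based index.
            return position >= 1 ? position - 1 : -1;
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        List<String> fields =
            Arrays.asList("clientId", "clientName", "clientOrg", "orgDesc");
        System.out.println(resolveColumn("clientId", fields)); // by name
        System.out.println(resolveColumn("4", fields));        // by position
        System.out.println(resolveColumn("missing", fields));  // no match
    }
}
```

The sketch does not check a numeric position against the actual row length; a caller would still need to guard against positions beyond the last column.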
protected void loadHandlerFromXML(org.apache.commons.configuration.XMLConfiguration xml)
Description copied from class: AbstractImporterHandler
Loads configuration settings specific to the implementing class.
Specified by: loadHandlerFromXML in class AbstractImporterHandler
Parameters: xml - xml configuration
protected void saveHandlerToXML(EnhancedXMLStreamWriter writer) throws XMLStreamException
Description copied from class: AbstractImporterHandler
Saves configuration settings specific to the implementing class.
Specified by: saveHandlerToXML in class AbstractImporterHandler
Parameters: writer - the xml writer
Throws: XMLStreamException - could not save to XML
public boolean equals(Object other)
Overrides: equals in class AbstractImporterHandler
public int hashCode()
Overrides: hashCode in class AbstractImporterHandler
public String toString()
Overrides: toString in class AbstractImporterHandler
Copyright © 2009–2021 Norconex Inc. All rights reserved.