CsvSplitter (Norconex Importer 2.11.0 API)

java.lang.Object
- com.norconex.importer.handler.AbstractImporterHandler
  - com.norconex.importer.handler.splitter.AbstractDocumentSplitter
    - com.norconex.importer.handler.splitter.impl.CsvSplitter

All Implemented Interfaces: IXMLConfigurable, IImporterHandler, IDocumentSplitter

public class CsvSplitter
extends AbstractDocumentSplitter
implements IXMLConfigurable

Splits files with comma-separated values (or values separated by any other character, such as a tab) into one document per line.

Can be used either as a pre-parse handler (on text documents) or as a post-parse handler.

XML configuration usage:

  <splitter class="com.norconex.importer.handler.splitter.impl.CsvSplitter"
          separatorCharacter=""
          quoteCharacter=""
          escapeCharacter=""
          useFirstRowAsFields="(false|true)"
          linesToSkip="(integer)"
          referenceColumn="(column name or position from 1)"
          contentColumns="(csv list of column/position to use as content)" >

      <restrictTo caseSensitive="[false|true]"
              field="(name of header/metadata field name to match)">
          (regular expression of value to match)
      </restrictTo>
      <!-- multiple "restrictTo" tags allowed (only one needs to match) -->   

  </splitter>
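
The comment in the configuration above says that several "restrictTo" tags may be given and that only one needs to match. The following is a minimal sketch of that "any one matches" rule using only the JDK. It is not the Norconex implementation: the `Restriction` class, the field names, the whole-value matching, and the "no restriction means applicable" behavior are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative only: NOT the Norconex implementation.
final class RestrictToSketch {

    // Hypothetical stand-in for one <restrictTo> tag.
    static final class Restriction {
        final String field;
        final Pattern pattern;

        Restriction(String field, String regex, boolean caseSensitive) {
            this.field = field;
            this.pattern = Pattern.compile(
                    regex, caseSensitive ? 0 : Pattern.CASE_INSENSITIVE);
        }

        boolean matches(Map<String, String> metadata) {
            String value = metadata.get(field);
            // Assumption: the expression must match the whole field value.
            return value != null && pattern.matcher(value).matches();
        }
    }

    // A document qualifies when no restriction is set (assumed)
    // or when at least one restriction matches.
    static boolean isApplicable(
            List<Restriction> restrictions, Map<String, String> metadata) {
        if (restrictions.isEmpty()) {
            return true;
        }
        return restrictions.stream().anyMatch(r -> r.matches(metadata));
    }

    public static void main(String[] args) {
        // Field names below are examples, not taken from this page.
        List<Restriction> restrictions = List.of(
                new Restriction("document.contentType", "text/csv", false),
                new Restriction("document.reference", ".*\\.tsv", false));
        Map<String, String> metadata = Map.of("document.contentType", "TEXT/CSV");
        System.out.println(isApplicable(restrictions, metadata));
    }
}
```

With the first restriction declared case-insensitive, the sample metadata qualifies even though the second restriction has no matching field.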

Usage example:

Given this sample CSV file content...

 'clientId','clientName','clientOrg','orgDesc'
 '123','Joe Dalton','ACME Inc.','Organization\'s description' 
 '345','Avrel Dalton','Daisy Town','Another one'

... this example will split the file into two documents (one for each row after the header row):

  <splitter class="com.norconex.importer.handler.splitter.impl.CsvSplitter"
          separatorCharacter=","
          quoteCharacter="'"
          escapeCharacter="\"
          useFirstRowAsFields="true"
          linesToSkip="0"
          referenceColumn="clientId"
          contentColumns="orgDesc" />
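
To make the effect of these settings concrete, here is a minimal, JDK-only sketch of the same split applied to the sample content. It is not the CsvSplitter code: the class and method names (`CsvSplitSketch`, `parseLine`, `split`) are hypothetical, each resulting "document" is reduced to a simple field map, and the tokenizer handles only single-line values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: NOT the Norconex implementation.
final class CsvSplitSketch {

    // Tokenizes one line using the given separator, quote and escape characters.
    static List<String> parseLine(String line, char sep, char quote, char esc) {
        List<String> values = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == esc && i + 1 < line.length()) {
                current.append(line.charAt(++i)); // keep the escaped character as-is
            } else if (c == quote) {
                inQuotes = !inQuotes;             // quotes delimit a value, not part of it
            } else if (c == sep && !inQuotes) {
                values.add(current.toString());   // separator outside quotes ends a value
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        values.add(current.toString());
        return values;
    }

    // Returns one field map per data row, using the first row as field names.
    // Assumes at least one line (the header row) is present.
    static List<Map<String, String>> split(
            List<String> lines, char sep, char quote, char esc) {
        List<Map<String, String>> docs = new ArrayList<>();
        List<String> header = parseLine(lines.get(0), sep, quote, esc);
        for (String line : lines.subList(1, lines.size())) {
            List<String> values = parseLine(line, sep, quote, esc);
            Map<String, String> doc = new LinkedHashMap<>();
            for (int i = 0; i < header.size() && i < values.size(); i++) {
                doc.put(header.get(i), values.get(i));
            }
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        List<String> csv = List.of(
                "'clientId','clientName','clientOrg','orgDesc'",
                "'123','Joe Dalton','ACME Inc.','Organization\\'s description'",
                "'345','Avrel Dalton','Daisy Town','Another one'");
        for (Map<String, String> doc : split(csv, ',', '\'', '\\')) {
            // clientId plays the role of referenceColumn, orgDesc of contentColumns.
            System.out.println(doc.get("clientId") + " -> " + doc.get("orgDesc"));
        }
    }
}
```

Running the sketch prints one line per data row, with the escaped quote in the first description restored to a plain apostrophe.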

Since: 2.0.0
Author: Pascal Essiembre

Field Summary

Fields
Modifier and Type	Field and Description
`static char`	`DEFAULT_ESCAPE_CHARACTER`
`static char`	`DEFAULT_QUOTE_CHARACTER`
`static char`	`DEFAULT_SEPARATOR_CHARACTER`

Constructor Summary

Constructors
Constructor and Description
`CsvSplitter()`

Method Summary

Modifier and Type	Method and Description
`boolean`	`equals(Object other)`
`String[]`	`getContentColumns()`
`char`	`getEscapeCharacter()` Gets the escape character.
`int`	`getLinesToSkip()` Gets how many lines to skip before starting to parse lines.
`char`	`getQuoteCharacter()` Gets the value's surrounding quotes character.
`String`	`getReferenceColumn()`
`char`	`getSeparatorCharacter()` Gets the value-separator character.
`int`	`hashCode()`
`boolean`	`isUseFirstRowAsFields()` Whether to use the first row as field names for values.
`protected void`	`loadHandlerFromXML(org.apache.commons.configuration.XMLConfiguration xml)` Loads configuration settings specific to the implementing class.
`protected void`	`saveHandlerToXML(EnhancedXMLStreamWriter writer)` Saves configuration settings specific to the implementing class.
`void`	`setContentColumns(String... contentColumns)`
`void`	`setEscapeCharacter(char escapeCharacter)` Sets the escape character.
`void`	`setLinesToSkip(int linesToSkip)` Sets how many lines to skip before starting to parse lines.
`void`	`setQuoteCharacter(char quoteCharacter)` Sets the value's surrounding quotes character.
`void`	`setReferenceColumn(String referenceColumn)`
`void`	`setSeparatorCharacter(char separatorCharacter)` Sets the value-separator character.
`void`	`setUseFirstRowAsFields(boolean useFirstRowAsFields)` Sets whether to use the first row as field names for values.
`protected List<ImporterDocument>`	`splitApplicableDocument(SplittableDocument doc, OutputStream output, CachedStreamFactory streamFactory, boolean parsed)`
`String`	`toString()`

Methods inherited from class com.norconex.importer.handler.splitter.AbstractDocumentSplitter
splitDocument

Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML

Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

Methods inherited from interface com.norconex.commons.lang.config.IXMLConfigurable
loadFromXML, saveToXML

- Field Detail
  - DEFAULT_SEPARATOR_CHARACTER
```
public static final char DEFAULT_SEPARATOR_CHARACTER
```
    See Also:
    
    Constant Field Values
  - DEFAULT_QUOTE_CHARACTER
```
public static final char DEFAULT_QUOTE_CHARACTER
```
    See Also:
    
    Constant Field Values
  - DEFAULT_ESCAPE_CHARACTER
```
public static final char DEFAULT_ESCAPE_CHARACTER
```
    See Also:
    
    Constant Field Values
- Constructor Detail
  - CsvSplitter
```
public CsvSplitter()
```
- Method Detail
  - splitApplicableDocument
```
protected List<ImporterDocument> splitApplicableDocument(SplittableDocument doc,
                                                         OutputStream output,
                                                         CachedStreamFactory streamFactory,
                                                         boolean parsed)
                                                  throws ImporterHandlerException
```
    Specified by:
    
    splitApplicableDocument in class AbstractDocumentSplitter
    
    Throws:
    
    ImporterHandlerException
  - getSeparatorCharacter
```
public char getSeparatorCharacter()
```
    Gets the value-separator character.
    
    Returns:
    
    value-separator character
  - setSeparatorCharacter
```
public void setSeparatorCharacter(char separatorCharacter)
```
    Sets the value-separator character. Default is a comma (,).
    
    Parameters:
    
    separatorCharacter - value-separator character
  - getQuoteCharacter
```
public char getQuoteCharacter()
```
    Gets the value's surrounding quotes character.
    
    Returns:
    
    value's surrounding quotes character
  - setQuoteCharacter
```
public void setQuoteCharacter(char quoteCharacter)
```
    Sets the value's surrounding quotes character. Default is the double-quote character (").
    
    Parameters:
    
    quoteCharacter - value's surrounding quotes character
  - getEscapeCharacter
```
public char getEscapeCharacter()
```
    Gets the escape character.
    
    Returns:
    
    escape character
  - setEscapeCharacter
```
public void setEscapeCharacter(char escapeCharacter)
```
    Sets the escape character. Default is the backslash character (\).
    
    Parameters:
    
    escapeCharacter - escape character
  - isUseFirstRowAsFields
```
public boolean isUseFirstRowAsFields()
```
    Whether to use the first row as field names for values.
    
    Returns:
    
    true if using first row as field names.
  - setUseFirstRowAsFields
```
public void setUseFirstRowAsFields(boolean useFirstRowAsFields)
```
    Sets whether to use the first row as field names for values. Default is false.
    
    Parameters:
    
    useFirstRowAsFields - true if using first row as field names
  - getLinesToSkip
```
public int getLinesToSkip()
```
    Gets how many lines to skip before starting to parse lines.
    
    Returns:
    
    how many lines to skip
  - setLinesToSkip
```
public void setLinesToSkip(int linesToSkip)
```
    Sets how many lines to skip before starting to parse lines. Default is 0.
    
    Parameters:
    
    linesToSkip - how many lines to skip
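
    As a rough illustration of skipping lines before parsing starts, the sketch below drops the first `linesToSkip` lines and leaves the rest to be parsed. It uses only the JDK and is not the CsvSplitter code. The `remaining` helper is hypothetical, and treating the first remaining line as the header row (when the first row is used as field names) is an assumption.

```java
import java.util.List;

// Illustrative only: NOT the Norconex implementation.
final class LinesToSkipSketch {

    // Returns the lines left to parse after skipping the first linesToSkip lines.
    // Negative values are treated as 0; values past the end yield an empty list.
    static List<String> remaining(List<String> lines, int linesToSkip) {
        int from = Math.min(Math.max(linesToSkip, 0), lines.size());
        return lines.subList(from, lines.size());
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "# exported file", "id,name", "1,Joe", "2,Avrel");
        List<String> rows = remaining(lines, 1);
        // Assumption: the header is the first line that was not skipped.
        System.out.println("header: " + rows.get(0));
        System.out.println("data rows: " + (rows.size() - 1));
    }
}
```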
  - getReferenceColumn
```
public String getReferenceColumn()
```
  - setReferenceColumn
```
public void setReferenceColumn(String referenceColumn)
```
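
    The XML usage describes the reference column as a "column name or position from 1". One way such a value could be turned into a zero-based index is sketched below. This is a hedged illustration, not the CsvSplitter logic: the `resolve` helper is hypothetical, and trying the name first and falling back to a numeric position is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: NOT the Norconex implementation.
final class ColumnResolverSketch {

    // Returns the zero-based index of a column given by name or by
    // 1-based position, or -1 when it cannot be resolved.
    static int resolve(String column, List<String> header) {
        int byName = header.indexOf(column);
        if (byName >= 0) {
            return byName;
        }
        try {
            int position = Integer.parseInt(column.trim());
            // Positions start at 1; with no header row there is no upper bound to check.
            boolean inRange = position >= 1
                    && (header.isEmpty() || position <= header.size());
            return inRange ? position - 1 : -1;
        } catch (NumberFormatException e) {
            return -1; // neither a known name nor a number
        }
    }

    public static void main(String[] args) {
        List<String> header = Arrays.asList(
                "clientId", "clientName", "clientOrg", "orgDesc");
        System.out.println(resolve("clientId", header)); // by name
        System.out.println(resolve("4", header));        // by 1-based position
    }
}
```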
  - getContentColumns
```
public String[] getContentColumns()
```
  - setContentColumns
```
public void setContentColumns(String... contentColumns)
```
  - loadHandlerFromXML
```
protected void loadHandlerFromXML(org.apache.commons.configuration.XMLConfiguration xml)
```
    Description copied from class: AbstractImporterHandler
    
    Loads configuration settings specific to the implementing class.
    
    Specified by:
    
    loadHandlerFromXML in class AbstractImporterHandler
    
    Parameters:
    
    xml - xml configuration
  - saveHandlerToXML
```
protected void saveHandlerToXML(EnhancedXMLStreamWriter writer)
                         throws XMLStreamException
```
    Description copied from class: AbstractImporterHandler
    
    Saves configuration settings specific to the implementing class. The parent tag along with the "class" attribute are already written. Implementors must not close the writer.
    
    Specified by:
    
    saveHandlerToXML in class AbstractImporterHandler
    
    Parameters:
    
    writer - the xml writer
    
    Throws:
    
    XMLStreamException - could not save to XML
  - equals
```
public boolean equals(Object other)
```
    Overrides:
    
    equals in class AbstractImporterHandler
  - hashCode
```
public int hashCode()
```
    Overrides:
    
    hashCode in class AbstractImporterHandler
  - toString
```
public String toString()
```
    Overrides:
    
    toString in class AbstractImporterHandler
