public class CSVFileCommitter extends AbstractFSCommitter<org.apache.commons.csv.CSVPrinter>
Commits documents to CSV (Comma-Separated Values) files. There are two kinds of document representations: upserts and deletions.
If you request to split upserts and deletions into separate files,
the generated files will start with "upsert-" (for additions/modifications)
and "delete-" (for deletions).
A request "type" field is always added when both upserts and deletes are
added to the same file. The default header name for it is type,
but you can supply your own name with setTypeHeader(String).
The generated files are never updated. Sending a modified document with the same reference will create a new entry and won't modify any existing ones. You can think of the generated files as a set of commit instructions.
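To illustrate the "commit instructions" idea, here is a minimal sketch of how a consumer might replay rows from a combined file to arrive at a final state. It does not use the committer itself. The `Row` class, the `replay` method, and the literal type values "upsert" and "delete" are assumptions made for this example; check the values actually written to the type column in your own files.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: a hypothetical consumer that replays commit rows in file order.
final class CommitReplaySketch {

    /** One CSV row, reduced to the three values this sketch needs. */
    static final class Row {
        final String type;       // assumed values: "upsert" or "delete"
        final String reference;  // document reference
        final String content;    // document content (empty for deletes)

        Row(String type, String reference, String content) {
            this.type = type;
            this.reference = reference;
            this.content = content;
        }
    }

    /** Applies rows in order; a later row for the same reference wins. */
    static Map<String, String> replay(List<Row> rows) {
        Map<String, String> state = new LinkedHashMap<>();
        for (Row row : rows) {
            if ("upsert".equals(row.type)) {
                state.put(row.reference, row.content);
            } else if ("delete".equals(row.type)) {
                state.remove(row.reference);
            } else {
                throw new IllegalArgumentException("Unknown type: " + row.type);
            }
        }
        return state;
    }
}
```

Because the files are append-only, two upserts with the same reference appear as two rows, and it is the consumer that decides the later one replaces the earlier one.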
The generated CSV file names are made of a timestamp and a sequence number.
You have the option to give a prefix or suffix to files that will be created (default does not add any).
The document content is represented by creating a column with a blank or null field name. When requested, the "content" column
will always be present for both upserts and deletes, even if deletes do not
have content, for consistency.
By default, values longer than 5096 characters are truncated.
You can specify a different maximum length globally, or for each column.
For a column, use -1 for unlimited length, or 0 to use the global value
(5096 if the global value is also zero).
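The fallback rules above can be sketched as a small helper. This is an illustration of the documented behaviour, not the committer's actual code: the class and method names are hypothetical, treating every negative value as unlimited is an assumption (only -1 is documented), and length is counted in Java `char` units.

```java
// Sketch only: mirrors the documented truncation rules.
final class TruncateSketch {

    /** Documented default limit, used when column and global values are both 0. */
    static final int ASSUMED_DEFAULT = 5096;

    /** Resolves the limit for a column: own value, else global, else default. */
    static int effectiveLimit(int columnTruncateAt, int globalTruncateAt) {
        if (columnTruncateAt != 0) {
            return columnTruncateAt;  // -1 (unlimited) or an explicit limit
        }
        if (globalTruncateAt != 0) {
            return globalTruncateAt;  // 0 on the column means "use the global value"
        }
        return ASSUMED_DEFAULT;       // both are 0
    }

    /** Truncates a value to the limit; a negative limit means unlimited (assumption). */
    static String truncate(String value, int limit) {
        if (value == null || limit < 0 || value.length() <= limit) {
            return value;
        }
        return value.substring(0, limit);
    }
}
```

For example, a column with `truncateAt="0"` under a global `truncateAt="100"` resolves to 100, while a column with `truncateAt="-1"` stays unlimited regardless of the global value.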
Applications consuming CSV files often have different expectations. Subtle format differences can make opening or parsing a generated CSV file difficult. To help with this, there are preset CSV formats you can choose from.
More information on those can be obtained on the Apache Commons CSV website. Other formatting options you explicitly configure will override the corresponding setting of the chosen format.
<committer
    class="com.norconex.committer.core3.fs.impl.CSVFileCommitter"
    format="(see class documentation)"
    showHeaders="[false|true]"
    delimiter="(single delimiter character)"
    quote="(single quote character)"
    escape="(single escape character)"
    multiValueJoinDelimiter="(delimiter string)"
    typeHeader="(header name for commit request type column)"
    truncateAt="(truncate after N characters, default: 5096, unlimited: -1)">
  <!-- Repeat "col" for every desired column. -->
  <col
      field="(source field name, omit or leave blank for document content)"
      header="(optional column header name)"
      truncateAt="(overwrite truncate)"/>
  <directory>(path where to save the files)</directory>
  <docsPerFile>(max number of docs per file)</docsPerFile>
  <compress>[false|true]</compress>
  <splitUpsertDelete>[false|true]</splitUpsertDelete>
  <fileNamePrefix>(optional prefix to created file names)</fileNamePrefix>
  <fileNameSuffix>(optional suffix to created file names)</fileNameSuffix>
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher
        method="[basic|csv|wildcard|regex]"
        ignoreCase="[false|true]"
        ignoreDiacritic="[false|true]"
        partial="[false|true]">
      (field-matching expression)
    </fieldMatcher>
    <valueMatcher
        method="[basic|csv|wildcard|regex]"
        ignoreCase="[false|true]"
        ignoreDiacritic="[false|true]"
        partial="[false|true]">
      (value-matching expression)
    </valueMatcher>
  </restrictTo>
  <fieldMappings>
    <!-- Add as many field mappings as needed -->
    <mapping
        fromField="(source field name)"
        toField="(target field name)"/>
  </fieldMappings>
</committer>
Modifier and Type | Class and Description |
---|---|
static class | CSVFileCommitter.Column |

Modifier and Type | Field and Description |
---|---|
static int | DEFAULT_TRUNCATE_AT |

Constructor and Description |
---|
CSVFileCommitter() |
doClean, doClose, doDelete, doInit, doUpsert, getDirectory, getDocsPerFile, getFileNamePrefix, getFileNameSuffix, isCompress, isSplitUpsertDelete, loadCommitterFromXML, saveCommitterToXML, setCompress, setDirectory, setDocsPerFile, setFileNamePrefix, setFileNameSuffix, setSplitUpsertDelete
accept, addRestriction, addRestrictions, applyFieldMappings, clean, clearFieldMappings, clearRestrictions, close, delete, fireDebug, fireDebug, fireError, fireError, fireInfo, fireInfo, getCommitterContext, getFieldMappings, getRestrictions, init, loadFromXML, removeFieldMapping, removeRestriction, removeRestriction, saveToXML, setFieldMapping, setFieldMappings, upsert
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
loadFromXML, saveToXML
public static final int DEFAULT_TRUNCATE_AT
public String getFormat()
public void setFormat(String format)
public Character getDelimiter()
public void setDelimiter(Character delimiter)
public Character getQuote()
public void setQuote(Character quote)
public boolean isShowHeaders()
public void setShowHeaders(boolean showHeaders)
public Character getEscape()
public void setEscape(Character escape)
public int getTruncateAt()
public void setTruncateAt(int truncateAt)
public String getMultiValueJoinDelimiter()
public void setMultiValueJoinDelimiter(String multiValueJoinDelimiter)
public List<CSVFileCommitter.Column> getColumns()
public void setColumns(List<CSVFileCommitter.Column> columns)
public void setColumns(CSVFileCommitter.Column... columns)
public String getTypeHeader()
public void setTypeHeader(String typeHeader)
protected String getFileExtension()
getFileExtension in class AbstractFSCommitter<org.apache.commons.csv.CSVPrinter>
protected org.apache.commons.csv.CSVPrinter createDocWriter(Writer writer) throws IOException
createDocWriter in class AbstractFSCommitter<org.apache.commons.csv.CSVPrinter>
Throws: IOException
protected void writeUpsert(org.apache.commons.csv.CSVPrinter csv, UpsertRequest upsertRequest) throws IOException
writeUpsert in class AbstractFSCommitter<org.apache.commons.csv.CSVPrinter>
Throws: IOException
protected void writeDelete(org.apache.commons.csv.CSVPrinter csv, DeleteRequest deleteRequest) throws IOException
writeDelete in class AbstractFSCommitter<org.apache.commons.csv.CSVPrinter>
Throws: IOException
protected void closeDocWriter(org.apache.commons.csv.CSVPrinter csv) throws IOException
closeDocWriter in class AbstractFSCommitter<org.apache.commons.csv.CSVPrinter>
Throws: IOException
public void loadFSCommitterFromXML(XML xml)
loadFSCommitterFromXML in class AbstractFSCommitter<org.apache.commons.csv.CSVPrinter>
public void saveFSCommitterToXML(XML xml)
saveFSCommitterToXML in class AbstractFSCommitter<org.apache.commons.csv.CSVPrinter>
public boolean equals(Object other)
equals in class AbstractFSCommitter<org.apache.commons.csv.CSVPrinter>
public int hashCode()
hashCode in class AbstractFSCommitter<org.apache.commons.csv.CSVPrinter>
public String toString()
toString in class AbstractFSCommitter<org.apache.commons.csv.CSVPrinter>
Copyright © 2009–2022 Norconex Inc. All rights reserved.