public class XMLStreamSplitter extends AbstractDocumentSplitter implements IXMLConfigurable
Splits an XML document on a specific element.
This class is suited for large XML documents. It reads the XML as a
stream and splits it as it is read, which keeps memory usage low during parsing.
For this reason, element matching is not as flexible as with DOM-based
alternatives such as DOMSplitter, but it is more efficient on large
documents.
To identify the element to split on, you give the full path to it from the document root, where each element is separated by a forward slash. Let's take this XML as an example:
<animals>
  <species name="mouse">
    <animal>
      <name>Itchy</name>
      <race>cartoon</race>
    </animal>
  </species>
  <species name="cat">
    <animal>
      <name>Scratchy</name>
      <race>cartoon</race>
    </animal>
  </species>
</animals>
To split on <animal>, you would use this path:
/animals/species/animal
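The path semantics described above can be illustrated with a minimal StAX sketch. This is not the actual XMLStreamSplitter implementation; the class name PathMatchSketch and the method countMatches are hypothetical, and the sketch only assumes that a path matches when the chain of currently open elements, joined by forward slashes, equals the configured path.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration only; not part of the Norconex Importer API.
class PathMatchSketch {

    // The sample XML from this page (single-quoted attributes for readability).
    static final String SAMPLE =
            "<animals>"
          + "<species name='mouse'><animal><name>Itchy</name>"
          + "<race>cartoon</race></animal></species>"
          + "<species name='cat'><animal><name>Scratchy</name>"
          + "<race>cartoon</race></animal></species>"
          + "</animals>";

    // Counts the elements whose full path from the root equals the given path.
    static int countMatches(String xml, String path) {
        List<String> open = new ArrayList<>(); // names of currently open elements
        int matches = 0;
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    open.add(reader.getLocalName());
                    // Current location, e.g. "/animals/species/animal"
                    String current = "/" + String.join("/", open);
                    if (current.equals(path)) {
                        matches++;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    open.remove(open.size() - 1);
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(countMatches(SAMPLE, "/animals/species/animal"));
        System.out.println(countMatches(SAMPLE, "/animals/animal"));
    }
}
```

With the sample XML, /animals/species/animal matches two elements, while a path that skips a level, such as /animals/animal, matches none. Only the open-element names are held in memory, which is the trade-off the stream-based approach makes against DOM-style matching.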
This splitter should be used as a pre-parse handler.
By default, this splitter applies only to documents matching
the restrictions returned by CommonRestrictions.xmlContentTypes(String).
You can specify your own restrictions to further narrow or loosen which
documents this splitter applies to.
<handler
    class="com.norconex.importer.handler.splitter.impl.XMLStreamSplitter"
    path="(XML path)">
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher
        method="[basic|csv|wildcard|regex]"
        ignoreCase="[false|true]"
        ignoreDiacritic="[false|true]"
        partial="[false|true]">
      (field-matching expression)
    </fieldMatcher>
    <valueMatcher
        method="[basic|csv|wildcard|regex]"
        ignoreCase="[false|true]"
        ignoreDiacritic="[false|true]"
        partial="[false|true]">
      (value-matching expression)
    </valueMatcher>
  </restrictTo>
</handler>
<handler
    class="XMLStreamSplitter"
    path="/animals/species/animal"/>
The above example will create one document per <animal> element, based on the sample XML given above.
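To make the expected result concrete, here is a hedged, self-contained sketch of stream-based splitting using the standard StAX event API. It is not the code used by XMLStreamSplitter; the class name StreamSplitSketch and the method split are hypothetical, and the sketch assumes that each matched element, with its children, becomes the content of one new document.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical illustration only; not part of the Norconex Importer API.
class StreamSplitSketch {

    // Returns one XML string per element found at the given absolute path.
    static List<String> split(String xml, String path) {
        List<String> docs = new ArrayList<>();
        List<String> open = new ArrayList<>(); // names of currently open elements
        StringWriter buffer = null;            // text of the document being built
        XMLEventWriter writer = null;          // non-null while inside a match
        int matchDepth = -1;                   // depth of the matched element
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    open.add(event.asStartElement().getName().getLocalPart());
                    String current = "/" + String.join("/", open);
                    if (writer == null && current.equals(path)) {
                        // Entering a matched element: start a new document.
                        buffer = new StringWriter();
                        writer = outFactory.createXMLEventWriter(buffer);
                        matchDepth = open.size();
                    }
                }
                if (writer != null) {
                    writer.add(event); // copy every event inside the match
                }
                if (event.isEndElement()) {
                    if (writer != null && open.size() == matchDepth) {
                        // Leaving the matched element: the document is complete.
                        writer.flush();
                        writer.close();
                        docs.add(buffer.toString());
                        writer = null;
                    }
                    open.remove(open.size() - 1);
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not split XML", e);
        }
        return docs;
    }

    public static void main(String[] args) {
        String xml = "<animals>"
                + "<species name='mouse'><animal><name>Itchy</name>"
                + "<race>cartoon</race></animal></species>"
                + "<species name='cat'><animal><name>Scratchy</name>"
                + "<race>cartoon</race></animal></species>"
                + "</animals>";
        for (String doc : split(xml, "/animals/species/animal")) {
            System.out.println(doc);
        }
    }
}
```

For the sample XML, this sketch yields two documents, one containing Itchy and one containing Scratchy. Only the element currently being copied is buffered, so memory use depends on the size of a single matched element rather than the whole input.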
See Also: DOMSplitter
Constructor and Description |
---|
XMLStreamSplitter() |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object other) |
String | getPath() |
int | hashCode() |
protected void | loadHandlerFromXML(XML xml) Loads configuration settings specific to the implementing class. |
protected void | saveHandlerToXML(XML xml) Saves configuration settings specific to the implementing class. |
void | setPath(String path) |
protected List<Doc> | splitApplicableDocument(HandlerDoc doc, InputStream input, OutputStream output, ParseState parseState) |
String | toString() |
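Methods inherited from class AbstractDocumentSplitter: splitDocument
Methods inherited from class AbstractImporterHandler: addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
Methods inherited from class java.lang.Object: clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface IXMLConfigurable: loadFromXML, saveToXML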
public String getPath()
public void setPath(String path)
protected List<Doc> splitApplicableDocument(HandlerDoc doc, InputStream input, OutputStream output, ParseState parseState) throws ImporterHandlerException
Specified by: splitApplicableDocument in class AbstractDocumentSplitter
Throws: ImporterHandlerException
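protected void loadHandlerFromXML(XML xml)
Description copied from class: AbstractImporterHandler
Loads configuration settings specific to the implementing class.
Specified by: loadHandlerFromXML in class AbstractImporterHandler
Parameters: xml - XML configuration
protected void saveHandlerToXML(XML xml)
Description copied from class: AbstractImporterHandler
Saves configuration settings specific to the implementing class.
Specified by: saveHandlerToXML in class AbstractImporterHandler
Parameters: xml - the XML
public boolean equals(Object other)
Overrides: equals in class AbstractImporterHandler
public int hashCode()
Overrides: hashCode in class AbstractImporterHandler
public String toString()
Overrides: toString in class AbstractImporterHandler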
Copyright © 2009–2023 Norconex Inc. All rights reserved.