public class StripBetweenTransformer extends AbstractStringTransformer implements IXMLConfigurable
Strips any content found between matching start and end strings. The matching strings are defined in pairs, and multiple pairs can be specified at once.
This class can be used as either a pre-parsing (text content-types only) or a post-parsing handler.
<transformer class="com.norconex.importer.handler.transformer.impl.StripBetweenTransformer"
    inclusive="[false|true]"
    caseSensitive="[false|true]"
    sourceCharset="(character encoding)"
    maxReadSize="(max characters to read at once)" >
  <restrictTo caseSensitive="[false|true]"
      field="(name of header/metadata field name to match)">
    (regular expression of value to match)
  </restrictTo>
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <stripBetween>
    <start>(regex)</start>
    <end>(regex)</end>
  </stripBetween>
  <!-- multiple stripBetween tags allowed -->
</transformer>
The following will strip all text between (and including) these two HTML comments: <!-- SIDENAV_START --> and <!-- SIDENAV_END -->.
<transformer class="com.norconex.importer.handler.transformer.impl.StripBetweenTransformer"
    inclusive="true" >
  <stripBetween>
    <start><![CDATA[<!-- SIDENAV_START -->]]></start>
    <end><![CDATA[<!-- SIDENAV_END -->]]></end>
  </stripBetween>
</transformer>
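To make the effect of the inclusive and caseSensitive attributes concrete, here is a minimal standalone sketch that mimics the documented behavior with java.util.regex only. It is not the Norconex implementation: the class StripBetweenSketch and the helper stripBetween are hypothetical names, and the use of DOTALL matching (so that stripped regions may span several lines) is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of the Norconex Importer API.
class StripBetweenSketch {

    // Removes every region delimited by a start match and the next end match.
    // inclusive=true also removes the matched start and end text.
    static String stripBetween(String text, String startRegex, String endRegex,
            boolean inclusive, boolean caseSensitive) {
        int flags = Pattern.DOTALL; // assumption: regions may span lines
        if (!caseSensitive) {
            flags |= Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        }
        Matcher start = Pattern.compile(startRegex, flags).matcher(text);
        Matcher end = Pattern.compile(endRegex, flags).matcher(text);

        StringBuilder out = new StringBuilder();
        int copiedUpTo = 0;  // text before this index is already handled
        int searchFrom = 0;  // where to look for the next start match
        while (searchFrom <= text.length() && start.find(searchFrom)) {
            if (!end.find(start.end())) {
                break; // a start without an end: leave the rest untouched
            }
            int from = inclusive ? start.start() : start.end();
            int to = inclusive ? end.end() : end.start();
            out.append(text, copiedUpTo, from);
            copiedUpTo = to;
            // Always move forward, even if both regexes match empty text.
            searchFrom = Math.max(end.end(), searchFrom + 1);
        }
        out.append(text, copiedUpTo, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String page = "Header <!-- SIDENAV_START -->menu<!-- SIDENAV_END --> Body";
        String s = "<!-- SIDENAV_START -->";
        String e = "<!-- SIDENAV_END -->";
        // inclusive: the two comments are removed along with the text between them
        System.out.println(stripBetween(page, s, e, true, true));
        // not inclusive: the comments stay, only "menu" is removed
        System.out.println(stripBetween(page, s, e, false, true));
    }
}
```

With inclusive set to true the two comments disappear along with the text between them; with false only the enclosed text is removed and the comments remain.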
Constructor and Description |
---|
StripBetweenTransformer() |
Modifier and Type | Method and Description |
---|---|
void | addStripEndpoints(String fromText, String toText) |
boolean | equals(Object other) |
List<org.apache.commons.lang3.tuple.Pair<String,String>> | getStripEndpoints() |
int | hashCode() |
boolean | isCaseSensitive() |
boolean | isInclusive() |
protected void | loadStringTransformerFromXML(org.apache.commons.configuration.XMLConfiguration xml) Loads configuration settings specific to the implementing class. |
protected void | saveStringTransformerToXML(EnhancedXMLStreamWriter writer) Saves configuration settings specific to the implementing class. |
void | setCaseSensitive(boolean caseSensitive) Sets whether to consider character case when matching start and end text. |
void | setInclusive(boolean inclusive) Sets whether start and end text pairs should themselves be stripped or not. |
String | toString() |
protected void | transformStringContent(String reference, StringBuilder content, ImporterMetadata metadata, boolean parsed, int sectionIndex) |
getMaxReadSize, loadCharStreamTransformerFromXML, saveCharStreamTransformerToXML, setMaxReadSize, transformTextDocument
getSourceCharset, loadHandlerFromXML, saveHandlerToXML, setSourceCharset, transformApplicableDocument
transformDocument
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
loadFromXML, saveToXML
protected void transformStringContent(String reference, StringBuilder content, ImporterMetadata metadata, boolean parsed, int sectionIndex)
Specified by: transformStringContent in class AbstractStringTransformer
public boolean isInclusive()
public void setInclusive(boolean inclusive)
Parameters: inclusive - true to strip start and end text
public boolean isCaseSensitive()
public void setCaseSensitive(boolean caseSensitive)
Parameters: caseSensitive - true to consider character case
public List<org.apache.commons.lang3.tuple.Pair<String,String>> getStripEndpoints()
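Since several start/end pairs can be registered, the following standalone sketch shows one way a list of pairs could be applied to the same text. It deliberately uses a different, simpler technique than the transformer may use: each pair is combined into a single reluctant regex and removed with replaceAll, which corresponds to inclusive stripping only. The class MultiPairStripSketch, the method stripAll, the use of Map.Entry as a stand-in for the commons-lang Pair, and the assumption that pairs are applied in list order are all illustrative.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of the Norconex Importer API.
class MultiPairStripSketch {

    // Applies each (start, end) regex pair in list order (an assumption),
    // removing the start match, the end match, and everything in between.
    static String stripAll(String text, List<Map.Entry<String, String>> pairs) {
        String result = text;
        for (Map.Entry<String, String> pair : pairs) {
            // ".*?" is reluctant, so each region stops at the nearest end match.
            Pattern region = Pattern.compile(
                    "(?:" + pair.getKey() + ").*?(?:" + pair.getValue() + ")",
                    Pattern.DOTALL);
            result = region.matcher(result).replaceAll("");
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        pairs.add(new SimpleImmutableEntry<>("<nav>", "</nav>"));
        pairs.add(new SimpleImmutableEntry<>("\\[ad\\]", "\\[/ad\\]"));
        // prints "ABC"
        System.out.println(stripAll("A<nav>menu</nav>B[ad]buy[/ad]C", pairs));
    }
}
```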
protected void loadStringTransformerFromXML(org.apache.commons.configuration.XMLConfiguration xml) throws IOException
Description copied from class: AbstractStringTransformer
Loads configuration settings specific to the implementing class.
Specified by: loadStringTransformerFromXML in class AbstractStringTransformer
Parameters: xml - xml configuration
Throws: IOException - could not load from XML
protected void saveStringTransformerToXML(EnhancedXMLStreamWriter writer) throws XMLStreamException
Description copied from class: AbstractStringTransformer
Saves configuration settings specific to the implementing class.
Specified by: saveStringTransformerToXML in class AbstractStringTransformer
Parameters: writer - the xml writer
Throws: XMLStreamException - could not save to XML
public int hashCode()
Overrides: hashCode in class AbstractStringTransformer
public boolean equals(Object other)
Overrides: equals in class AbstractStringTransformer
public String toString()
Overrides: toString in class AbstractStringTransformer
Copyright © 2009–2021 Norconex Inc. All rights reserved.