public class StripBetweenTransformer extends AbstractStringTransformer implements IXMLConfigurable
Strips any content found between matching start and end strings. The matching strings are defined in pairs, and multiple pairs can be specified at once.
This class can be used as a pre-parsing handler (text content types only) or as a post-parsing handler.
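The stripping idea can be sketched in plain Java with `java.util.regex`. This is a minimal standalone sketch, not Norconex's implementation. The class `StripBetweenSketch` and the method `stripBetween` are hypothetical. The sketch also rests on a few assumptions: each pair is modelled as two `Pattern`s plus an inclusive flag, pairs are applied one after another, and a start delimiter with no matching end is left untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StripBetweenSketch {

    /**
     * Hypothetical helper: deletes every region that begins with a match of
     * {@code start} and ends with the next match of {@code end} after it.
     * When {@code inclusive} is true the delimiters are deleted as well;
     * otherwise only the text between them is. Assumes neither pattern can
     * match the empty string (the loop would not advance otherwise).
     */
    static void stripBetween(StringBuilder content, Pattern start, Pattern end,
            boolean inclusive) {
        int from = 0;
        while (from <= content.length()) {
            // Matchers are rebuilt each pass because the content has changed.
            Matcher s = start.matcher(content);
            if (!s.find(from)) {
                return; // no more start delimiters
            }
            Matcher e = end.matcher(content);
            if (!e.find(s.end())) {
                return; // start without a matching end: leave the rest as-is
            }
            if (inclusive) {
                content.delete(s.start(), e.end());
                from = s.start();
            } else {
                int endLength = e.end() - e.start();
                content.delete(s.end(), e.start());
                // The end delimiter now sits right after the start delimiter.
                from = s.end() + endLength;
            }
        }
    }

    public static void main(String[] args) {
        StringBuilder content = new StringBuilder("keep [drop] keep {drop} keep");
        // Two pairs, applied in list order (an assumption of this sketch).
        stripBetween(content, Pattern.compile(Pattern.quote("[")),
                Pattern.compile(Pattern.quote("]")), true);  // delimiters removed too
        stripBetween(content, Pattern.compile(Pattern.quote("{")),
                Pattern.compile(Pattern.quote("}")), false); // delimiters kept
        System.out.println(content); // prints: keep  keep {} keep
    }
}
```

The matcher is rebuilt after every deletion because the `StringBuilder` it reads from has been modified. The search also resumes from the deletion point, so text that is removed is never rescanned.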
<handler
class="com.norconex.importer.handler.transformer.impl.StripBetweenTransformer"
maxReadSize="(max characters to read at once)"
sourceCharset="(character encoding)">
<!-- multiple "restrictTo" tags allowed (only one needs to match) -->
<restrictTo>
<fieldMatcher
method="[basic|csv|wildcard|regex]"
ignoreCase="[false|true]"
ignoreDiacritic="[false|true]"
partial="[false|true]">
(field-matching expression)
</fieldMatcher>
<valueMatcher
method="[basic|csv|wildcard|regex]"
ignoreCase="[false|true]"
ignoreDiacritic="[false|true]"
partial="[false|true]">
(value-matching expression)
</valueMatcher>
</restrictTo>
<!-- multiple stripBetween tags allowed -->
<stripBetween
inclusive="[false|true]">
<startMatcher
method="[basic|csv|wildcard|regex]"
ignoreCase="[false|true]"
ignoreDiacritic="[false|true]"
partial="[false|true]">
(expression matching "left" delimiter)
</startMatcher>
<endMatcher
method="[basic|csv|wildcard|regex]"
ignoreCase="[false|true]"
ignoreDiacritic="[false|true]"
partial="[false|true]">
(expression matching "right" delimiter)
</endMatcher>
</stripBetween>
</handler>
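Each `startMatcher` and `endMatcher` element in the configuration above describes how one delimiter is found. The sketch below shows one plausible mapping of the `method` and `ignoreCase` attributes onto `java.util.regex`. It is an approximation written for illustration, not Norconex's own matcher class. `MatchMethod`, `toPattern` and `wildcardToRegex` are hypothetical names, the CSV and wildcard semantics are assumptions, and `ignoreDiacritic` and `partial` are not modelled.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class MatcherMethodSketch {

    /** Mirrors the method="[basic|csv|wildcard|regex]" attribute values. */
    enum MatchMethod { BASIC, CSV, WILDCARD, REGEX }

    /** Hypothetical helper: turns one matcher's settings into a Pattern. */
    static Pattern toPattern(String expression, MatchMethod method, boolean ignoreCase) {
        String regex;
        switch (method) {
            case BASIC:
                regex = Pattern.quote(expression); // literal text
                break;
            case CSV:
                // Assumed: any one of the comma-separated literals may match.
                regex = Arrays.stream(expression.split(","))
                        .map(Pattern::quote)
                        .collect(Collectors.joining("|"));
                break;
            case WILDCARD:
                regex = wildcardToRegex(expression);
                break;
            default:
                regex = expression; // REGEX: used as-is
        }
        int flags = ignoreCase ? Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE : 0;
        return Pattern.compile(regex, flags);
    }

    /** Assumed semantics: '*' = any run of characters (shortest first), '?' = one character. */
    static String wildcardToRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                // (?s:...) lets the wildcard span line breaks too.
                regex.append(c == '*' ? "(?s:.*?)" : "(?s:.)");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return regex.toString();
    }

    public static void main(String[] args) {
        Pattern start = toPattern("<!-- sidenav*start -->", MatchMethod.WILDCARD, true);
        System.out.println(start.matcher("<p><!-- SIDENAV_START --></p>").find()); // true
    }
}
```

The wildcard `*` is made reluctant (`.*?`) so that, when used as a delimiter, it stops at the nearest possible end rather than swallowing the rest of the document.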
<handler
class="StripBetweenTransformer">
<stripBetween
inclusive="true">
<startMatcher>
<![CDATA[<!-- SIDENAV_START -->]]>
</startMatcher>
<endMatcher>
<![CDATA[<!-- SIDENAV_END -->]]>
</endMatcher>
</stripBetween>
</handler>
The example above strips all text between (and including) these two HTML comments: `<!-- SIDENAV_START -->` and `<!-- SIDENAV_END -->`.
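To see the example's effect on actual markup, here is a minimal Java sketch that performs the same inclusive strip using literal string matching. The class `SideNavStripExample` and the method `stripInclusive` are hypothetical. Literal matching is also an assumption, because the example sets no `method` attribute on its matchers.

```java
public class SideNavStripExample {

    /**
     * Hypothetical helper: removes each startText...endText region,
     * delimiters included (like inclusive="true"), using plain string
     * matching. Assumes both delimiters are non-empty.
     */
    static String stripInclusive(String text, String startText, String endText) {
        StringBuilder content = new StringBuilder(text);
        int from = 0;
        while (true) {
            int start = content.indexOf(startText, from);
            if (start < 0) {
                break; // no more start delimiters
            }
            int end = content.indexOf(endText, start + startText.length());
            if (end < 0) {
                break; // unclosed region: leave the remainder untouched
            }
            content.delete(start, end + endText.length());
            from = start;
        }
        return content.toString();
    }

    public static void main(String[] args) {
        String html = "<body><!-- SIDENAV_START --><ul><li>Home</li></ul>"
                + "<!-- SIDENAV_END --><p>Article</p></body>";
        System.out.println(stripInclusive(html,
                "<!-- SIDENAV_START -->", "<!-- SIDENAV_END -->"));
        // prints: <body><p>Article</p></body>
    }
}
```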
Modifier and Type | Class and Description |
---|---|
static class | StripBetweenTransformer.StripBetweenDetails |
Constructor and Description |
---|
StripBetweenTransformer() |
Modifier and Type | Method and Description |
---|---|
void | addStripBetweenDetails(StripBetweenTransformer.StripBetweenDetails details): Adds strip between instructions. |
void | addStripEndpoints(String fromText, String toText): Deprecated. Since 3.0.0, use addStripBetweenDetails(StripBetweenDetails). |
boolean | equals(Object other) |
List<StripBetweenTransformer.StripBetweenDetails> | getStripBetweenDetailsList(): Gets strip between instructions. |
List<Pair<String,String>> | getStripEndpoints(): Deprecated. Since 3.0.0, use getStripBetweenDetailsList(). |
int | hashCode() |
boolean | isCaseSensitive(): Deprecated. Since 3.0.0, use isCaseSensitive(). |
boolean | isInclusive(): Deprecated. Since 3.0.0, use StripBetweenTransformer.StripBetweenDetails.isInclusive(). |
protected void | loadStringTransformerFromXML(XML xml): Loads configuration settings specific to the implementing class. |
protected void | saveStringTransformerToXML(XML xml): Saves configuration settings specific to the implementing class. |
void | setCaseSensitive(boolean caseSensitive): Deprecated. Since 3.0.0, use setCaseSensitive(boolean). |
void | setInclusive(boolean inclusive): Deprecated. Since 3.0.0, use StripBetweenTransformer.StripBetweenDetails.setInclusive(boolean). |
String | toString() |
protected void | transformStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) |
Methods inherited from AbstractStringTransformer and its superclasses:
getMaxReadSize, loadCharStreamTransformerFromXML, saveCharStreamTransformerToXML, setMaxReadSize, transformTextDocument
getSourceCharset, loadHandlerFromXML, saveHandlerToXML, setSourceCharset, transformApplicableDocument
transformDocument
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
Methods inherited from class java.lang.Object:
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface IXMLConfigurable:
loadFromXML, saveToXML
protected void transformStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex)
Specified by: transformStringContent in class AbstractStringTransformer
public void addStripBetweenDetails(StripBetweenTransformer.StripBetweenDetails details)
Adds strip between instructions.
Parameters: details - "strip between" details
public List<StripBetweenTransformer.StripBetweenDetails> getStripBetweenDetailsList()
Gets strip between instructions.
@Deprecated public boolean isInclusive()
Deprecated. Since 3.0.0, use StripBetweenTransformer.StripBetweenDetails.isInclusive()
Returns: false
@Deprecated public void setInclusive(boolean inclusive)
Deprecated. Since 3.0.0, use StripBetweenTransformer.StripBetweenDetails.setInclusive(boolean)
Parameters: inclusive - true to also strip the matching start and end text
@Deprecated public boolean isCaseSensitive()
Deprecated. Since 3.0.0, use isCaseSensitive()
Returns: false
@Deprecated public void setCaseSensitive(boolean caseSensitive)
Deprecated. Since 3.0.0, use setCaseSensitive(boolean)
Parameters: caseSensitive - true to consider character case
@Deprecated public void addStripEndpoints(String fromText, String toText)
Deprecated. Since 3.0.0, use addStripBetweenDetails(StripBetweenDetails)
Parameters: fromText - the left string to match
toText - the right string to match
@Deprecated public List<Pair<String,String>> getStripEndpoints()
Deprecated. Since 3.0.0, use getStripBetweenDetailsList().
protected void loadStringTransformerFromXML(XML xml)
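Description copied from class: AbstractStringTransformer
Loads configuration settings specific to the implementing class.
Specified by: loadStringTransformerFromXML in class AbstractStringTransformer
Parameters: xml - XML configuration
protected void saveStringTransformerToXML(XML xml)
Description copied from class: AbstractStringTransformer
Saves configuration settings specific to the implementing class.
Specified by: saveStringTransformerToXML in class AbstractStringTransformer
Parameters: xml - the XML
public boolean equals(Object other)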
Overrides: equals in class AbstractStringTransformer
public int hashCode()
Overrides: hashCode in class AbstractStringTransformer
public String toString()
Overrides: toString in class AbstractStringTransformer
Copyright © 2009–2023 Norconex Inc. All rights reserved.