Class StripBetweenTransformer
java.lang.Object
  com.norconex.importer.handler.AbstractImporterHandler
    com.norconex.importer.handler.transformer.AbstractDocumentTransformer
      com.norconex.importer.handler.transformer.AbstractCharStreamTransformer
        com.norconex.importer.handler.transformer.AbstractStringTransformer
          com.norconex.importer.handler.transformer.impl.StripBetweenTransformer

All Implemented Interfaces:
IXMLConfigurable, IImporterHandler, IDocumentTransformer

public class StripBetweenTransformer extends AbstractStringTransformer implements IXMLConfigurable
Strips any content found between matching start and end strings. The matching strings are defined in pairs, and multiple pairs can be specified at once.
This class can be used as a pre-parsing (text content-types only) or post-parsing handler.
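The stripping behavior described above can be illustrated with a small standalone sketch. It does not use or reproduce this class's implementation: `StripBetweenSketch` and its `strip` helper are hypothetical names, the delimiters are treated as literal text, and `java.util.regex` stands in for whatever matching the handler actually performs. It only shows the difference between inclusive stripping (delimiters removed along with the text between them) and non-inclusive stripping (delimiters kept), as assumed from the description of the `inclusive` attribute.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StripBetweenSketch {

    /**
     * Hypothetical helper (not part of the Norconex API): removes every span
     * delimited by the literal start and end strings.
     */
    static String strip(String content, String start, String end, boolean inclusive) {
        // Quote the delimiters so they match literally; DOTALL lets a span cross line breaks.
        Pattern pattern = Pattern.compile(
                "(" + Pattern.quote(start) + ")(.*?)(" + Pattern.quote(end) + ")",
                Pattern.DOTALL);
        Matcher matcher = pattern.matcher(content);
        // Inclusive: drop the delimiters too. Otherwise keep both delimiters
        // (groups 1 and 3) and drop only the text between them.
        return matcher.replaceAll(inclusive ? "" : "$1$3");
    }

    public static void main(String[] args) {
        String html = "<p>Keep</p><!-- SIDENAV_START --><ul>nav</ul>"
                + "<!-- SIDENAV_END --><p>Also keep</p>";
        System.out.println(strip(html, "<!-- SIDENAV_START -->", "<!-- SIDENAV_END -->", true));
        System.out.println(strip(html, "<!-- SIDENAV_START -->", "<!-- SIDENAV_END -->", false));
    }
}
```

Several start/end pairs could be handled in this sketch by calling `strip` once per pair on the result of the previous call; whether the handler applies its pairs in that order is an assumption.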
XML configuration usage:
<handler class="com.norconex.importer.handler.transformer.impl.StripBetweenTransformer"
    maxReadSize="(max characters to read at once)"
    sourceCharset="(character encoding)">

  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher>(field-matching expression)</fieldMatcher>
    <valueMatcher>(value-matching expression)</valueMatcher>
  </restrictTo>

  <!-- multiple stripBetween tags allowed -->
  <stripBetween inclusive="[false|true]">
    <startMatcher>(expression matching "left" delimiter)</startMatcher>
    <endMatcher>(expression matching "right" delimiter)</endMatcher>
  </stripBetween>
</handler>
XML usage example:
<handler class="StripBetweenTransformer">
  <stripBetween inclusive="true">
    <startMatcher><![CDATA[<!-- SIDENAV_START -->]]></startMatcher>
    <endMatcher><![CDATA[<!-- SIDENAV_END -->]]></endMatcher>
  </stripBetween>
</handler>
The example above strips all text between (and including) the two HTML comments <!-- SIDENAV_START --> and <!-- SIDENAV_END -->.

Author:
Pascal Essiembre

Nested Class Summary
static class StripBetweenTransformer.StripBetweenDetails

Constructor Summary
StripBetweenTransformer()

Method Summary
void addStripBetweenDetails(StripBetweenTransformer.StripBetweenDetails details)
    Adds strip between instructions.
void addStripEndpoints(String fromText, String toText)
    Deprecated. Since 3.0.0, use addStripBetweenDetails(StripBetweenDetails).
boolean equals(Object other)
List<StripBetweenTransformer.StripBetweenDetails> getStripBetweenDetailsList()
    Gets strip between instructions.
List<Pair<String,String>> getStripEndpoints()
    Deprecated. Since 3.0.0, use getStripBetweenDetailsList().
int hashCode()
boolean isCaseSensitive()
    Deprecated. Since 3.0.0, use isCaseSensitive().
boolean isInclusive()
    Deprecated. Since 3.0.0, use StripBetweenTransformer.StripBetweenDetails.isInclusive().
protected void loadStringTransformerFromXML(XML xml)
    Loads configuration settings specific to the implementing class.
protected void saveStringTransformerToXML(XML xml)
    Saves configuration settings specific to the implementing class.
void setCaseSensitive(boolean caseSensitive)
    Deprecated. Since 3.0.0, use setCaseSensitive(boolean).
void setInclusive(boolean inclusive)
    Deprecated. Since 3.0.0, use StripBetweenTransformer.StripBetweenDetails.setInclusive(boolean).
String toString()
protected void transformStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex)

Methods inherited from class com.norconex.importer.handler.transformer.AbstractStringTransformer
getMaxReadSize, loadCharStreamTransformerFromXML, saveCharStreamTransformerToXML, setMaxReadSize, transformTextDocument

Methods inherited from class com.norconex.importer.handler.transformer.AbstractCharStreamTransformer
getSourceCharset, loadHandlerFromXML, saveHandlerToXML, setSourceCharset, transformApplicableDocument

Methods inherited from class com.norconex.importer.handler.transformer.AbstractDocumentTransformer
transformDocument

Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML

Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

Methods inherited from interface com.norconex.commons.lang.xml.IXMLConfigurable
loadFromXML, saveToXML

Method Detail

transformStringContent
protected void transformStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex)
Specified by:
transformStringContent in class AbstractStringTransformer

addStripBetweenDetails
public void addStripBetweenDetails(StripBetweenTransformer.StripBetweenDetails details)
Adds strip between instructions.
Parameters:
details - "strip between" details
Since:
3.0.0

getStripBetweenDetailsList
public List<StripBetweenTransformer.StripBetweenDetails> getStripBetweenDetailsList()
Gets strip between instructions.
Returns:
"strip between" details
Since:
3.0.0

isInclusive
@Deprecated public boolean isInclusive()
Deprecated. Since 3.0.0, use StripBetweenTransformer.StripBetweenDetails.isInclusive().
Gets whether start and end text pairs should be stripped or not.
Returns:
always false

setInclusive
@Deprecated public void setInclusive(boolean inclusive)
Deprecated. Since 3.0.0, use StripBetweenTransformer.StripBetweenDetails.setInclusive(boolean).
Sets whether start and end text pairs should be stripped or not. Calling this method has no effect.
Parameters:
inclusive - true to also strip the matching start and end text

isCaseSensitive
@Deprecated public boolean isCaseSensitive()
Deprecated. Since 3.0.0, use isCaseSensitive().
Gets whether character case is considered when matching start and end text.
Returns:
always false

setCaseSensitive
@Deprecated public void setCaseSensitive(boolean caseSensitive)
Deprecated. Since 3.0.0, use setCaseSensitive(boolean).
Sets whether character case is considered when matching start and end text. Calling this method has no effect.
Parameters:
caseSensitive - true to consider character case

addStripEndpoints
@Deprecated public void addStripEndpoints(String fromText, String toText)
Deprecated. Since 3.0.0, use addStripBetweenDetails(StripBetweenDetails).
Adds a new pair of end points to match for stripping.
Parameters:
fromText - the left string to match
toText - the right string to match

getStripEndpoints
@Deprecated public List<Pair<String,String>> getStripEndpoints()
Deprecated. Since 3.0.0, use getStripBetweenDetailsList().
Gets an empty list.
Returns:
empty list

loadStringTransformerFromXML
protected void loadStringTransformerFromXML(XML xml)
Description copied from class: AbstractStringTransformer
Loads configuration settings specific to the implementing class.
Specified by:
loadStringTransformerFromXML in class AbstractStringTransformer
Parameters:
xml - XML configuration

saveStringTransformerToXML
protected void saveStringTransformerToXML(XML xml)
Description copied from class: AbstractStringTransformer
Saves configuration settings specific to the implementing class. The parent tag along with the "class" attribute are already written. Implementors must not close the writer.
Specified by:
saveStringTransformerToXML in class AbstractStringTransformer
Parameters:
xml - the XML

equals
public boolean equals(Object other)
Overrides:
equals in class AbstractStringTransformer

hashCode
public int hashCode()
Overrides:
hashCode in class AbstractStringTransformer

toString
public String toString()
Overrides:
toString in class AbstractStringTransformer