public class StripAfterTransformer extends AbstractStringTransformer implements IXMLConfigurable
Strips all content found after the first match of a given pattern.
This class can be used as a pre-parsing handler (text content types only) or as a post-parsing handler.
<handler
class="com.norconex.importer.handler.transformer.impl.StripAfterTransformer"
inclusive="[false|true]"
maxReadSize="(max characters to read at once)"
sourceCharset="(character encoding)">
<!-- multiple "restrictTo" tags allowed (only one needs to match) -->
<restrictTo>
<fieldMatcher
method="[basic|csv|wildcard|regex]"
ignoreCase="[false|true]"
ignoreDiacritic="[false|true]"
partial="[false|true]">
(field-matching expression)
</fieldMatcher>
<valueMatcher
method="[basic|csv|wildcard|regex]"
ignoreCase="[false|true]"
ignoreDiacritic="[false|true]"
partial="[false|true]">
(value-matching expression)
</valueMatcher>
</restrictTo>
<stripAfterMatcher
method="[basic|csv|wildcard|regex]"
ignoreCase="[false|true]"
ignoreDiacritic="[false|true]"
partial="[false|true]">
(expression matching text from which to strip)
</stripAfterMatcher>
</handler>
<handler
class="StripAfterTransformer"
inclusive="true">
<stripAfterMatcher>
<![CDATA[<!-- FOOTER -->]]>
</stripAfterMatcher>
</handler>
The above example strips the following HTML comment and everything after it:
<!-- FOOTER -->
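To make the effect of the inclusive attribute concrete, here is a minimal sketch in plain Java. It is not the Norconex implementation: the class StripAfterSketch and its stripAfter method are hypothetical names, and a quoted java.util.regex literal stands in for the configured stripAfterMatcher.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StripAfterSketch {

    // Hypothetical helper (not part of the Norconex API): deletes
    // everything after the first match of "pattern" in "content".
    // When inclusive is true the matched text is deleted as well;
    // otherwise the matched text is kept.
    static void stripAfter(StringBuilder content, Pattern pattern, boolean inclusive) {
        Matcher m = pattern.matcher(content);
        if (m.find()) {
            int from = inclusive ? m.start() : m.end();
            content.delete(from, content.length());
        }
    }

    public static void main(String[] args) {
        // Pattern.quote makes the HTML comment match literally.
        Pattern footer = Pattern.compile(Pattern.quote("<!-- FOOTER -->"));
        String html = "<p>Body</p><!-- FOOTER --><p>Legal</p>";

        StringBuilder inclusive = new StringBuilder(html);
        stripAfter(inclusive, footer, true);
        System.out.println(inclusive);   // <p>Body</p>

        StringBuilder exclusive = new StringBuilder(html);
        stripAfter(exclusive, footer, false);
        System.out.println(exclusive);   // <p>Body</p><!-- FOOTER -->
    }
}
```

If the pattern never matches, the content is left unchanged in this sketch.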
Constructor and Description |
---|
StripAfterTransformer() |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object other) |
TextMatcher | getStripAfterMatcher() Gets the matcher for the text from which to strip content. |
String | getStripAfterRegex() Deprecated. Since 3.0.0, use getStripAfterMatcher(). |
int | hashCode() |
boolean | isCaseSensitive() Deprecated. Since 3.0.0, use getStripAfterMatcher(). |
boolean | isInclusive() |
protected void | loadStringTransformerFromXML(XML xml) Loads configuration settings specific to the implementing class. |
protected void | saveStringTransformerToXML(XML xml) Saves configuration settings specific to the implementing class. |
void | setCaseSensitive(boolean caseSensitive) Deprecated. Since 3.0.0, use setStripAfterMatcher(TextMatcher). |
void | setInclusive(boolean inclusive) Sets whether the match itself should be stripped or not. |
void | setStripAfterMatcher(TextMatcher stripAfterMatcher) Sets the matcher for the text from which to strip content. |
void | setStripAfterRegex(String regex) Deprecated. Since 3.0.0, use setStripAfterMatcher(TextMatcher). |
String | toString() |
protected void | transformStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) |
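Methods inherited from class AbstractStringTransformer: getMaxReadSize, loadCharStreamTransformerFromXML, saveCharStreamTransformerToXML, setMaxReadSize, transformTextDocument
Methods inherited from class AbstractCharStreamTransformer: getSourceCharset, loadHandlerFromXML, saveHandlerToXML, setSourceCharset, transformApplicableDocument
Methods inherited from class AbstractDocumentTransformer: transformDocument
Methods inherited from class AbstractImporterHandler: addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
Methods inherited from class java.lang.Object: clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface IXMLConfigurable: loadFromXML, saveToXML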
protected void transformStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex)
Specified by: transformStringContent in class AbstractStringTransformer

public TextMatcher getStripAfterMatcher()
Gets the matcher for the text from which to strip content.

public void setStripAfterMatcher(TextMatcher stripAfterMatcher)
Sets the matcher for the text from which to strip content.
Parameters: stripAfterMatcher - text matcher

public boolean isInclusive()

public void setInclusive(boolean inclusive)
Sets whether the match itself should be stripped or not.
Parameters: inclusive - true to strip the matched text as well

@Deprecated public boolean isCaseSensitive()
Deprecated. Since 3.0.0, use getStripAfterMatcher().
Returns: true if case sensitive

@Deprecated public void setCaseSensitive(boolean caseSensitive)
Deprecated. Since 3.0.0, use setStripAfterMatcher(TextMatcher).
Parameters: caseSensitive - true if case sensitive

@Deprecated public String getStripAfterRegex()
Deprecated. Since 3.0.0, use getStripAfterMatcher().

@Deprecated public void setStripAfterRegex(String regex)
Deprecated. Since 3.0.0, use setStripAfterMatcher(TextMatcher).
Parameters: regex - expression

protected void loadStringTransformerFromXML(XML xml)
Description copied from class: AbstractStringTransformer
Loads configuration settings specific to the implementing class.
Specified by: loadStringTransformerFromXML in class AbstractStringTransformer
Parameters: xml - XML configuration

protected void saveStringTransformerToXML(XML xml)
Description copied from class: AbstractStringTransformer
Saves configuration settings specific to the implementing class.
Specified by: saveStringTransformerToXML in class AbstractStringTransformer
Parameters: xml - the XML

public boolean equals(Object other)
Overrides: equals in class AbstractStringTransformer

public int hashCode()
Overrides: hashCode in class AbstractStringTransformer

public String toString()
Overrides: toString in class AbstractStringTransformer
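
The deprecated setStripAfterRegex(String) and setCaseSensitive(boolean) kept the expression and its case handling as two separate settings, whereas setStripAfterMatcher(TextMatcher) takes a single object. The sketch below illustrates that consolidation using only the JDK. RegexSetting and fromLegacy are hypothetical names, not Norconex classes, and the mapping shown (ignoreCase as the negation of caseSensitive) is an assumption based on the attribute names in the XML usage above.

```java
import java.util.regex.Pattern;

public class MatcherMigrationSketch {

    // Hypothetical stand-in for a matcher object: one value carrying
    // both the expression and its case handling.
    static final class RegexSetting {
        final String expression;
        final boolean ignoreCase;

        RegexSetting(String expression, boolean ignoreCase) {
            this.expression = expression;
            this.ignoreCase = ignoreCase;
        }

        // Compiles the expression, applying case-insensitive flags if requested.
        Pattern toPattern() {
            int flags = ignoreCase
                    ? Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE
                    : 0;
            return Pattern.compile(expression, flags);
        }
    }

    // Converts the two legacy settings into the single object.
    // Note the inversion: caseSensitive=true means ignoreCase=false.
    static RegexSetting fromLegacy(String regex, boolean caseSensitive) {
        return new RegexSetting(regex, !caseSensitive);
    }

    public static void main(String[] args) {
        RegexSetting s = fromLegacy("<!--\\s*footer\\s*-->", false);
        System.out.println(s.ignoreCase);                                    // true
        System.out.println(s.toPattern().matcher("<!-- FOOTER -->").find()); // true
    }
}
```

When migrating real configuration, the point to watch is the inverted flag: a handler that previously relied on case-insensitive matching needs ignoreCase enabled on its matcher.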
Copyright © 2009–2023 Norconex Inc. All rights reserved.