public class StripBeforeTransformer extends AbstractStringTransformer implements IXMLConfigurable
Strips any content found before the first match of the given pattern.
This class can be used as a pre-parsing handler (text content types only) or as a post-parsing handler.
<handler
class="com.norconex.importer.handler.transformer.impl.StripBeforeTransformer"
inclusive="[false|true]"
maxReadSize="(max characters to read at once)"
sourceCharset="(character encoding)">
<!-- multiple "restrictTo" tags allowed (only one needs to match) -->
<restrictTo>
<fieldMatcher
method="[basic|csv|wildcard|regex]"
ignoreCase="[false|true]"
ignoreDiacritic="[false|true]"
partial="[false|true]">
(field-matching expression)
</fieldMatcher>
<valueMatcher
method="[basic|csv|wildcard|regex]"
ignoreCase="[false|true]"
ignoreDiacritic="[false|true]"
partial="[false|true]">
(value-matching expression)
</valueMatcher>
</restrictTo>
<stripBeforeMatcher
method="[basic|csv|wildcard|regex]"
ignoreCase="[false|true]"
ignoreDiacritic="[false|true]"
partial="[false|true]">
(expression matching text up to which to strip)
</stripBeforeMatcher>
</handler>
<handler
class="StripBeforeTransformer"
inclusive="true">
<stripBeforeMatcher>
<![CDATA[<!-- HEADER_END -->]]>
</stripBeforeMatcher>
</handler>
The above example will strip all text up to and including this HTML comment: <!-- HEADER_END -->.
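The effect of the inclusive flag can be illustrated with a minimal, self-contained sketch. This is not the library's implementation: the class StripBeforeSketch and its stripBefore helper are hypothetical names that only mimic the documented behavior with java.util.regex, and the sketch assumes that content is left unchanged when nothing matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the Norconex Importer API.
class StripBeforeSketch {

    // Removes everything before the first match of the pattern.
    // When inclusive is true, the matched text itself is removed as well.
    static String stripBefore(String content, Pattern pattern, boolean inclusive) {
        Matcher m = pattern.matcher(content);
        if (!m.find()) {
            // Assumption: with no match, the content is returned untouched.
            return content;
        }
        int cut = inclusive ? m.end() : m.start();
        return content.substring(cut);
    }

    public static void main(String[] args) {
        // Pattern.quote treats the marker as literal text rather than a regex.
        Pattern marker = Pattern.compile(Pattern.quote("<!-- HEADER_END -->"));
        String page = "<h1>Site</h1><!-- HEADER_END --><p>Body</p>";

        System.out.println(stripBefore(page, marker, true));  // <p>Body</p>
        System.out.println(stripBefore(page, marker, false)); // <!-- HEADER_END --><p>Body</p>
    }
}
```

With inclusive set to false, the marker is kept as the first text of the remaining content, which may be useful when a later handler still needs to find it.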
Constructor and Description |
---|
StripBeforeTransformer() |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object other) |
TextMatcher | getStripBeforeMatcher() Gets the matcher for the text up to which to strip content. |
String | getStripBeforeRegex() Deprecated. Since 3.0.0, use getStripBeforeMatcher(). |
int | hashCode() |
boolean | isCaseSensitive() Deprecated. Since 3.0.0, use getStripBeforeMatcher(). |
boolean | isInclusive() |
protected void | loadStringTransformerFromXML(XML xml) Loads configuration settings specific to the implementing class. |
protected void | saveStringTransformerToXML(XML xml) Saves configuration settings specific to the implementing class. |
void | setCaseSensitive(boolean caseSensitive) Deprecated. Since 3.0.0, use setStripBeforeMatcher(TextMatcher). |
void | setInclusive(boolean inclusive) Sets whether the match itself should be stripped or not. |
void | setStripBeforeMatcher(TextMatcher stripBeforeMatcher) Sets the matcher for the text up to which to strip content. |
void | setStripBeforeRegex(String regex) Deprecated. Since 3.0.0, use setStripBeforeMatcher(TextMatcher). |
String | toString() |
protected void | transformStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) |
getMaxReadSize, loadCharStreamTransformerFromXML, saveCharStreamTransformerToXML, setMaxReadSize, transformTextDocument
getSourceCharset, loadHandlerFromXML, saveHandlerToXML, setSourceCharset, transformApplicableDocument
transformDocument
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
loadFromXML, saveToXML
protected void transformStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex)
Specified by: transformStringContent in class AbstractStringTransformer

public TextMatcher getStripBeforeMatcher()
Gets the matcher for the text up to which to strip content.

public void setStripBeforeMatcher(TextMatcher stripBeforeMatcher)
Sets the matcher for the text up to which to strip content.
Parameters: stripBeforeMatcher - text matcher

public boolean isInclusive()

public void setInclusive(boolean inclusive)
Sets whether the match itself should be stripped or not.
Parameters: inclusive - true to strip the matched text as well

@Deprecated public boolean isCaseSensitive()
Deprecated. Since 3.0.0, use getStripBeforeMatcher().
Returns: true if case sensitive

@Deprecated public void setCaseSensitive(boolean caseSensitive)
Deprecated. Since 3.0.0, use setStripBeforeMatcher(TextMatcher).
Parameters: caseSensitive - true if case sensitive

@Deprecated public String getStripBeforeRegex()
Deprecated. Since 3.0.0, use getStripBeforeMatcher().

@Deprecated public void setStripBeforeRegex(String regex)
Deprecated. Since 3.0.0, use setStripBeforeMatcher(TextMatcher).
Parameters: regex - expression

protected void loadStringTransformerFromXML(XML xml)
Description copied from class: AbstractStringTransformer
Loads configuration settings specific to the implementing class.
Specified by: loadStringTransformerFromXML in class AbstractStringTransformer
Parameters: xml - XML configuration

protected void saveStringTransformerToXML(XML xml)
Description copied from class: AbstractStringTransformer
Saves configuration settings specific to the implementing class.
Specified by: saveStringTransformerToXML in class AbstractStringTransformer
Parameters: xml - the XML

public boolean equals(Object other)
Overrides: equals in class AbstractStringTransformer

public int hashCode()
Overrides: hashCode in class AbstractStringTransformer

public String toString()
Overrides: toString in class AbstractStringTransformer
Copyright © 2009–2023 Norconex Inc. All rights reserved.