Class StripBeforeTransformer
- java.lang.Object
  - com.norconex.importer.handler.AbstractImporterHandler
    - com.norconex.importer.handler.transformer.AbstractDocumentTransformer
      - com.norconex.importer.handler.transformer.AbstractCharStreamTransformer
        - com.norconex.importer.handler.transformer.AbstractStringTransformer
          - com.norconex.importer.handler.transformer.impl.StripBeforeTransformer
- All Implemented Interfaces:
- IXMLConfigurable, IImporterHandler, IDocumentTransformer
public class StripBeforeTransformer extends AbstractStringTransformer implements IXMLConfigurable
Strips any content found before the first match of the given pattern.
This class can be used as a pre-parsing (text content-types only) or post-parsing handler.
XML configuration usage:
<handler class="com.norconex.importer.handler.transformer.impl.StripBeforeTransformer"
    inclusive="[false|true]"
    maxReadSize="(max characters to read at once)"
    sourceCharset="(character encoding)">

  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher>(field-matching expression)</fieldMatcher>
    <valueMatcher>(value-matching expression)</valueMatcher>
  </restrictTo>

  <stripBeforeMatcher>
    (expression matching text up to which to strip)
  </stripBeforeMatcher>
</handler>
XML usage example:
<handler class="StripBeforeTransformer" inclusive="true">
  <stripBeforeMatcher>
    <![CDATA[<!-- HEADER_END -->]]>
  </stripBeforeMatcher>
</handler>
The above example will strip all text up to and including this HTML comment:
<!-- HEADER_END -->
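The stripping behavior described above can be sketched with plain `java.util.regex`. This is only an illustration of the documented semantics, not the class's actual implementation: the class name `StripBeforeSketch` and the method `stripBefore` are hypothetical, and leaving the content untouched when nothing matches is an assumption of this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StripBeforeSketch {

    /**
     * Removes everything before the first match of the pattern, in place.
     * When inclusive is true, the matched text is removed as well.
     * Assumption of this sketch: content is left untouched if nothing matches.
     */
    public static void stripBefore(
            StringBuilder content, Pattern pattern, boolean inclusive) {
        Matcher m = pattern.matcher(content);
        if (m.find()) {
            // Cut after the match when inclusive, otherwise just before it.
            int cut = inclusive ? m.end() : m.start();
            content.delete(0, cut);
        }
    }

    public static void main(String[] args) {
        // Quote the marker so it is matched literally, not as a regex.
        Pattern marker = Pattern.compile(Pattern.quote("<!-- HEADER_END -->"));

        StringBuilder page = new StringBuilder(
                "<nav>menu</nav><!-- HEADER_END --><p>Body</p>");
        stripBefore(page, marker, true);
        System.out.println(page); // <p>Body</p>
    }
}
```

With `inclusive` set to `false`, the same call would keep the `<!-- HEADER_END -->` comment at the start of the remaining content.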
- Author:
- Pascal Essiembre
Constructor Summary
- StripBeforeTransformer()
Method Summary
Modifier and Type, Method, Description
- boolean equals(Object other)
- TextMatcher getStripBeforeMatcher()
  Gets the matcher for the text up to which to strip content.
- String getStripBeforeRegex()
  Deprecated. Since 3.0.0, use getStripBeforeMatcher().
- int hashCode()
- boolean isCaseSensitive()
  Deprecated. Since 3.0.0, use getStripBeforeMatcher().
- boolean isInclusive()
- protected void loadStringTransformerFromXML(XML xml)
  Loads configuration settings specific to the implementing class.
- protected void saveStringTransformerToXML(XML xml)
  Saves configuration settings specific to the implementing class.
- void setCaseSensitive(boolean caseSensitive)
  Deprecated. Since 3.0.0, use setStripBeforeMatcher(TextMatcher).
- void setInclusive(boolean inclusive)
  Sets whether the match itself should be stripped or not.
- void setStripBeforeMatcher(TextMatcher stripBeforeMatcher)
  Sets the matcher for the text up to which to strip content.
- void setStripBeforeRegex(String regex)
  Deprecated. Since 3.0.0, use setStripBeforeMatcher(TextMatcher).
- String toString()
- protected void transformStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex)
-
Methods inherited from class com.norconex.importer.handler.transformer.AbstractStringTransformer
getMaxReadSize, loadCharStreamTransformerFromXML, saveCharStreamTransformerToXML, setMaxReadSize, transformTextDocument
-
Methods inherited from class com.norconex.importer.handler.transformer.AbstractCharStreamTransformer
getSourceCharset, loadHandlerFromXML, saveHandlerToXML, setSourceCharset, transformApplicableDocument
-
Methods inherited from class com.norconex.importer.handler.transformer.AbstractDocumentTransformer
transformDocument
-
Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
-
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface com.norconex.commons.lang.xml.IXMLConfigurable
loadFromXML, saveToXML
Method Detail
-
transformStringContent
protected void transformStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex)
- Specified by:
- transformStringContent in class AbstractStringTransformer
-
getStripBeforeMatcher
public TextMatcher getStripBeforeMatcher()
Gets the matcher for the text up to which to strip content.
- Returns:
- text matcher
- Since:
- 3.0.0
-
setStripBeforeMatcher
public void setStripBeforeMatcher(TextMatcher stripBeforeMatcher)
Sets the matcher for the text up to which to strip content.
- Parameters:
- stripBeforeMatcher - text matcher
- Since:
- 3.0.0
-
isInclusive
public boolean isInclusive()
-
setInclusive
public void setInclusive(boolean inclusive)
Sets whether the match itself should be stripped or not.
- Parameters:
- inclusive - true to strip the matched text as well
-
isCaseSensitive
@Deprecated public boolean isCaseSensitive()
Deprecated. Since 3.0.0, use getStripBeforeMatcher().
Gets whether matching is case sensitive.
- Returns:
- true if case sensitive
-
setCaseSensitive
@Deprecated public void setCaseSensitive(boolean caseSensitive)
Deprecated. Since 3.0.0, use setStripBeforeMatcher(TextMatcher).
Sets whether matching is case sensitive.
- Parameters:
- caseSensitive - true if case sensitive
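The deprecated `setCaseSensitive(boolean)` and `setStripBeforeRegex(String)` setters together describe a regular expression plus a case flag. As a rough illustration of what that flag changes, the sketch below uses only `java.util.regex`. The class `CaseSensitivitySketch` and its methods are hypothetical and not part of the Importer API, and the sketch does not show how a `TextMatcher` is configured.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaseSensitivitySketch {

    /** Compiles the regex, ignoring case unless caseSensitive is true. */
    public static Pattern compile(String regex, boolean caseSensitive) {
        int flags = caseSensitive
                ? 0
                : Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        return Pattern.compile(regex, flags);
    }

    /** Returns the index where the first match starts, or -1 if none. */
    public static int firstMatchStart(String text, Pattern pattern) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        String text = "intro Header_End body";
        // Case-sensitive: "header_end" does not match "Header_End".
        System.out.println(firstMatchStart(text, compile("header_end", true)));  // -1
        // Case-insensitive: the match starts right after "intro ".
        System.out.println(firstMatchStart(text, compile("header_end", false))); // 6
    }
}
```

In a strip-before setting, the case flag therefore decides whether a marker written with different casing is found at all, and so whether any content is removed.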
-
getStripBeforeRegex
@Deprecated public String getStripBeforeRegex()
Deprecated. Since 3.0.0, use getStripBeforeMatcher().
Gets the expression matching text up to which to strip.
- Returns:
- expression
-
setStripBeforeRegex
@Deprecated public void setStripBeforeRegex(String regex)
Deprecated. Since 3.0.0, use setStripBeforeMatcher(TextMatcher).
Sets the expression matching text up to which to strip.
- Parameters:
- regex - expression
-
loadStringTransformerFromXML
protected void loadStringTransformerFromXML(XML xml)
Description copied from class: AbstractStringTransformer
Loads configuration settings specific to the implementing class.
- Specified by:
- loadStringTransformerFromXML in class AbstractStringTransformer
- Parameters:
- xml - XML configuration
-
saveStringTransformerToXML
protected void saveStringTransformerToXML(XML xml)
Description copied from class: AbstractStringTransformer
Saves configuration settings specific to the implementing class. The parent tag along with the "class" attribute are already written. Implementors must not close the writer.
- Specified by:
- saveStringTransformerToXML in class AbstractStringTransformer
- Parameters:
- xml - the XML
-
equals
public boolean equals(Object other)
- Overrides:
- equals in class AbstractStringTransformer
-
hashCode
public int hashCode()
- Overrides:
- hashCode in class AbstractStringTransformer
-
toString
public String toString()
- Overrides:
- toString in class AbstractStringTransformer