Package com.norconex.importer.handler
Class AbstractImporterHandler
java.lang.Object
    com.norconex.importer.handler.AbstractImporterHandler

All Implemented Interfaces:
    IXMLConfigurable
Direct Known Subclasses:
    AbstractDocumentFilter, AbstractDocumentSplitter, AbstractDocumentTagger, AbstractDocumentTransformer

public abstract class AbstractImporterHandler extends Object implements IXMLConfigurable
Base class for handlers that apply only to certain types of documents. It provides a way to restrict applicable documents based on a metadata field value, where the value matches a regular expression. For instance, to apply a handler only to text documents, you can use the following:
    myHandler.addRestriction(new PropertyMatcher("document.contentType", new TextMatcher(Method.REGEX).setPattern("^text/.*$")));
Subclasses must test whether a document is accepted using the isApplicable(HandlerDoc, ParseState) method.
Subclasses can safely be used as either pre-parse or post-parse handlers.
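The restriction check described above can be illustrated with a minimal, self-contained sketch. It does not use the Norconex classes: RestrictionSketch and RestrictedHandlerSketch are hypothetical stand-ins for PropertyMatcher and this base class, and document metadata is modeled as a plain Map. Treating a handler with no restrictions as applicable to every document is an assumption of this sketch, not something stated on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-in for a field/value restriction (not the Norconex PropertyMatcher).
final class RestrictionSketch {
    private final String field;
    private final Pattern valuePattern;

    RestrictionSketch(String field, String valueRegex) {
        this.field = field;
        this.valuePattern = Pattern.compile(valueRegex);
    }

    // True if any value of the restricted field fully matches the regular expression.
    boolean matches(Map<String, List<String>> metadata) {
        List<String> values = metadata.getOrDefault(field, List.of());
        return values.stream().anyMatch(v -> valuePattern.matcher(v).matches());
    }
}

// Hypothetical handler base class holding a list of restrictions.
class RestrictedHandlerSketch {
    private final List<RestrictionSketch> restrictions = new ArrayList<>();

    void addRestriction(RestrictionSketch... toAdd) {
        restrictions.addAll(List.of(toAdd));
    }

    // Applicable when at least one restriction matches.
    // Assumption: with no restrictions at all, every document is accepted.
    boolean isApplicable(Map<String, List<String>> metadata) {
        if (restrictions.isEmpty()) {
            return true;
        }
        return restrictions.stream().anyMatch(r -> r.matches(metadata));
    }
}
```

With a single restriction on "document.contentType" using "^text/.*$", a document whose content type is "text/html" is accepted by this sketch, while "application/pdf" is not.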
XML configuration usage:
<!-- multiple "restrictTo" tags allowed (only one needs to match) -->
<restrictTo>
  <fieldMatcher>(field-matching expression)</fieldMatcher>
  <valueMatcher>(value-matching expression)</valueMatcher>
</restrictTo>
Subclasses inherit the above IXMLConfigurable configuration.
XML usage example:
<restrictTo>
  <fieldMatcher>document.contentType</fieldMatcher>
  <valueMatcher method="wildcard">
    text/*
  </valueMatcher>
</restrictTo>
The above will apply to any content type starting with "text/".
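To make the wildcard behavior concrete, here is a small sketch of how an expression such as "text/*" can be turned into a regular expression. WildcardSketch is a hypothetical helper, not part of Norconex, and the rules it applies ('*' for any run of characters, '?' for exactly one character) are assumptions about typical wildcard semantics rather than the library's documented behavior.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating wildcard matching; not the Norconex TextMatcher.
final class WildcardSketch {

    // Converts a wildcard expression to a regex:
    // '*' becomes ".*", '?' becomes ".", everything else is matched literally.
    static Pattern toPattern(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // True if the whole value matches the wildcard expression.
    static boolean matches(String wildcard, String value) {
        return toPattern(wildcard).matcher(value).matches();
    }
}
```

Under these assumptions, WildcardSketch.matches("text/*", "text/html") is true and WildcardSketch.matches("text/*", "application/pdf") is false.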
Since:
    2.0.0
Author:
    Pascal Essiembre
-
-
Constructor Summary
Constructor                  Description
AbstractImporterHandler()
-
Method Summary
Modifier and Type, Method, Description

void addRestriction(PropertyMatcher... restrictions)
    Adds one or more restrictions limiting the documents this handler applies to.
void addRestriction(String field, String regex, boolean caseSensitive)
    Deprecated. Since 3.0.0, use addRestriction(PropertyMatcher...).
void addRestrictions(List<PropertyMatcher> restrictions)
    Adds restrictions limiting the documents this handler applies to.
void clearRestrictions()
    Clears all restrictions.
protected String detectCharsetIfBlank(HandlerDoc doc, InputStream is, String charset, ParseState parseState)
    Deprecated. Since 3.0.0, the charset is already detected; otherwise use CharsetUtil.firstNonBlankOrUTF8(ParseState, String...).
boolean equals(Object other)
PropertyMatchers getRestrictions()
    Gets all restrictions.
int hashCode()
protected boolean isApplicable(HandlerDoc doc, ParseState parseState)
    Method invoked by subclasses to find out whether this handler applies to a document, based on the metadata restrictions provided.
void loadFromXML(XML xml)
protected abstract void loadHandlerFromXML(XML xml)
    Loads configuration settings specific to the implementing class.
boolean removeRestriction(PropertyMatcher restriction)
    Removes a restriction.
int removeRestriction(String field)
    Removes all restrictions on a given field.
protected abstract void saveHandlerToXML(XML xml)
    Saves configuration settings specific to the implementing class.
void saveToXML(XML xml)
String toString()
-
-
-
Method Detail
-
addRestriction
@Deprecated
public void addRestriction(String field, String regex, boolean caseSensitive)
Deprecated. Since 3.0.0, use addRestriction(PropertyMatcher...).
Adds a restriction limiting the documents this handler applies to.
Parameters:
    field - metadata property/field
    regex - regular expression
    caseSensitive - whether the regular expression should be case sensitive
-
addRestriction
public void addRestriction(PropertyMatcher... restrictions)
Adds one or more restrictions limiting the documents this handler applies to.
Parameters:
    restrictions - the restrictions
Since:
    2.4.0
-
addRestrictions
public void addRestrictions(List<PropertyMatcher> restrictions)
Adds restrictions limiting the documents this handler applies to.
Parameters:
    restrictions - the restrictions
Since:
    2.4.0
-
removeRestriction
public int removeRestriction(String field)
Removes all restrictions on a given field.
Parameters:
    field - the field to remove restrictions on
Returns:
    how many elements were removed
Since:
    2.4.0
-
removeRestriction
public boolean removeRestriction(PropertyMatcher restriction)
Removes a restriction.
Parameters:
    restriction - the restriction to remove
Returns:
    true if this handler contained the restriction
Since:
    2.4.0
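The two removeRestriction variants differ in what they return: a count when removing by field, and a boolean when removing one specific restriction. The sketch below mirrors those return values on a plain list. RestrictionStoreSketch is hypothetical, restrictions are modeled as simple (field, value regex) pairs instead of PropertyMatcher instances, and exact, case-sensitive field comparison is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical restriction store; each entry is a (field, value regex) pair.
final class RestrictionStoreSketch {
    private final List<Map.Entry<String, String>> restrictions = new ArrayList<>();

    void addRestriction(String field, String valueRegex) {
        restrictions.add(Map.entry(field, valueRegex));
    }

    // Removes every restriction on the given field and returns how many were removed.
    // Assumption: field names are compared exactly (case-sensitive).
    int removeRestriction(String field) {
        int before = restrictions.size();
        restrictions.removeIf(r -> r.getKey().equals(field));
        return before - restrictions.size();
    }

    // Removes one restriction equal to the given pair; true if it was present.
    boolean removeRestriction(String field, String valueRegex) {
        return restrictions.remove(Map.entry(field, valueRegex));
    }

    void clearRestrictions() {
        restrictions.clear();
    }

    int size() {
        return restrictions.size();
    }
}
```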
-
clearRestrictions
public void clearRestrictions()
Clears all restrictions.
Since:
    2.4.0
-
getRestrictions
public PropertyMatchers getRestrictions()
Gets all restrictions.
Returns:
    the restrictions
Since:
    2.4.0
-
isApplicable
protected final boolean isApplicable(HandlerDoc doc, ParseState parseState)
Method invoked by subclasses to find out whether this handler applies to a document, based on the metadata restrictions provided.
Parameters:
    doc - document
    parseState - whether the document was already parsed (i.e., imported)
Returns:
    true if this handler is applicable to the document
-
detectCharsetIfBlank
@Deprecated
protected final String detectCharsetIfBlank(HandlerDoc doc, InputStream is, String charset, ParseState parseState)
Deprecated. Since 3.0.0, the charset is already detected; otherwise use CharsetUtil.firstNonBlankOrUTF8(ParseState, String...).
Convenience method for handlers that need to detect an input encoding if the explicitly provided encoding is blank. Detection is only attempted if parsing has not occurred (since parsing already converts everything to UTF-8).
Parameters:
    doc - the document to detect the charset on
    is - the document input stream
    charset - the character encoding to test if blank
    parseState - whether the document has already been parsed or not
Returns:
    detected and clean encoding
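The fallback rule described above (parsed content is always UTF-8; otherwise prefer an explicitly provided charset) can be sketched as follows. CharsetFallbackSketch is a hypothetical helper, not the CharsetUtil implementation. It models the parse state as a boolean, performs no actual charset detection, and falling back to UTF-8 when no candidate is provided is an assumption suggested by the replacement method's name.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the fallback order; it performs no detection.
final class CharsetFallbackSketch {

    // Returns UTF-8 for already-parsed content; otherwise the first
    // non-blank candidate, falling back to UTF-8 when none is provided.
    static String firstNonBlankOrUtf8(boolean parsed, String... candidates) {
        if (parsed) {
            // Parsing already converted the content to UTF-8.
            return StandardCharsets.UTF_8.name();
        }
        for (String candidate : candidates) {
            if (candidate != null && !candidate.isBlank()) {
                return candidate.trim();
            }
        }
        return StandardCharsets.UTF_8.name();
    }
}
```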
-
loadFromXML
public final void loadFromXML(XML xml)
Specified by:
    loadFromXML in interface IXMLConfigurable
-
loadHandlerFromXML
protected abstract void loadHandlerFromXML(XML xml)
Loads configuration settings specific to the implementing class.
Parameters:
    xml - XML configuration
-
saveToXML
public void saveToXML(XML xml)
Specified by:
    saveToXML in interface IXMLConfigurable
-
saveHandlerToXML
protected abstract void saveHandlerToXML(XML xml)
Saves configuration settings specific to the implementing class.
Parameters:
    xml - the XML
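The pairing of loadFromXML/saveToXML with the abstract loadHandlerFromXML/saveHandlerToXML hooks follows a template-method shape: the base class handles the shared restriction settings and delegates the rest to the subclass. The sketch below shows that shape only. Both classes are hypothetical, a Map stands in for the XML object, the "restrictTo.fieldMatcher" and "prefix" keys are made up for illustration, and the order in which shared and subclass settings are processed is an assumption.

```java
import java.util.Map;

// Hypothetical base class; a Map<String, String> stands in for the XML object.
abstract class ConfigurableHandlerSketch {
    private String restrictField;

    // Final so the shared settings are always handled before the subclass hook.
    public final void load(Map<String, String> config) {
        restrictField = config.get("restrictTo.fieldMatcher");
        loadHandler(config);
    }

    // Writes the shared settings, then lets the subclass add its own.
    public void save(Map<String, String> config) {
        if (restrictField != null) {
            config.put("restrictTo.fieldMatcher", restrictField);
        }
        saveHandler(config);
    }

    // Hooks for settings specific to the implementing class.
    protected abstract void loadHandler(Map<String, String> config);

    protected abstract void saveHandler(Map<String, String> config);
}

// Hypothetical subclass with one setting of its own.
class PrefixTaggerSketch extends ConfigurableHandlerSketch {
    private String prefix = "";

    @Override
    protected void loadHandler(Map<String, String> config) {
        prefix = config.getOrDefault("prefix", "");
    }

    @Override
    protected void saveHandler(Map<String, String> config) {
        config.put("prefix", prefix);
    }
}
```

Loading a configuration and saving it back to an empty map yields the same entries, which is the round trip a subclass implementer would want to preserve.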
-
-