Class AbstractImporterHandler

  • All Implemented Interfaces:
    IXMLConfigurable
    Direct Known Subclasses:
    AbstractDocumentFilter, AbstractDocumentSplitter, AbstractDocumentTagger, AbstractDocumentTransformer

    public abstract class AbstractImporterHandler
    extends Object
    implements IXMLConfigurable
    Base class for handlers that apply only to certain types of documents. It provides a way to restrict applicable documents based on a metadata field value, where the value matches a regular expression. For instance, to apply a handler only to text documents, you can use the following:
       myHandler.addRestriction(new PropertyMatcher("document.contentType",
              new TextMatcher(Method.REGEX).setPattern("^text/.*$")));
     

    Subclasses must test if a document is accepted using the isApplicable(HandlerDoc, ParseState) method.

    Subclasses can safely be used as either pre-parse or post-parse handlers.
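
    The guard that subclasses are expected to perform can be sketched as follows. This is a simplified model rather than the library's code: HandlerSketch and TaggerSketch are hypothetical stand-ins, metadata is reduced to a plain map, restrictions are modeled as predicates, and a handler without restrictions is assumed to apply to every document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-in for the base class: it stores restrictions and
// exposes a final applicability check that subclasses call.
abstract class HandlerSketch {
    private final List<Predicate<Map<String, String>>> restrictions =
            new ArrayList<>();

    void addRestriction(Predicate<Map<String, String>> restriction) {
        restrictions.add(restriction);
    }

    protected final boolean isApplicable(Map<String, String> metadata) {
        // Assumption: with no restrictions, the handler always applies.
        return restrictions.isEmpty()
                || restrictions.stream().anyMatch(r -> r.test(metadata));
    }
}

// Hypothetical tagger: adds a field, but only to applicable documents.
final class TaggerSketch extends HandlerSketch {
    void tag(Map<String, String> metadata) {
        if (!isApplicable(metadata)) {
            return; // leave non-matching documents untouched
        }
        metadata.put("tagged", "true");
    }
}
```

    The point of the pattern is that the subclass does its work only after the inherited check passes, so restriction handling stays in one place.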

    XML configuration usage:

    
    <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
    <restrictTo>
      <fieldMatcher>(field-matching expression)</fieldMatcher>
      <valueMatcher>(value-matching expression)</valueMatcher>
    </restrictTo>

    Subclasses inherit the above IXMLConfigurable configuration.
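
    The "only one needs to match" rule for multiple restrictTo tags can be modeled without the library. The sketch below makes several assumptions: Restriction and RestrictionSketch are hypothetical classes (not Norconex types), the field is matched by exact name, the value matcher is a full-match regular expression, and an empty restriction list means the handler applies to every document.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-in for one <restrictTo> entry.
final class Restriction {
    private final String field;
    private final Pattern valuePattern;

    Restriction(String field, String valueRegex) {
        this.field = field;
        this.valuePattern = Pattern.compile(valueRegex);
    }

    // True when at least one value of the field fully matches the regex.
    boolean matches(Map<String, List<String>> metadata) {
        return metadata.getOrDefault(field, Collections.emptyList()).stream()
                .anyMatch(v -> valuePattern.matcher(v).matches());
    }
}

final class RestrictionSketch {
    // Assumption: no restrictions means "always applicable"; otherwise a
    // single matching restriction is enough (OR semantics).
    static boolean isApplicable(
            List<Restriction> restrictions,
            Map<String, List<String>> metadata) {
        return restrictions.isEmpty()
                || restrictions.stream().anyMatch(r -> r.matches(metadata));
    }
}
```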

    XML usage example:

    
    <restrictTo>
      <fieldMatcher>document.contentType</fieldMatcher>
      <valueMatcher
          method="wildcard">
        text/*
      </valueMatcher>
    </restrictTo>

    The above applies the handler to any document whose content type starts with "text/".
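
    To see why the wildcard expression text/* covers the same content types as a regular expression such as ^text/.*$, here is a minimal sketch that translates a wildcard into a regex. WildcardSketch is a hypothetical helper, not the library's matcher. It assumes that "*" stands for any run of characters, that every other character is literal, and that whitespace surrounding the expression in the XML element is trimmed.

```java
import java.util.regex.Pattern;

final class WildcardSketch {
    // Translates a wildcard expression into a regex.
    // Assumption: '*' matches any run of characters; all else is literal.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.trim().toCharArray()) {
            regex.append(c == '*' ? ".*" : Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(regex.toString());
    }

    // Full-match test of a value against the wildcard expression.
    static boolean matches(String wildcard, String value) {
        return toRegex(wildcard).matcher(value).matches();
    }
}
```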

    Since:
    2.0.0
    Author:
    Pascal Essiembre
    • Constructor Detail

      • AbstractImporterHandler

        public AbstractImporterHandler()
    • Method Detail

      • addRestriction

        @Deprecated
        public void addRestriction​(String field,
                                   String regex,
                                   boolean caseSensitive)
        Deprecated.
        Adds a restriction that limits which documents this handler applies to.
        Parameters:
        field - metadata property/field
        regex - regular expression
        caseSensitive - whether regular expression should be case sensitive
      • addRestriction

        public void addRestriction​(PropertyMatcher... restrictions)
        Adds one or more restrictions that limit which documents this handler applies to.
        Parameters:
        restrictions - the restrictions
        Since:
        2.4.0
      • addRestrictions

        public void addRestrictions​(List<PropertyMatcher> restrictions)
        Adds restrictions that limit which documents this handler applies to.
        Parameters:
        restrictions - the restrictions
        Since:
        2.4.0
      • removeRestriction

        public int removeRestriction​(String field)
        Removes all restrictions on a given field.
        Parameters:
        field - the field to remove restrictions on
        Returns:
        how many elements were removed
        Since:
        2.4.0
      • removeRestriction

        public boolean removeRestriction​(PropertyMatcher restriction)
        Removes a restriction.
        Parameters:
        restriction - the restriction to remove
        Returns:
        true if this handler contained the restriction
        Since:
        2.4.0
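
        The two removeRestriction variants differ in what they report: the field-based one returns a count, the matcher-based one a boolean. The sketch below models that difference with a plain list. RestrictionListSketch is hypothetical, each restriction is reduced to a field/regex pair, and equality-based removal of a single entry is an assumption about how the real method behaves.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class RestrictionListSketch {
    // Each entry pairs a field name (key) with a value regex (value).
    private final List<Map.Entry<String, String>> restrictions =
            new ArrayList<>();

    void add(String field, String regex) {
        restrictions.add(new SimpleImmutableEntry<>(field, regex));
    }

    // Models removeRestriction(String): drops every restriction on the
    // field and reports how many were removed.
    int removeByField(String field) {
        int before = restrictions.size();
        restrictions.removeIf(r -> r.getKey().equals(field));
        return before - restrictions.size();
    }

    // Models removeRestriction(PropertyMatcher): drops the first equal
    // entry and reports whether one was present (assumed semantics).
    boolean remove(String field, String regex) {
        return restrictions.remove(new SimpleImmutableEntry<>(field, regex));
    }

    int size() {
        return restrictions.size();
    }
}
```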
      • clearRestrictions

        public void clearRestrictions()
        Clears all restrictions.
        Since:
        2.4.0
      • getRestrictions

        public PropertyMatchers getRestrictions()
        Gets all restrictions.
        Returns:
        the restrictions
        Since:
        2.4.0
      • isApplicable

        protected final boolean isApplicable​(HandlerDoc doc,
                                             ParseState parseState)
        Method invoked by subclasses to find out whether this handler applies to a document, based on the metadata restrictions provided.
        Parameters:
        doc - document
        parseState - if the document was parsed (i.e. imported) already
        Returns:
        true if this handler is applicable to the document
      • detectCharsetIfBlank

        @Deprecated
        protected final String detectCharsetIfBlank​(HandlerDoc doc,
                                                    InputStream is,
                                                    String charset,
                                                    ParseState parseState)
        Deprecated.
        Since 3.0.0, the charset has already been detected; otherwise, use CharsetUtil.firstNonBlankOrUTF8(ParseState, String...).
        Convenience method for handlers that need to detect an input encoding if the explicitly provided encoding is blank. Detection is only attempted if parsing has not occurred (since parsing converts everything to UTF-8 already).
        Parameters:
        doc - the document to detect charset on
        is - the document input stream
        charset - the character encoding to test if blank
        parseState - whether the document has already been parsed or not.
        Returns:
        detected and clean encoding.
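
        The fallback order described for this deprecated method can be approximated as below. This is a sketch under stated assumptions: CharsetFallbackSketch is hypothetical, a boolean stands in for ParseState, and no detection from the input stream is attempted. It shows only that an explicit non-blank charset wins before parsing, and that UTF-8 is used otherwise.

```java
import java.nio.charset.StandardCharsets;

final class CharsetFallbackSketch {
    // Returns the first non-blank candidate charset, or UTF-8.
    // Assumption: once a document is parsed its content is UTF-8, so the
    // candidates are ignored in that case.
    static String firstNonBlankOrUtf8(
            boolean alreadyParsed, String... charsets) {
        if (!alreadyParsed && charsets != null) {
            for (String charset : charsets) {
                if (charset != null && !charset.trim().isEmpty()) {
                    return charset.trim();
                }
            }
        }
        return StandardCharsets.UTF_8.name();
    }
}
```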
      • loadHandlerFromXML

        protected abstract void loadHandlerFromXML​(XML xml)
        Loads configuration settings specific to the implementing class.
        Parameters:
        xml - XML configuration
      • saveHandlerToXML

        protected abstract void saveHandlerToXML​(XML xml)
        Saves configuration settings specific to the implementing class.
        Parameters:
        xml - the XML
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class Object