Class AbstractTikaParser

    • Constructor Detail

      • AbstractTikaParser

        public AbstractTikaParser(org.apache.tika.parser.Parser parser)
        Creates a new Tika-based parser.
        Parameters:
        parser - Tika parser
    • Method Detail

      • initialize

        public void initialize(ParseHints parserHints)
        Description copied from interface: IHintsAwareParser
        Initializes this parser with the given parse hints. Supporting hints is not mandatory, but hints-aware parsers are strongly encouraged to honor every hint that applies to them.
        Specified by:
        initialize in interface IHintsAwareParser
        Parameters:
        parserHints - configuration settings influencing parsing when possible or appropriate
      • modifyParseContext

        protected void modifyParseContext(org.apache.tika.parser.ParseContext parseContext)
        Override this method to apply your own settings to the Tika ParseContext. The ParseContext is already configured when this method is called, so changing existing settings may cause parsing to fail. Override only if you understand the consequences. The default implementation does nothing.
        Parameters:
        parseContext - Tika parse context
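        The override pattern described for modifyParseContext can be sketched as follows. This is a minimal, self-contained illustration rather than the library's implementation: SketchParseContext stands in for org.apache.tika.parser.ParseContext (mirroring only its type-keyed set and get methods), SketchTikaParser stands in for AbstractTikaParser, and SketchOcrSettings, createParseContext and the "eng" value are hypothetical names invented for the example. The point at which the real class invokes the hook is assumed, not taken from its source.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for org.apache.tika.parser.ParseContext (assumption: only the
// type-keyed set/get behavior is mirrored here).
class SketchParseContext {
    private final Map<Class<?>, Object> entries = new HashMap<>();

    <T> void set(Class<T> key, T value) {
        entries.put(key, value);
    }

    <T> T get(Class<T> key) {
        // Class.cast returns null when no entry is registered for the key.
        return key.cast(entries.get(key));
    }
}

// Hypothetical settings type, used only for this illustration.
class SketchOcrSettings {
    final String language;

    SketchOcrSettings(String language) {
        this.language = language;
    }
}

// Stand-in for AbstractTikaParser, reduced to the hook described above.
abstract class SketchTikaParser {

    // Hypothetical helper: configures the context, then calls the hook.
    SketchParseContext createParseContext() {
        SketchParseContext context = new SketchParseContext();
        // Base configuration that subclasses should leave untouched.
        context.set(SketchTikaParser.class, this);
        modifyParseContext(context);
        return context;
    }

    // The default implementation does nothing.
    protected void modifyParseContext(SketchParseContext parseContext) {
    }
}

class SketchCustomParser extends SketchTikaParser {
    @Override
    protected void modifyParseContext(SketchParseContext parseContext) {
        // Add a new entry only if none exists; never replace existing ones.
        if (parseContext.get(SketchOcrSettings.class) == null) {
            parseContext.set(SketchOcrSettings.class, new SketchOcrSettings("eng"));
        }
    }
}

class ModifyParseContextSketch {
    public static void main(String[] args) {
        SketchParseContext context = new SketchCustomParser().createParseContext();
        System.out.println(context.get(SketchOcrSettings.class).language); // prints "eng"
    }
}
```

        The override only adds an entry that is not already present. That is the conservative reading of the warning above: entries set before the hook runs are left as they are.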
      • addTikaMetadataToImporterMetadata

        protected void addTikaMetadataToImporterMetadata(org.apache.tika.metadata.Metadata tikaMeta,
                                                         Properties metadata)
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class Object
      • isSplitEmbedded

        @Deprecated
        public boolean isSplitEmbedded()
        Deprecated.
        Gets whether embedded documents should be split to become "standalone" distinct documents.
        Returns:
        true if parser should split embedded documents.
      • setSplitEmbedded

        @Deprecated
        public void setSplitEmbedded(boolean splitEmbedded)
        Deprecated.
        Sets whether embedded documents should be split to become "standalone" distinct documents.
        Parameters:
        splitEmbedded - true if parser should split embedded documents.