Class ImporterConfig
- java.lang.Object
  - com.norconex.importer.ImporterConfig
- All Implemented Interfaces: IXMLConfigurable
public class ImporterConfig extends Object implements IXMLConfigurable
Importer configuration.
- Author:
- Pascal Essiembre
-
-
Field Summary
Fields (modifier and type, field, description):
- static long DEFAULT_MAX_MEM_INSTANCE: 100 MB.
- static long DEFAULT_MAX_MEM_POOL: 1 GB.
- static String DEFAULT_TEMP_DIR_PATH
-
Constructor Summary
Constructors:
- ImporterConfig()
-
Method Summary
All methods (modifier and type, method, description):
- boolean equals(Object other)
- long getMaxFileCacheSize(): Deprecated. Since 3.0.0, use getMaxMemoryInstance().
- long getMaxFilePoolCacheSize(): Deprecated. Since 3.0.0, use getMaxMemoryPool().
- long getMaxMemoryInstance(): Gets the maximum number of bytes used for memory caching of a single document being processed.
- long getMaxMemoryPool(): Gets the maximum number of bytes used for memory caching of data for all documents concurrently being processed.
- Path getParseErrorsSaveDir(): Gets the directory where files that generate parsing errors will be saved.
- IDocumentParserFactory getParserFactory()
- Consumer<HandlerContext> getPostParseConsumer(): Gets the Consumer to be executed on documents after their parsing has occurred.
- List<IImporterHandler> getPostParseHandlers(): Deprecated. Since 3.0.0, use getPostParseConsumer() instead.
- Consumer<HandlerContext> getPreParseConsumer(): Gets the Consumer to be executed on documents before their parsing has occurred.
- List<IImporterHandler> getPreParseHandlers(): Deprecated. Since 3.0.0, use getPreParseConsumer() instead.
- List<IImporterResponseProcessor> getResponseProcessors()
- Path getTempDir(): Gets the temporary directory where files can be deleted safely by the OS or any other processes when the Importer is not running.
- int hashCode()
- void loadFromXML(XML xml)
- void saveToXML(XML xml)
- void setMaxFileCacheSize(long maxFileCacheSize): Deprecated. Since 3.0.0, use setMaxMemoryInstance(long).
- void setMaxFilePoolCacheSize(long maxFilePoolCacheSize): Deprecated. Since 3.0.0, use setMaxMemoryPool(long).
- void setMaxMemoryInstance(long maxMemoryInstance): Sets the maximum number of bytes used for memory caching of a single document being processed.
- void setMaxMemoryPool(long maxMemoryPool): Sets the maximum number of bytes used for memory caching of data for all documents concurrently being processed.
- void setParseErrorsSaveDir(Path parseErrorsSaveDir): Sets the directory where files that generate parsing errors will be saved.
- void setParserFactory(IDocumentParserFactory parserFactory)
- void setPostParseConsumer(Consumer<HandlerContext> consumer): Sets the Consumer to be executed on documents after their parsing has occurred.
- void setPostParseHandlers(List<IImporterHandler> postParseHandlers): Deprecated. Since 3.0.0, use setPostParseConsumer(Consumer) instead.
- void setPreParseConsumer(Consumer<HandlerContext> consumer): Sets the Consumer to be executed on documents before their parsing has occurred.
- void setPreParseHandlers(List<IImporterHandler> preParseHandlers): Deprecated. Since 3.0.0, use setPreParseConsumer(Consumer) instead.
- void setResponseProcessors(List<IImporterResponseProcessor> responseProcessors)
- void setTempDir(Path tempDir): Sets the temporary directory where files can be deleted safely by the OS or any other processes when the Importer is not running.
- String toString()
-
-
-
Field Detail
-
DEFAULT_TEMP_DIR_PATH
public static final String DEFAULT_TEMP_DIR_PATH
-
DEFAULT_MAX_MEM_INSTANCE
public static final long DEFAULT_MAX_MEM_INSTANCE
100 MB.
-
DEFAULT_MAX_MEM_POOL
public static final long DEFAULT_MAX_MEM_POOL
1 GB.
-
-
Method Detail
-
getParserFactory
public IDocumentParserFactory getParserFactory()
-
setParserFactory
public void setParserFactory(IDocumentParserFactory parserFactory)
-
getParseErrorsSaveDir
public Path getParseErrorsSaveDir()
Gets the directory where files that generate parsing errors will be saved. Default is null (not storing errors).
- Returns:
- directory where to save error files
-
setParseErrorsSaveDir
public void setParseErrorsSaveDir(Path parseErrorsSaveDir)
Sets the directory where files that generate parsing errors will be saved.
- Parameters:
parseErrorsSaveDir - directory where to save error files
-
getPreParseConsumer
public Consumer<HandlerContext> getPreParseConsumer()
Gets the Consumer to be executed on documents before their parsing has occurred.
- Returns:
- the document consumer
- Since:
- 3.0.0
-
setPreParseConsumer
public void setPreParseConsumer(Consumer<HandlerContext> consumer)
Sets the Consumer to be executed on documents before their parsing has occurred. The consumer is created automatically when relying on XML configuration of handlers (IImporterHandler). XML configuration also offers extra XML tags to create a basic "flow" for handler execution.
To programmatically set multiple consumers, or to take advantage of the many configurable IImporterHandler instances instead, you can use FunctionUtil.allConsumers(Consumer...) or HandlerConsumer.fromHandlers(IImporterHandler...) respectively to create a consumer.
- Parameters:
consumer - the document consumer
- Since:
- 3.0.0
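The idea of folding several consumers into the single Consumer expected by this setter can be sketched with the JDK alone. The sketch below does not use or reproduce FunctionUtil.allConsumers(Consumer...); the class ConsumerChainSketch, the helper allOf and the stand-in FakeContext type are hypothetical names introduced only for illustration, and the real HandlerContext is not modelled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Illustrative only: folds several consumers into one with Consumer.andThen. */
final class ConsumerChainSketch {

    /** Hypothetical stand-in for HandlerContext; not the Norconex class. */
    static final class FakeContext {
        final List<String> log = new ArrayList<>();
    }

    /** Combines the given consumers so they run in the order supplied. */
    @SafeVarargs
    static <T> Consumer<T> allOf(Consumer<T>... consumers) {
        Consumer<T> chain = t -> { };          // start with a no-op
        for (Consumer<T> c : consumers) {
            if (c != null) {                   // skip missing entries
                chain = chain.andThen(c);
            }
        }
        return chain;
    }

    /** Runs two sample consumers through the chain and returns what they logged. */
    static List<String> demo() {
        Consumer<FakeContext> tagger = ctx -> ctx.log.add("tagged");
        Consumer<FakeContext> filter = ctx -> ctx.log.add("filtered");
        FakeContext ctx = new FakeContext();
        allOf(tagger, filter).accept(ctx);
        return ctx.log;                        // [tagged, filtered]
    }
}
```

With the actual library, the combined consumer would be typed Consumer<HandlerContext> and passed to setPreParseConsumer(Consumer<HandlerContext>); whether the library's own helper orders or guards consumers the same way is an assumption.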
-
getPostParseConsumer
public Consumer<HandlerContext> getPostParseConsumer()
Gets the Consumer to be executed on documents after their parsing has occurred.
- Returns:
- the document consumer
- Since:
- 3.0.0
-
setPostParseConsumer
public void setPostParseConsumer(Consumer<HandlerContext> consumer)
Sets the Consumer to be executed on documents after their parsing has occurred. The consumer is created automatically when relying on XML configuration of handlers (IImporterHandler). XML configuration also offers extra XML tags to create a basic "flow" for handler execution.
To programmatically set multiple consumers, or to take advantage of the many configurable IImporterHandler instances instead, you can use FunctionUtil.allConsumers(Consumer...) or HandlerConsumer.fromHandlers(IImporterHandler...) respectively to create a consumer.
- Parameters:
consumer - the document consumer
- Since:
- 3.0.0
-
getPreParseHandlers
@Deprecated public List<IImporterHandler> getPreParseHandlers()
Deprecated. Since 3.0.0, use getPreParseConsumer() instead.
Gets importer handlers to be executed on documents before they are parsed.
- Returns:
- list of importer handlers
-
setPreParseHandlers
@Deprecated public void setPreParseHandlers(List<IImporterHandler> preParseHandlers)
Deprecated. Since 3.0.0, use setPreParseConsumer(Consumer) instead.
Sets importer handlers to be executed on documents before they are parsed.
- Parameters:
preParseHandlers - list of importer handlers
-
getPostParseHandlers
@Deprecated public List<IImporterHandler> getPostParseHandlers()
Deprecated. Since 3.0.0, use getPostParseConsumer() instead.
Gets importer handlers to be executed on documents after they are parsed.
- Returns:
- list of importer handlers
-
setPostParseHandlers
@Deprecated public void setPostParseHandlers(List<IImporterHandler> postParseHandlers)
Deprecated. Since 3.0.0, use setPostParseConsumer(Consumer) instead.
Sets importer handlers to be executed on documents after they are parsed.
- Parameters:
postParseHandlers - list of importer handlers
-
getResponseProcessors
public List<IImporterResponseProcessor> getResponseProcessors()
-
setResponseProcessors
public void setResponseProcessors(List<IImporterResponseProcessor> responseProcessors)
-
getTempDir
public Path getTempDir()
Gets the temporary directory where files can be deleted safely by the OS or any other processes when the Importer is not running. When not set, the importer will use the system temporary directory.
This is only used when the Importer is launched directly from the command line or when importing documents via Importer.importDocument(ImporterRequest). Documents imported via Importer.importDocument(Doc) already have their temp/cache directory built in.
- Returns:
- path to temporary directory
-
setTempDir
public void setTempDir(Path tempDir)
Sets the temporary directory where files can be deleted safely by the OS or any other processes when the Importer is not running. When not set, the importer will use the system temporary directory.
This is only used when the Importer is launched directly from the command line or when importing documents via Importer.importDocument(ImporterRequest). Documents imported via Importer.importDocument(Doc) already have their temp/cache directory built in.
- Parameters:
tempDir - path to temporary directory
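The fallback described above (use the configured directory when set, otherwise the system temporary directory) can be illustrated roughly as follows. This is a sketch of the documented behaviour, not the importer's actual resolution code: TempDirSketch and effectiveTempDir are hypothetical names, and reading the java.io.tmpdir system property is an assumption about what "system temporary directory" means here.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative only: picks the configured temp dir or falls back to the JVM's. */
final class TempDirSketch {

    /**
     * Returns the configured directory when one was set, otherwise the
     * directory named by the standard "java.io.tmpdir" system property.
     */
    static Path effectiveTempDir(Path configured) {
        if (configured != null) {
            return configured;
        }
        // Assumption: the "system temporary directory" is java.io.tmpdir.
        return Paths.get(System.getProperty("java.io.tmpdir"));
    }
}
```

A caller holding an ImporterConfig could apply the same check to the value returned by getTempDir() before creating working files.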
-
getMaxMemoryInstance
public long getMaxMemoryInstance()
Gets the maximum number of bytes used for memory caching of a single document being processed. Default is DEFAULT_MAX_MEM_INSTANCE.
This is only used when the Importer is launched directly from the command line or when importing documents via Importer.importDocument(ImporterRequest). Documents imported via Importer.importDocument(Doc) already have their memory settings built in.
- Returns:
- max document memory cache size
- Since:
- 3.0.0
-
setMaxMemoryInstance
public void setMaxMemoryInstance(long maxMemoryInstance)
Sets the maximum number of bytes used for memory caching of a single document being processed.
This is only used when the Importer is launched directly from the command line or when importing documents via Importer.importDocument(ImporterRequest). Documents imported via Importer.importDocument(Doc) already have their memory settings built in.
- Parameters:
maxMemoryInstance - max document memory cache size
- Since:
- 3.0.0
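Because the memory setters take a raw byte count, a small conversion helper can make call sites easier to read. The sketch below is illustrative only: MemorySizeSketch and its methods are hypothetical names, and binary multiples (1 MB = 1024 * 1024 bytes) are assumed; this page does not state whether DEFAULT_MAX_MEM_INSTANCE and DEFAULT_MAX_MEM_POOL use binary or decimal units.

```java
/** Illustrative only: converts human-friendly sizes to byte counts. */
final class MemorySizeSketch {

    private static final long BYTES_PER_MB = 1024L * 1024L;
    private static final long BYTES_PER_GB = 1024L * BYTES_PER_MB;

    /** Megabytes to bytes, failing loudly on overflow instead of wrapping. */
    static long megabytes(long mb) {
        return Math.multiplyExact(mb, BYTES_PER_MB);
    }

    /** Gigabytes to bytes, failing loudly on overflow instead of wrapping. */
    static long gigabytes(long gb) {
        return Math.multiplyExact(gb, BYTES_PER_GB);
    }
}
```

Under these assumptions, MemorySizeSketch.megabytes(100) yields 104857600 and MemorySizeSketch.gigabytes(1) yields 1073741824; such values could then be passed to setMaxMemoryInstance(long) and setMaxMemoryPool(long).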
-
getMaxMemoryPool
public long getMaxMemoryPool()
Gets the maximum number of bytes used for memory caching of data for all documents concurrently being processed. Default is DEFAULT_MAX_MEM_POOL.
This is only used when the Importer is launched directly from the command line or when importing documents via Importer.importDocument(ImporterRequest). Documents imported via Importer.importDocument(Doc) already have their memory settings built in.
- Returns:
- max documents memory pool cache size
- Since:
- 3.0.0
-
setMaxMemoryPool
public void setMaxMemoryPool(long maxMemoryPool)
Sets the maximum number of bytes used for memory caching of data for all documents concurrently being processed.
This is only used when the Importer is launched directly from the command line or when importing documents via Importer.importDocument(ImporterRequest). Documents imported via Importer.importDocument(Doc) already have their memory settings built in.
- Parameters:
maxMemoryPool - max documents memory pool cache size
- Since:
- 3.0.0
-
getMaxFileCacheSize
@Deprecated public long getMaxFileCacheSize()
Deprecated. Since 3.0.0, use getMaxMemoryInstance().
- Returns:
- byte amount
-
setMaxFileCacheSize
@Deprecated public void setMaxFileCacheSize(long maxFileCacheSize)
Deprecated. Since 3.0.0, use setMaxMemoryInstance(long).
- Parameters:
maxFileCacheSize - byte amount
-
getMaxFilePoolCacheSize
@Deprecated public long getMaxFilePoolCacheSize()
Deprecated. Since 3.0.0, use getMaxMemoryPool().
- Returns:
- byte amount
-
setMaxFilePoolCacheSize
@Deprecated public void setMaxFilePoolCacheSize(long maxFilePoolCacheSize)
Deprecated. Since 3.0.0, use setMaxMemoryPool(long).
- Parameters:
maxFilePoolCacheSize - byte amount
-
loadFromXML
public void loadFromXML(XML xml)
- Specified by:
loadFromXML in interface IXMLConfigurable
-
saveToXML
public void saveToXML(XML xml)
- Specified by:
saveToXML in interface IXMLConfigurable
-
-