public abstract class CollectorConfig extends Object implements IXMLConfigurable
Base Collector configuration.
Subclasses inherit the following XML configuration items.
<workDir>
(Directory where generated files are written. Defaults to "./work")
</workDir>
<tempDir>
(Directory where temporary files are written. Defaults to the working
directory + "/temp")
</tempDir>
<eventListeners>
<!-- Repeat as needed. -->
<listener
class="(IEventListener implementation class name.)"/>
</eventListeners>
<maxConcurrentCrawlers>
(Maximum number of crawlers that can run simultaneously.
Only applicable when more than one crawler is configured.
Defaults to -1, unlimited.)
</maxConcurrentCrawlers>
<crawlersStartInterval>
(Millisecond interval between the start of each crawler. Default starts
them all at once.)
</crawlersStartInterval>
<maxMemoryPool>
(Maximum number of bytes used for memory caching of all document data
combined, e.g., when processing documents. Defaults to 1 GB.)
</maxMemoryPool>
<maxMemoryInstance>
(Maximum number of bytes used for memory caching of each individual
document. Defaults to 100 MB.)
</maxMemoryInstance>
<deferredShutdownDuration>
(Optional amount of time to defer the collector shutdown when it is
done executing. This is useful if you have external processes that
need a bit of time to catch up. Example: 10 seconds. Defaults to 0.)
</deferredShutdownDuration>
<crawlerDefaults>
<!--
All crawler options defined in a "crawler" section (except for
the crawler "id") can be set here as defaults shared between
multiple crawlers. Configuration blocks defined for a specific
crawler always take precedence.
-->
</crawlerDefaults>
<crawlers>
<!-- You need to define at least one crawler. -->
<crawler
id="(Unique identifier for this crawler)">
<!-- Crawler settings -->
</crawler>
</crawlers>
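The precedence rule for `<crawlerDefaults>` can be pictured as a two-level lookup: a value set for a specific crawler wins, otherwise the shared default applies, and the crawler "id" is never inherited. The sketch below is a simplified, hypothetical model that treats crawler options as flat name/value maps; it is not how the collector merges XML, and the option names are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: crawler options as flat name/value pairs.
final class CrawlerDefaultsSketch {

    // Effective options for one crawler: start from the shared defaults,
    // then let crawler-specific values overwrite defaults with the same name.
    static Map<String, String> effectiveOptions(
            Map<String, String> defaults, Map<String, String> crawlerSpecific) {
        Map<String, String> merged = new HashMap<>(defaults);
        // The crawler "id" is never taken from the defaults.
        merged.remove("id");
        merged.putAll(crawlerSpecific);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = Map.of("numThreads", "2", "maxDocuments", "1000");
        Map<String, String> crawlerA = Map.of("id", "crawlerA", "numThreads", "4");
        // crawlerA keeps its own numThreads and inherits maxDocuments.
        System.out.println(effectiveOptions(defaults, crawlerA));
    }
}
```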
XML configuration entries expecting millisecond durations
can be provided in human-readable format (English only), as per
DurationParser
(e.g., "5 minutes and 30 seconds" or "5m30s").
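To show what a compact value such as "5m30s" stands for, the hypothetical helper below converts strings made only of number/unit pairs (`d`, `h`, `m`, `s`, `ms`) into milliseconds. It is not DurationParser and deliberately rejects the longer English forms like "5 minutes and 30 seconds"; it only illustrates the arithmetic behind the compact notation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT Norconex's DurationParser: it only accepts compact
// forms such as "5m30s" or "250ms" (no spaces, no English words).
final class CompactDurationSketch {

    // A number followed by a unit; "ms" is tried before "m" so "250ms"
    // is read as 250 milliseconds.
    private static final Pattern PART = Pattern.compile("(\\d+)(ms|d|h|m|s)");

    static long toMillis(String text) {
        String input = text.trim();
        Matcher m = PART.matcher(input);
        long total = 0;
        int end = 0;
        while (m.find()) {
            // Reject anything sitting between two number/unit pairs.
            if (m.start() != end) {
                throw new IllegalArgumentException("Unexpected text in: " + text);
            }
            long value = Long.parseLong(m.group(1));
            switch (m.group(2)) {
                case "d": total += value * 86_400_000L; break;
                case "h": total += value * 3_600_000L; break;
                case "m": total += value * 60_000L; break;
                case "s": total += value * 1_000L; break;
                default:  total += value; // "ms"
            }
            end = m.end();
        }
        // Reject empty input and trailing text that matched nothing.
        if (end == 0 || end != input.length()) {
            throw new IllegalArgumentException("Not a compact duration: " + text);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(toMillis("5m30s")); // 330000
        System.out.println(toMillis("1h"));    // 3600000
    }
}
```

When configuring in Java rather than XML, the same value can be built with `java.time.Duration`, e.g. `Duration.ofMinutes(5).plusSeconds(30)`, whose `toMillis()` is also 330000.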
Modifier and Type | Field and Description |
---|---|
static Path | DEFAULT_WORK_DIR: Default relative directory where progress files are stored. |
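The defaulting rules for the work and temporary directories (a null work directory becomes `./work`, and a null temporary directory becomes the working directory + `/temp` at runtime) can be sketched with `java.nio.file.Path`. The class and method names below are hypothetical, and the assumption that DEFAULT_WORK_DIR equals `./work` comes from the XML documentation rather than from the constant's source.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper illustrating the documented directory defaults.
final class DirectoryDefaultsSketch {

    // Assumed to match DEFAULT_WORK_DIR ("./work", per the XML documentation).
    static final Path DEFAULT_WORK_DIR = Paths.get("./work");

    // A null work directory falls back to the default relative directory.
    static Path effectiveWorkDir(Path workDir) {
        return workDir != null ? workDir : DEFAULT_WORK_DIR;
    }

    // A null temporary directory falls back to "<effective work dir>/temp".
    static Path effectiveTempDir(Path workDir, Path tempDir) {
        return tempDir != null ? tempDir : effectiveWorkDir(workDir).resolve("temp");
    }

    public static void main(String[] args) {
        System.out.println(effectiveTempDir(null, null));
        System.out.println(effectiveTempDir(Paths.get("/data/collector"), null));
    }
}
```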
Modifier | Constructor and Description |
---|---|
protected | CollectorConfig() |
protected | CollectorConfig(Class<? extends CrawlerConfig> crawlerConfigClass) |
Modifier and Type | Method and Description |
---|---|
void | addEventListeners(IEventListener<?>... eventListeners): Adds event listeners. |
void | addEventListeners(List<IEventListener<?>> eventListeners): Adds event listeners. |
void | clearEventListeners(): Clears all event listeners. |
boolean | equals(Object other) |
List<CrawlerConfig> | getCrawlerConfigs(): Gets crawler configurations. |
Duration | getCrawlersStartInterval(): Gets the amount of time between the start of each concurrent crawler. |
Duration | getDeferredShutdownDuration(): Gets the amount of time to defer the collector shutdown when it is done executing. |
List<IEventListener<?>> | getEventListeners(): Gets event listeners. |
String | getId(): Gets this collector's unique identifier. |
int | getMaxConcurrentCrawlers(): Gets the maximum number of crawlers that can be executed concurrently. |
long | getMaxMemoryInstance() |
long | getMaxMemoryPool() |
int | getMaxParallelCrawlers(): Deprecated. Since 2.0.0, use getMaxConcurrentCrawlers(). |
Path | getTempDir(): Gets the temporary directory where files can be deleted safely by the OS or other processes when the collector is not running. |
Path | getWorkDir(): Gets the base directory where files generated during execution are written. |
int | hashCode() |
protected abstract void | loadCollectorConfigFromXML(XML xml) |
void | loadFromXML(XML xml) |
protected abstract void | saveCollectorConfigToXML(XML xml) |
void | saveToXML(XML xml) |
void | setCrawlerConfigs(CrawlerConfig... crawlerConfigs): Sets crawler configurations. |
void | setCrawlerConfigs(List<CrawlerConfig> crawlerConfigs): Sets crawler configurations. |
void | setCrawlersStartInterval(Duration crawlersStartInterval): Sets the amount of time between the start of each concurrent crawler. |
void | setDeferredShutdownDuration(Duration deferredShutdownDuration): Sets the amount of time to defer the collector shutdown when it is done executing. |
void | setEventListeners(IEventListener<?>... eventListeners): Sets event listeners. |
void | setEventListeners(List<IEventListener<?>> eventListeners): Sets event listeners. |
void | setId(String id): Sets this collector's unique identifier. |
void | setMaxConcurrentCrawlers(int maxConcurrentCrawlers): Sets the maximum number of crawlers that can be executed concurrently. |
void | setMaxMemoryInstance(long maxMemoryInstance) |
void | setMaxMemoryPool(long maxMemoryPool) |
void | setMaxParallelCrawlers(int maxParallelCrawlers): Deprecated. Since 2.0.0, use setMaxConcurrentCrawlers(int). |
void | setTempDir(Path tempDir): Sets the temporary directory where files can be deleted safely by the OS or other processes when the collector is not running. |
void | setWorkDir(Path workDir): Sets the base directory where files generated during execution are written. |
String | toString() |
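getMaxConcurrentCrawlers() (where -1 means no maximum) and getCrawlersStartInterval() (where null means all crawlers start at once) together govern how crawlers are launched. The sketch below shows one way a launcher could honor both settings with a standard thread pool; it is a hypothetical illustration using plain `Runnable` tasks, not the collector's own start-up code, and the one-hour wait is an arbitrary choice.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical launcher illustrating the two settings; not the collector's code.
final class CrawlerLaunchSketch {

    /**
     * Runs every task with at most maxConcurrent running at once and an
     * optional pause between submissions.
     *
     * @param maxConcurrent -1 (or any value below 1 in this sketch) means no maximum
     * @param startInterval null means submit all tasks at once
     * @return true if all tasks finished within the one-hour wait
     */
    static boolean launch(List<Runnable> crawlers, int maxConcurrent, Duration startInterval) {
        ExecutorService pool = maxConcurrent > 0
                ? Executors.newFixedThreadPool(maxConcurrent)
                : Executors.newCachedThreadPool();
        try {
            for (int i = 0; i < crawlers.size(); i++) {
                if (i > 0 && startInterval != null) {
                    Thread.sleep(startInterval.toMillis());
                }
                pool.execute(crawlers.get(i));
            }
            pool.shutdown(); // accept no new tasks; submitted ones keep running
            return pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        List<Runnable> crawlers = List.of(
                () -> System.out.println("crawler A"),
                () -> System.out.println("crawler B"),
                () -> System.out.println("crawler C"));
        // At most two crawlers at a time, submitted 500 ms apart.
        launch(crawlers, 2, Duration.ofMillis(500));
    }
}
```

In this sketch the interval spaces out submissions. When the pool is full, a queued crawler only starts once a slot frees up, so actual start times can be further apart than the configured interval.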
public static final Path DEFAULT_WORK_DIR
protected CollectorConfig()
protected CollectorConfig(Class<? extends CrawlerConfig> crawlerConfigClass)
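public String getId()
Gets this collector's unique identifier.
public void setId(String id)
Sets this collector's unique identifier.
Parameters: id - unique identifier
public List<CrawlerConfig> getCrawlerConfigs()
Gets crawler configurations.
public void setCrawlerConfigs(CrawlerConfig... crawlerConfigs)
Sets crawler configurations.
Parameters: crawlerConfigs - crawler configurations
public void setCrawlerConfigs(List<CrawlerConfig> crawlerConfigs)
Sets crawler configurations.
Parameters: crawlerConfigs - crawler configurations
public Path getWorkDir()
Gets the base directory where files generated during execution are written. If null, the collector will use ./work at runtime.
public void setWorkDir(Path workDir)
Sets the base directory where files generated during execution are written. If null, the collector will use ./work at runtime.
Parameters: workDir - working directory path
public Path getTempDir()
Gets the temporary directory where files can be deleted safely by the OS or other processes when the collector is not running. If null, the collector will use the working directory + /temp at runtime.
public void setTempDir(Path tempDir)
Sets the temporary directory where files can be deleted safely by the OS or other processes when the collector is not running. If null, the collector will use the working directory + /temp at runtime.
Parameters: tempDir - temporary directory
public long getMaxMemoryPool()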
public void setMaxMemoryPool(long maxMemoryPool)
public long getMaxMemoryInstance()
public void setMaxMemoryInstance(long maxMemoryInstance)
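getMaxMemoryPool() and getMaxMemoryInstance() describe two budgets: one shared by all documents and one for any single document (1 GB and 100 MB by default, per the XML documentation). The hypothetical accounting class below shows how such a pair of limits could be checked before caching a document in memory. It simply returns false when either limit would be exceeded; the fallback the caller uses in that case (for example, caching on disk) is an assumption, as is reading "GB" and "MB" as binary units in the demo.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical accounting for a shared pool plus a per-document cap.
final class MemoryBudgetSketch {
    private final long maxPool;      // bytes allowed across all documents
    private final long maxInstance;  // bytes allowed for a single document
    private final AtomicLong used = new AtomicLong();

    MemoryBudgetSketch(long maxPool, long maxInstance) {
        this.maxPool = maxPool;
        this.maxInstance = maxInstance;
    }

    // Reserves memory for one document if both limits allow it.
    // Returns false when the caller should use other storage (assumed: disk).
    boolean tryReserve(long bytes) {
        if (bytes > maxInstance) {
            return false;
        }
        while (true) {
            long current = used.get();
            if (current + bytes > maxPool) {
                return false;
            }
            // Retry if another thread changed the total in the meantime.
            if (used.compareAndSet(current, current + bytes)) {
                return true;
            }
        }
    }

    // Returns memory previously obtained with tryReserve.
    void release(long bytes) {
        used.addAndGet(-bytes);
    }

    long usedBytes() {
        return used.get();
    }

    public static void main(String[] args) {
        MemoryBudgetSketch budget =
                new MemoryBudgetSketch(1024L * 1024 * 1024, 100L * 1024 * 1024);
        System.out.println(budget.tryReserve(50L * 1024 * 1024));  // true
        System.out.println(budget.tryReserve(150L * 1024 * 1024)); // false: above per-document cap
    }
}
```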
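@Deprecated public int getMaxParallelCrawlers()
Deprecated. Since 2.0.0, use getMaxConcurrentCrawlers().
Gets the maximum number of crawlers that can be executed in parallel. Default is -1, which means no maximum.
@Deprecated public void setMaxParallelCrawlers(int maxParallelCrawlers)
Deprecated. Since 2.0.0, use setMaxConcurrentCrawlers(int).
Sets the maximum number of crawlers that can be executed in parallel. Use -1 for no maximum.
Parameters: maxParallelCrawlers - maximum number of parallel crawlers
public int getMaxConcurrentCrawlers()
Gets the maximum number of crawlers that can be executed concurrently. Default is -1, which means no maximum.
public void setMaxConcurrentCrawlers(int maxConcurrentCrawlers)
Sets the maximum number of crawlers that can be executed concurrently. Use -1 for no maximum.
Parameters: maxConcurrentCrawlers - maximum number of concurrent crawlers
public Duration getCrawlersStartInterval()
Gets the amount of time between the start of each concurrent crawler. Default is null (does not wait before launching concurrent crawlers).
public void setCrawlersStartInterval(Duration crawlersStartInterval)
Sets the amount of time between the start of each concurrent crawler.
Parameters: crawlersStartInterval - amount of time
public List<IEventListener<?>> getEventListeners()
Gets event listeners. Those are considered additions to automatically detected configuration objects implementing IEventListener.
public void setEventListeners(IEventListener<?>... eventListeners)
Sets event listeners. Those are considered additions to automatically detected configuration objects implementing IEventListener.
Parameters: eventListeners - event listeners
public void setEventListeners(List<IEventListener<?>> eventListeners)
Sets event listeners. Those are considered additions to automatically detected configuration objects implementing IEventListener.
Parameters: eventListeners - event listeners
public void addEventListeners(IEventListener<?>... eventListeners)
Adds event listeners. Those are considered additions to automatically detected configuration objects implementing IEventListener.
Parameters: eventListeners - event listeners
public void addEventListeners(List<IEventListener<?>> eventListeners)
Adds event listeners. Those are considered additions to automatically detected configuration objects implementing IEventListener.
Parameters: eventListeners - event listeners
public void clearEventListeners()
Clears all event listeners. The automatically detected configuration objects implementing IEventListener are not cleared.
public Duration getDeferredShutdownDuration()
Gets the amount of time to defer the collector shutdown when it is done executing.
public void setDeferredShutdownDuration(Duration deferredShutdownDuration)
Sets the amount of time to defer the collector shutdown when it is done executing.
Parameters: deferredShutdownDuration - duration
public void saveToXML(XML xml)
Specified by: saveToXML in interface IXMLConfigurable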
protected abstract void saveCollectorConfigToXML(XML xml)
public final void loadFromXML(XML xml)
Specified by: loadFromXML in interface IXMLConfigurable
protected abstract void loadCollectorConfigFromXML(XML xml)
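loadFromXML is final, and the class declares two protected abstract hooks, loadCollectorConfigFromXML and saveCollectorConfigToXML. That suggests a template-method design: the base class handles the shared settings and delegates collector-specific ones to the subclass. The sketch below models that split with hypothetical nested classes and a `Map<String, String>` standing in for Norconex's XML class; it illustrates the pattern only and does not reproduce the real field handling.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the template-method split; a Map stands in
// for Norconex's XML class, and the option names are examples only.
final class XmlTemplateSketch {

    abstract static class BaseConfig {
        private String id;
        private int maxConcurrentCrawlers = -1;

        // Final, so subclasses cannot bypass loading of the shared settings.
        public final void loadFrom(Map<String, String> xml) {
            id = xml.getOrDefault("id", id);
            String max = xml.get("maxConcurrentCrawlers");
            if (max != null) {
                maxConcurrentCrawlers = Integer.parseInt(max.trim());
            }
            loadSpecific(xml); // hook for collector-specific settings
        }

        public void saveTo(Map<String, String> xml) {
            if (id != null) {
                xml.put("id", id);
            }
            xml.put("maxConcurrentCrawlers", Integer.toString(maxConcurrentCrawlers));
            saveSpecific(xml); // hook for collector-specific settings
        }

        protected abstract void loadSpecific(Map<String, String> xml);

        protected abstract void saveSpecific(Map<String, String> xml);

        public String getId() { return id; }

        public int getMaxConcurrentCrawlers() { return maxConcurrentCrawlers; }
    }

    // A concrete configuration adding one collector-specific option.
    static final class MyCollectorConfig extends BaseConfig {
        private String userAgent;

        @Override
        protected void loadSpecific(Map<String, String> xml) {
            userAgent = xml.getOrDefault("userAgent", userAgent);
        }

        @Override
        protected void saveSpecific(Map<String, String> xml) {
            if (userAgent != null) {
                xml.put("userAgent", userAgent);
            }
        }

        public String getUserAgent() { return userAgent; }
    }

    public static void main(String[] args) {
        Map<String, String> in = new HashMap<>();
        in.put("id", "my-collector");
        in.put("maxConcurrentCrawlers", "2");
        in.put("userAgent", "ExampleBot");

        MyCollectorConfig config = new MyCollectorConfig();
        config.loadFrom(in);

        Map<String, String> out = new HashMap<>();
        config.saveTo(out);
        System.out.println(out.equals(in)); // true: a save/load round trip
    }
}
```

Making the loading entry point final keeps the shared settings consistent across all collector types, while each subclass only has to implement its own two hooks.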
Copyright © 2014–2023 Norconex Inc. All rights reserved.