public class StopCrawlerOnMaxEventListener extends Object implements IEventListener<Event>, IXMLConfigurable
Alternative to CrawlerConfig.setMaxDocuments(int) for stopping the crawler upon reaching specific event counts. The event counts are only kept for a crawling session. They are reset to zero upon restarting the crawler.
If no maximum or no events are specified, this listener has no effect.
The "maxDocuments" option deals with "processed" documents. Those are documents that were initially queued for crawling and on which crawling was attempted, whether that attempt was successful or not. That is, "maxDocuments" counts not only documents that were sent to your committer for additions or deletions, but also documents that were rejected by your Importer configuration, produced errors, etc. This class gives you more control over what should trigger a crawler to stop.
Note that for this class to take effect, make sure that "maxDocuments" is set to a high enough number or to -1 (unlimited).
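The difference between counting "processed" documents and counting specific events can be illustrated with a small, self-contained sketch. It does not use the Norconex API: the class ProcessedVsCommittedSketch, its methods, and the two placeholder outcome names are hypothetical, and it assumes for simplicity that each attempted document yields exactly one outcome event.

```java
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is not Norconex code.
class ProcessedVsCommittedSketch {

    // A "processed" count includes every attempted document,
    // whatever the outcome (committed, rejected, errored, ...).
    static long countProcessed(List<String> outcomes) {
        return outcomes.size();
    }

    // An event-based count includes only the outcomes whose names were selected.
    static long countMatching(List<String> outcomes, Set<String> selectedNames) {
        return outcomes.stream().filter(selectedNames::contains).count();
    }

    public static void main(String[] args) {
        // One outcome per attempted document (simplifying assumption).
        List<String> outcomes = List.of(
                "DOCUMENT_COMMITTED_UPSERT",
                "PLACEHOLDER_REJECTED",      // placeholder name, not a real event
                "DOCUMENT_COMMITTED_DELETE",
                "PLACEHOLDER_ERROR",         // placeholder name, not a real event
                "DOCUMENT_COMMITTED_UPSERT");
        Set<String> committed = Set.of(
                "DOCUMENT_COMMITTED_UPSERT", "DOCUMENT_COMMITTED_DELETE");

        System.out.println("processed = " + countProcessed(outcomes));           // processed = 5
        System.out.println("committed = " + countMatching(outcomes, committed)); // committed = 3
    }
}
```

In this sketch a "maxDocuments"-style limit of 5 would already be reached, while a limit of 5 on committed documents would not.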
If your event matcher matches more than one event, you can decide what the expected behavior should be with "onMultiple". The options are "any", "all", and "sum".
XML configuration usage:
<listener
    class="com.norconex.collector.core.crawler.event.impl.StopCrawlerOnMaxEventListener"
    max="(maximum count)"
    onMultiple="[any|all|sum]">
  <eventMatcher
      method="[basic|csv|wildcard|regex]"
      ignoreCase="[false|true]"
      ignoreDiacritic="[false|true]"
      partial="[false|true]">
    (event name-matching expression)
  </eventMatcher>
</listener>
XML usage example:
<listener
    class="StopCrawlerOnMaxEventListener"
    max="100"
    onMultiple="sum">
  <eventMatcher
      method="csv">
    DOCUMENT_COMMITTED_UPSERT,DOCUMENT_COMMITTED_DELETE
  </eventMatcher>
</listener>
The above example will stop the crawler when the sum of committed documents (upserts + deletions) reaches 100.
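As a rough illustration of how the three "onMultiple" modes could differ, here is a minimal, self-contained sketch. It is not the Norconex implementation: OnMultipleSketch and its nested enum are hypothetical stand-ins, events are matched by exact name rather than through a TextMatcher, and the maximum is assumed to be positive. Only the "sum" behavior is confirmed by the example above; the "any" and "all" behaviors are assumptions inferred from the option names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names and semantics of ANY/ALL are assumptions.
class OnMultipleSketch {

    enum OnMultiple { ANY, ALL, SUM }

    private final Set<String> matchedNames;   // exact event names to count
    private final long maximum;               // assumed to be positive
    private final OnMultiple onMultiple;
    private final Map<String, Long> counts = new HashMap<>();

    OnMultipleSketch(Set<String> matchedNames, long maximum, OnMultiple onMultiple) {
        this.matchedNames = matchedNames;
        this.maximum = maximum;
        this.onMultiple = onMultiple;
    }

    // Records one event and reports whether the stop condition is now met.
    boolean accept(String eventName) {
        if (matchedNames.contains(eventName)) {
            counts.merge(eventName, 1L, Long::sum);
        }
        return shouldStop();
    }

    boolean shouldStop() {
        switch (onMultiple) {
            case ANY:
                // Assumed: stop once any single matched event reaches the maximum.
                return counts.values().stream().anyMatch(c -> c >= maximum);
            case ALL:
                // Assumed: stop once every matched event has reached the maximum.
                return counts.size() == matchedNames.size()
                        && counts.values().stream().allMatch(c -> c >= maximum);
            case SUM:
                // Stop once the combined count of matched events reaches the maximum.
                return counts.values().stream().mapToLong(Long::longValue).sum() >= maximum;
            default:
                throw new IllegalStateException("Unknown mode: " + onMultiple);
        }
    }
}
```

Because the counts live only in memory in this sketch, creating a new instance starts again from zero, which mirrors the note that event counts are kept for a single crawling session.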
Modifier and Type | Class and Description |
---|---|
static class | StopCrawlerOnMaxEventListener.OnMultiple |
Constructor and Description |
---|
StopCrawlerOnMaxEventListener() |
Modifier and Type | Method and Description |
---|---|
void | accept(Event event) |
boolean | equals(Object other) |
TextMatcher | getEventMatcher(): Gets the event matcher used to identify which events will be counted. |
long | getMaximum() |
StopCrawlerOnMaxEventListener.OnMultiple | getOnMultiple() |
int | hashCode() |
void | loadFromXML(XML xml) |
void | saveToXML(XML xml) |
void | setEventMatcher(TextMatcher eventMatcher): Sets the event matcher used to identify which events will be counted. |
void | setMaximum(long maximum) |
void | setOnMultiple(StopCrawlerOnMaxEventListener.OnMultiple onMultiple) |
String | toString() |
public TextMatcher getEventMatcher()
null
public void setEventMatcher(TextMatcher eventMatcher)
Parameters: eventMatcher - event matcher
public StopCrawlerOnMaxEventListener.OnMultiple getOnMultiple()
public void setOnMultiple(StopCrawlerOnMaxEventListener.OnMultiple onMultiple)
public long getMaximum()
public void setMaximum(long maximum)
public void loadFromXML(XML xml)
Specified by: loadFromXML in interface IXMLConfigurable
public void saveToXML(XML xml)
Specified by: saveToXML in interface IXMLConfigurable
Copyright © 2014–2023 Norconex Inc. All rights reserved.