public interface ICrawlDataStore
Holds the necessary information about the crawling activities of all references (e.g. URL, path, etc.).
The few stages a reference should have in most implementations are:
Modifier and Type | Method and Description |
---|---|
void | close() - Closes a database connection. |
int | getActiveCount() - Gets the number of active references (currently being processed). |
ICrawlData | getCached(String cacheReference) - Gets the cached reference from the previous time the crawler was run (e.g. |
Iterator<ICrawlData> | getCacheIterator() - Gets the cache iterator. |
ICrawlData | getProcessed(String reference) - Gets an already processed reference from the current crawl session. |
int | getProcessedCount() - Gets the number of references processed. |
int | getQueueSize() - Gets the size of the reference queue (number of references left to process). |
boolean | isActive(String reference) - Whether the given reference is currently being processed (i.e. |
boolean | isCacheEmpty() - Whether the cache from a previous crawler run is empty (contains no references). |
boolean | isProcessed(String reference) - Whether the given reference has been processed. |
boolean | isQueued(String reference) - Whether the given reference is in the queue (waiting to be processed). |
boolean | isQueueEmpty() - Whether the queue is empty (no references left to process). |
ICrawlData | nextQueued() - Returns the next reference to be processed from the queue and marks it as being "active" (i.e. |
void | processed(ICrawlData crawlData) - Marks this reference as processed. |
void | queue(ICrawlData crawlData) - Queues a reference for future processing. |
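The stages implied by the methods above (queued, active, processed) can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `StageSketchStore` and `StageSketchDemo` are hypothetical names, plain `String` references stand in for `ICrawlData`, the class does not implement the real `ICrawlDataStore` interface, and returning `null` from `nextQueued()` on an empty queue is an assumption rather than documented behavior.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical in-memory model of the queued -> active -> processed stages.
// Uses String references instead of ICrawlData; not the real interface.
class StageSketchStore {
    private final Queue<String> queued = new ArrayDeque<>();
    private final Set<String> active = new HashSet<>();
    private final Set<String> processed = new HashSet<>();

    // Mirrors queue(ICrawlData): adds a reference for future processing.
    void queue(String reference) {
        queued.add(reference);
    }

    // Mirrors nextQueued(): removes the head of the queue and marks it active.
    // Returning null on an empty queue is an assumption of this sketch.
    String nextQueued() {
        String reference = queued.poll();
        if (reference != null) {
            active.add(reference);
        }
        return reference;
    }

    // Mirrors processed(ICrawlData): moves a reference from active to processed.
    void processed(String reference) {
        active.remove(reference);
        processed.add(reference);
    }

    boolean isQueueEmpty() { return queued.isEmpty(); }
    int getQueueSize() { return queued.size(); }
    boolean isQueued(String reference) { return queued.contains(reference); }
    boolean isActive(String reference) { return active.contains(reference); }
    int getActiveCount() { return active.size(); }
    boolean isProcessed(String reference) { return processed.contains(reference); }
    int getProcessedCount() { return processed.size(); }
}

class StageSketchDemo {
    public static void main(String[] args) {
        StageSketchStore store = new StageSketchStore();
        store.queue("http://example.com/a");
        store.queue("http://example.com/b");

        // Drain the queue: each reference becomes active, then processed.
        while (!store.isQueueEmpty()) {
            String reference = store.nextQueued();
            System.out.println(reference + " active=" + store.isActive(reference));
            store.processed(reference);
        }
        System.out.println("processed count: " + store.getProcessedCount());
    }
}
```

The sketch is not thread-safe and keeps everything in memory. An actual implementation would more likely sit on top of a database, which is presumably why the interface exposes `close()`.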
void queue(ICrawlData crawlData)
Queues a reference for future processing.
Parameters: crawlData - the reference to eventually be processed

boolean isQueueEmpty()
Returns: true if the queue is empty

int getQueueSize()

boolean isQueued(String reference)
Parameters: reference - the reference
Returns: true if the reference is in the queue

ICrawlData nextQueued()

boolean isActive(String reference)
Parameters: reference - the reference
Returns: true if active

int getActiveCount()

ICrawlData getCached(String cacheReference)
Parameters: cacheReference - reference cached from previous run

boolean isCacheEmpty()
Returns: true if the cache is empty

void processed(ICrawlData crawlData)
Parameters: crawlData - processed reference

boolean isProcessed(String reference)
Parameters: reference - the reference
Returns: true if processed

int getProcessedCount()

ICrawlData getProcessed(String reference)
Parameters: reference - reference to get

Iterator<ICrawlData> getCacheIterator()

void close()
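The cache-related methods (getCached, isCacheEmpty, getCacheIterator) together with close() can be sketched in the same spirit. Everything here is an assumption made for illustration: `CacheSketchStore` and `CacheSketchDemo` are hypothetical names, the cache is seeded from a plain map standing in for the previous run, the `String` values stand in for whatever an `ICrawlData` would carry, and returning `null` for an unknown reference is not confirmed by the documentation above.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the cache left over from a previous crawler run.
// Keys are references; values are placeholder data, not real ICrawlData.
class CacheSketchStore {
    private final Map<String, String> cache = new LinkedHashMap<>();

    // Seeds the cache with what a previous run is assumed to have processed.
    CacheSketchStore(Map<String, String> previousRun) {
        cache.putAll(previousRun);
    }

    // Mirrors getCached(String): null for unknown references (assumption).
    String getCached(String cacheReference) {
        return cache.get(cacheReference);
    }

    // Mirrors isCacheEmpty(): true if the cache is empty.
    boolean isCacheEmpty() {
        return cache.isEmpty();
    }

    // Mirrors getCacheIterator(): iterates over the cached references.
    Iterator<String> getCacheIterator() {
        return cache.keySet().iterator();
    }

    // Mirrors close(): a real store would release its database connection;
    // this sketch only clears its in-memory map.
    void close() {
        cache.clear();
    }
}

class CacheSketchDemo {
    public static void main(String[] args) {
        Map<String, String> previousRun = new LinkedHashMap<>();
        previousRun.put("http://example.com/a", "v1");
        previousRun.put("http://example.com/b", "v1");

        CacheSketchStore store = new CacheSketchStore(previousRun);
        try {
            // Check whether references met in this run were seen before.
            for (String ref : new String[] {"http://example.com/a", "http://example.com/c"}) {
                String cached = store.getCached(ref);
                System.out.println(ref + (cached == null ? " is new" : " was seen in the previous run"));
            }
            // Walk everything that is still cached.
            Iterator<String> it = store.getCacheIterator();
            while (it.hasNext()) {
                System.out.println("cached: " + it.next());
            }
        } finally {
            // Always release the store, even if processing fails.
            store.close();
        }
    }
}
```

Wrapping the work in try/finally is one way to make sure close() runs when processing throws. Whether the real interface can be used with try-with-resources is not stated here, so the sketch avoids assuming it.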
Copyright © 2014–2021 Norconex Inc. All rights reserved.