Class HttpCrawler
java.lang.Object
com.norconex.collector.core.crawler.Crawler
com.norconex.collector.http.crawler.HttpCrawler
The HTTP Crawler.
- Author:
- Pascal Essiembre
Nested Class Summary
Nested classes/interfaces inherited from class com.norconex.collector.core.crawler.Crawler
Crawler.ReferenceProcessStatus
-
Constructor Summary
Constructor    Description
HttpCrawler(HttpCrawlerConfig crawlerConfig, HttpCollector collector)    Constructor.
-
Method Summary
Modifier and Type    Method
protected void    afterCrawlerExecution()
protected void    beforeCrawlerExecution(boolean resume)
protected void    beforeFinalizeDocumentProcessing
protected CrawlDocInfo    createChildDocInfo(String embeddedReference, CrawlDocInfo parentCrawlData)
protected void    executeCommitterPipeline(Crawler crawler, CrawlDoc doc)
protected ImporterResponse    executeImporterPipeline(ImporterPipelineContext importerContext)
protected void    executeQueuePipeline(CrawlDocInfo crawlRef)
protected Class<? extends CrawlDocInfo>    getCrawlDocInfoType
protected void    initCrawlDoc(CrawlDoc doc)
protected boolean    isQueueInitialized()
protected void    markReferenceVariationsAsProcessed
Methods inherited from class com.norconex.collector.core.crawler.Crawler
clean, deleteCacheOrphans, destroyCrawler, doExecute, exportDataStore, getCollector, getCommitterService, getDataStoreEngine, getDocInfoService, getDownloadDir, getEventManager, getId, getImporter, getMonitor, getStreamFactory, getTempDir, getWorkDir, handleOrphans, importDataStore, initCrawler, isMaxDocuments, isStopped, processNextReference, processReferences, reprocessCacheOrphans, start, stop, toString
-
Constructor Details
-
HttpCrawler
Constructor.
- Parameters:
crawlerConfig - HTTP crawler configuration
collector - HTTP collector this crawler belongs to
-
-
Method Details
-
getCrawlerConfig
- Overrides:
getCrawlerConfig in class Crawler
-
getHttpFetchClient
-
getSitemapResolver
- Returns:
- the sitemap resolver
-
getDedupMetadataStore
-
getDedupDocumentStore
-
isQueueInitialized
protected boolean isQueueInitialized()
- Overrides:
isQueueInitialized in class Crawler
-
beforeCrawlerExecution
protected void beforeCrawlerExecution(boolean resume)
- Specified by:
beforeCrawlerExecution in class Crawler
-
afterCrawlerExecution
protected void afterCrawlerExecution()
- Specified by:
afterCrawlerExecution in class Crawler
-
executeQueuePipeline
- Specified by:
executeQueuePipeline in class Crawler
-
getCrawlDocInfoType
- Overrides:
getCrawlDocInfoType in class Crawler
-
initCrawlDoc
- Overrides:
initCrawlDoc in class Crawler
-
executeImporterPipeline
- Specified by:
executeImporterPipeline in class Crawler
-
createChildDocInfo
- Specified by:
createChildDocInfo in class Crawler
-
executeCommitterPipeline
- Specified by:
executeCommitterPipeline in class Crawler
-
beforeFinalizeDocumentProcessing
- Overrides:
beforeFinalizeDocumentProcessing in class Crawler
-
markReferenceVariationsAsProcessed
- Specified by:
markReferenceVariationsAsProcessed in class Crawler
-
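Example (illustrative sketch)
The methods marked "Specified by" above are abstract in Crawler and implemented by HttpCrawler, which suggests a template-method design: the base class owns the crawl loop and the subclass supplies the HTTP-specific steps. The sketch below illustrates that pattern only. Every type in it (SketchCrawler, SketchHttpCrawler) is a hypothetical stand-in rather than a Norconex class, references and documents are reduced to plain strings, and the order in which the hooks are called is an assumption, since this page does not document the internals of Crawler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins only: none of these types are the real Norconex classes.
class CrawlerLifecycleSketch {

    /** Stand-in for the abstract base crawler. The hook call order is an assumption. */
    abstract static class SketchCrawler {
        protected final List<String> log = new ArrayList<>();

        // Template method: the skeleton is fixed here, the steps come from subclasses.
        public final List<String> start(boolean resume, List<String> references) {
            beforeCrawlerExecution(resume);
            for (String reference : references) {
                executeQueuePipeline(reference);
                String imported = executeImporterPipeline(reference);
                executeCommitterPipeline(imported);
            }
            afterCrawlerExecution();
            return log;
        }

        protected abstract void beforeCrawlerExecution(boolean resume);
        protected abstract void executeQueuePipeline(String reference);
        protected abstract String executeImporterPipeline(String reference);
        protected abstract void executeCommitterPipeline(String document);
        protected abstract void afterCrawlerExecution();
    }

    /** Stand-in for the HTTP-specific subclass; each hook just records that it ran. */
    static class SketchHttpCrawler extends SketchCrawler {
        @Override
        protected void beforeCrawlerExecution(boolean resume) {
            log.add("beforeCrawlerExecution(resume=" + resume + ")");
        }

        @Override
        protected void executeQueuePipeline(String reference) {
            log.add("queue: " + reference);
        }

        @Override
        protected String executeImporterPipeline(String reference) {
            log.add("import: " + reference);
            return "doc[" + reference + "]";
        }

        @Override
        protected void executeCommitterPipeline(String document) {
            log.add("commit: " + document);
        }

        @Override
        protected void afterCrawlerExecution() {
            log.add("afterCrawlerExecution()");
        }
    }

    /** Runs the sketch crawler and returns the recorded hook calls in order. */
    static List<String> run(boolean resume, List<String> references) {
        return new SketchHttpCrawler().start(resume, references);
    }

    public static void main(String[] args) {
        run(false, List.of("https://example.com/")).forEach(System.out::println);
    }
}
```

Declaring start as final in the sketch keeps the sequencing in one place, so a subclass can change what each step does but not when it runs. Whether the real Crawler enforces this in the same way is not stated here.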