A B C D E F G H I J L M N O P Q R S T U V W 

A

AbstractCollector - Class in com.norconex.collector.core
Base implementation of a Collector.
AbstractCollector(ICollectorConfig) - Constructor for class com.norconex.collector.core.AbstractCollector
Creates and configures a Collector with the provided configuration.
AbstractCollectorConfig - Class in com.norconex.collector.core
Base Collector configuration.
AbstractCollectorConfig() - Constructor for class com.norconex.collector.core.AbstractCollectorConfig
 
AbstractCollectorConfig(Class<? extends ICrawlerConfig>) - Constructor for class com.norconex.collector.core.AbstractCollectorConfig
 
AbstractCollectorConfig(String) - Constructor for class com.norconex.collector.core.AbstractCollectorConfig
 
AbstractCollectorConfig(Class<? extends ICrawlerConfig>, String) - Constructor for class com.norconex.collector.core.AbstractCollectorConfig
 
AbstractCollectorLauncher - Class in com.norconex.collector.core
Encapsulates most of the logic for launching a collector implementation from its main method.
AbstractCollectorLauncher() - Constructor for class com.norconex.collector.core.AbstractCollectorLauncher
Constructor.
AbstractCrawlDataStore - Class in com.norconex.collector.core.data.store
Abstract crawl data store.
AbstractCrawlDataStore() - Constructor for class com.norconex.collector.core.data.store.AbstractCrawlDataStore
 
AbstractCrawler - Class in com.norconex.collector.core.crawler
Abstract crawler implementation providing a common base for building crawlers.
AbstractCrawler(ICrawlerConfig) - Constructor for class com.norconex.collector.core.crawler.AbstractCrawler
Constructor.
AbstractCrawler.CopyIfNullBeanUtilsBean - Class in com.norconex.collector.core.crawler
 
AbstractCrawlerConfig - Class in com.norconex.collector.core.crawler
Base Crawler configuration.
AbstractCrawlerConfig() - Constructor for class com.norconex.collector.core.crawler.AbstractCrawlerConfig
Creates a new crawler configuration.
AbstractDocumentChecksummer - Class in com.norconex.collector.core.checksum
Abstract implementation of IDocumentChecksummer giving the option to keep the generated checksum in a metadata field.
AbstractDocumentChecksummer() - Constructor for class com.norconex.collector.core.checksum.AbstractDocumentChecksummer
 
AbstractMetadataChecksummer - Class in com.norconex.collector.core.checksum
Abstract implementation of IMetadataChecksummer giving the option to keep the generated checksum.
AbstractMetadataChecksummer() - Constructor for class com.norconex.collector.core.checksum.AbstractMetadataChecksummer
 
AbstractMongoCrawlDataStoreFactory - Class in com.norconex.collector.core.data.store.impl.mongo
Mongo implementation of ICrawlDataStore.
AbstractMongoCrawlDataStoreFactory() - Constructor for class com.norconex.collector.core.data.store.impl.mongo.AbstractMongoCrawlDataStoreFactory
 
acceptDocument(ImporterDocument) - Method in interface com.norconex.collector.core.filter.IDocumentFilter
Whether to accept a document.
acceptDocument(ImporterDocument) - Method in class com.norconex.collector.core.filter.impl.ExtensionReferenceFilter
 
acceptDocument(ImporterDocument) - Method in class com.norconex.collector.core.filter.impl.RegexMetadataFilter
 
acceptDocument(ImporterDocument) - Method in class com.norconex.collector.core.filter.impl.RegexReferenceFilter
 
acceptMetadata(String, Properties) - Method in interface com.norconex.collector.core.filter.IMetadataFilter
Whether to accept the metadata.
acceptMetadata(String, Properties) - Method in class com.norconex.collector.core.filter.impl.ExtensionReferenceFilter
 
acceptMetadata(String, Properties) - Method in class com.norconex.collector.core.filter.impl.RegexMetadataFilter
 
acceptMetadata(String, Properties) - Method in class com.norconex.collector.core.filter.impl.RegexReferenceFilter
 
acceptReference(String) - Method in class com.norconex.collector.core.filter.impl.ExtensionReferenceFilter
 
acceptReference(String) - Method in class com.norconex.collector.core.filter.impl.RegexReferenceFilter
 
acceptReference(String) - Method in interface com.norconex.collector.core.filter.IReferenceFilter
Whether to accept this reference.
addMapping(CrawlState, SpoiledReferenceStrategy) - Method in class com.norconex.collector.core.spoil.impl.GenericSpoiledReferenceStrategizer
 
ALL_FIELDS - Static variable in class com.norconex.collector.core.data.store.impl.jdbc.BasicJDBCSerializer
 
ARG_ACTION - Static variable in class com.norconex.collector.core.AbstractCollectorLauncher
 
ARG_ACTION_CHECKCONFIG - Static variable in class com.norconex.collector.core.AbstractCollectorLauncher
Deprecated.
Since 1.8.0, use the -k or --checkcfg argument instead.
ARG_ACTION_RESUME - Static variable in class com.norconex.collector.core.AbstractCollectorLauncher
 
ARG_ACTION_START - Static variable in class com.norconex.collector.core.AbstractCollectorLauncher
 
ARG_ACTION_STOP - Static variable in class com.norconex.collector.core.AbstractCollectorLauncher
 
ARG_CHECKCFG - Static variable in class com.norconex.collector.core.AbstractCollectorLauncher
 
ARG_CONFIG - Static variable in class com.norconex.collector.core.AbstractCollectorLauncher
 
ARG_VARIABLES - Static variable in class com.norconex.collector.core.AbstractCollectorLauncher
 
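The acceptReference(String) entries above describe a filter that decides whether a reference is accepted. As a rough illustration of that contract only, here is a minimal sketch. The ReferenceFilterSketch interface and the RegexReferenceFilterSketch class are hypothetical stand-ins written for this example, not the library's IReferenceFilter or RegexReferenceFilter, and the full-match and null-handling behaviour is an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical regex-based filter; it only mirrors the acceptReference(String)
// signature listed in the index and is not the library's implementation.
final class RegexReferenceFilterSketch implements ReferenceFilterSketch {
    private final Pattern pattern;

    RegexReferenceFilterSketch(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    @Override
    public boolean acceptReference(String reference) {
        // Assumption: null references are rejected and the whole
        // reference must match the pattern to be accepted.
        return reference != null && pattern.matcher(reference).matches();
    }

    public static void main(String[] args) {
        ReferenceFilterSketch filter =
                new RegexReferenceFilterSketch("https?://example\\.com/.*");
        System.out.println(filter.acceptReference("https://example.com/docs/index.html"));
        System.out.println(filter.acceptReference("ftp://example.com/file.txt"));
    }
}

// Hypothetical stand-in for the reference-filter contract.
interface ReferenceFilterSketch {
    boolean acceptReference(String reference);
}
```

A real filter would implement the library's own interface; the stand-in keeps the example compilable without the library on the classpath.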

B

BAD_STATUS - Static variable in class com.norconex.collector.core.data.CrawlState
 
BaseCrawlData - Class in com.norconex.collector.core.data
A base implementation of ICrawlData with a default state of NEW.
BaseCrawlData() - Constructor for class com.norconex.collector.core.data.BaseCrawlData
Constructor.
BaseCrawlData(String) - Constructor for class com.norconex.collector.core.data.BaseCrawlData
 
BaseMongoSerializer - Class in com.norconex.collector.core.data.store.impl.mongo
Basic Mongo serializer for BaseCrawlData instances.
BaseMongoSerializer() - Constructor for class com.norconex.collector.core.data.store.impl.mongo.BaseMongoSerializer
 
BasePipelineContext - Class in com.norconex.collector.core.pipeline
Base IPipelineStage context for collector Pipelines.
BasePipelineContext(ICrawler, ICrawlDataStore) - Constructor for class com.norconex.collector.core.pipeline.BasePipelineContext
Constructor.
BasePipelineContext(ICrawler, ICrawlDataStore, BaseCrawlData) - Constructor for class com.norconex.collector.core.pipeline.BasePipelineContext
Constructor.
BasicJDBCCrawlDataStoreFactory - Class in com.norconex.collector.core.data.store.impl.jdbc
JDBC implementation of ICrawlDataStore using H2 database.
BasicJDBCCrawlDataStoreFactory() - Constructor for class com.norconex.collector.core.data.store.impl.jdbc.BasicJDBCCrawlDataStoreFactory
 
BasicJDBCSerializer - Class in com.norconex.collector.core.data.store.impl.jdbc
Basic JDBC serializer for storing and retrieving BaseCrawlData instances.
BasicJDBCSerializer() - Constructor for class com.norconex.collector.core.data.store.impl.jdbc.BasicJDBCSerializer
 
beforeFinalizeDocumentProcessing(BaseCrawlData, ICrawlDataStore, ImporterDocument, ICrawlData) - Method in class com.norconex.collector.core.crawler.AbstractCrawler
Gives implementors a chance to take action on a document before its processing is finalized (cycle end-of-life for a crawled reference).
buildMongoClient(String) - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoConnectionDetails
Builds a MongoClient object based on these connection details.
buildMongoClient(String, MongoConnectionDetails) - Static method in class com.norconex.collector.core.data.store.impl.mongo.MongoCrawlDataStore
 
buildMongoCredential(String, String, char[], String) - Static method in class com.norconex.collector.core.data.store.impl.mongo.MongoConnectionDetails
Builds a MongoCredential object based on these connection details.

C

checksumMD5(InputStream) - Static method in class com.norconex.collector.core.checksum.ChecksumUtil
 
checksumMD5(String) - Static method in class com.norconex.collector.core.checksum.ChecksumUtil
 
ChecksumStageUtil - Class in com.norconex.collector.core.pipeline
Checksum stage utility methods.
ChecksumUtil - Class in com.norconex.collector.core.checksum
Checksum utility methods.
cleanupExecution(JobStatusUpdater, JobSuite, ICrawlDataStore) - Method in class com.norconex.collector.core.crawler.AbstractCrawler
 
clone() - Method in class com.norconex.collector.core.data.BaseCrawlData
 
clone() - Method in interface com.norconex.collector.core.data.ICrawlData
Clones this reference.
close() - Method in interface com.norconex.collector.core.data.store.ICrawlDataStore
Closes a database connection.
close() - Method in class com.norconex.collector.core.data.store.impl.jdbc.JDBCCrawlDataStore
 
close() - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoCrawlDataStore
 
close() - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStore
 
COLLECTOR_CHECKSUM_DOC - Static variable in class com.norconex.collector.core.doc.CollectorMetadata
 
COLLECTOR_CHECKSUM_METADATA - Static variable in class com.norconex.collector.core.doc.CollectorMetadata
 
COLLECTOR_CONTENT_ENCODING - Static variable in class com.norconex.collector.core.doc.CollectorMetadata
 
COLLECTOR_CONTENT_TYPE - Static variable in class com.norconex.collector.core.doc.CollectorMetadata
 
COLLECTOR_IS_CRAWL_NEW - Static variable in class com.norconex.collector.core.doc.CollectorMetadata
Boolean flag indicating whether a document is new to the crawler that fetched it.
COLLECTOR_PREFIX - Static variable in class com.norconex.collector.core.doc.CollectorMetadata
 
CollectorConfigLoader - Class in com.norconex.collector.core
Collector configuration loader.
CollectorConfigLoader(Class<? extends ICollectorConfig>) - Constructor for class com.norconex.collector.core.CollectorConfigLoader
 
CollectorException - Exception in com.norconex.collector.core
Runtime exception for most unrecoverable issues thrown by Collector classes.
CollectorException() - Constructor for exception com.norconex.collector.core.CollectorException
 
CollectorException(String) - Constructor for exception com.norconex.collector.core.CollectorException
 
CollectorException(Throwable) - Constructor for exception com.norconex.collector.core.CollectorException
 
CollectorException(String, Throwable) - Constructor for exception com.norconex.collector.core.CollectorException
 
CollectorMetadata - Class in com.norconex.collector.core.doc
Collector metadata with constants for common metadata field names.
CollectorMetadata() - Constructor for class com.norconex.collector.core.doc.CollectorMetadata
 
CollectorMetadata(Properties) - Constructor for class com.norconex.collector.core.doc.CollectorMetadata
 
com.norconex.collector.core - package com.norconex.collector.core
 
com.norconex.collector.core.checksum - package com.norconex.collector.core.checksum
 
com.norconex.collector.core.checksum.impl - package com.norconex.collector.core.checksum.impl
 
com.norconex.collector.core.crawler - package com.norconex.collector.core.crawler
 
com.norconex.collector.core.crawler.event - package com.norconex.collector.core.crawler.event
 
com.norconex.collector.core.data - package com.norconex.collector.core.data
 
com.norconex.collector.core.data.store - package com.norconex.collector.core.data.store
 
com.norconex.collector.core.data.store.impl.jdbc - package com.norconex.collector.core.data.store.impl.jdbc
 
com.norconex.collector.core.data.store.impl.mongo - package com.norconex.collector.core.data.store.impl.mongo
 
com.norconex.collector.core.data.store.impl.mvstore - package com.norconex.collector.core.data.store.impl.mvstore
 
com.norconex.collector.core.doc - package com.norconex.collector.core.doc
 
com.norconex.collector.core.filter - package com.norconex.collector.core.filter
 
com.norconex.collector.core.filter.impl - package com.norconex.collector.core.filter.impl
 
com.norconex.collector.core.jmx - package com.norconex.collector.core.jmx
 
com.norconex.collector.core.pipeline - package com.norconex.collector.core.pipeline
 
com.norconex.collector.core.pipeline.committer - package com.norconex.collector.core.pipeline.committer
 
com.norconex.collector.core.pipeline.importer - package com.norconex.collector.core.pipeline.importer
 
com.norconex.collector.core.pipeline.queue - package com.norconex.collector.core.pipeline.queue
 
com.norconex.collector.core.spoil - package com.norconex.collector.core.spoil
 
com.norconex.collector.core.spoil.impl - package com.norconex.collector.core.spoil.impl
 
CommitModuleStage - Class in com.norconex.collector.core.pipeline.committer
Common pipeline stage for committing documents.
CommitModuleStage() - Constructor for class com.norconex.collector.core.pipeline.committer.CommitModuleStage
 
CopyIfNullBeanUtilsBean() - Constructor for class com.norconex.collector.core.crawler.AbstractCrawler.CopyIfNullBeanUtilsBean
 
copyProperty(Object, String, Object) - Method in class com.norconex.collector.core.crawler.AbstractCrawler.CopyIfNullBeanUtilsBean
 
CrawlDataStoreException - Exception in com.norconex.collector.core.data.store
Crawl data store runtime exception.
CrawlDataStoreException() - Constructor for exception com.norconex.collector.core.data.store.CrawlDataStoreException
 
CrawlDataStoreException(String) - Constructor for exception com.norconex.collector.core.data.store.CrawlDataStoreException
 
CrawlDataStoreException(Throwable) - Constructor for exception com.norconex.collector.core.data.store.CrawlDataStoreException
 
CrawlDataStoreException(String, Throwable) - Constructor for exception com.norconex.collector.core.data.store.CrawlDataStoreException
 
CRAWLER_FINISHED - Static variable in class com.norconex.collector.core.crawler.event.CrawlerEvent
The crawler completed execution (without being stopped).
CRAWLER_RESUMED - Static variable in class com.norconex.collector.core.crawler.event.CrawlerEvent
The crawler resumed execution (from a previous incomplete crawl).
CRAWLER_STARTED - Static variable in class com.norconex.collector.core.crawler.event.CrawlerEvent
The crawler started.
CRAWLER_STOPPED - Static variable in class com.norconex.collector.core.crawler.event.CrawlerEvent
Issued when a request to stop the crawler has been fully executed (crawler stopped).
CRAWLER_STOPPING - Static variable in class com.norconex.collector.core.crawler.event.CrawlerEvent
Issued when a request to stop the crawler has been received.
CrawlerConfigLoader - Class in com.norconex.collector.core.crawler
HTTP Crawler configuration loader.
CrawlerConfigLoader(Class<? extends ICrawlerConfig>) - Constructor for class com.norconex.collector.core.crawler.CrawlerConfigLoader
 
CrawlerEvent - Class in com.norconex.collector.core.crawler.event
A crawler event.
CrawlerEvent(String, ICrawlData, Object) - Constructor for class com.norconex.collector.core.crawler.event.CrawlerEvent
 
crawlerEvent(ICrawler, CrawlerEvent) - Method in interface com.norconex.collector.core.crawler.event.ICrawlerEventListener
Fired when a crawler event occurs.
CrawlerEventManager - Class in com.norconex.collector.core.crawler.event
Manages event listeners and logs events.
CrawlerEventManager(ICrawler, ICrawlerEventListener[]) - Constructor for class com.norconex.collector.core.crawler.event.CrawlerEventManager
 
CrawlState - Class in com.norconex.collector.core.data
Reference processing status.
CrawlState(String) - Constructor for class com.norconex.collector.core.data.CrawlState
Constructor.
createCollector(ICollectorConfig) - Method in class com.norconex.collector.core.AbstractCollectorLauncher
 
createCrawlDataStore(boolean) - Method in class com.norconex.collector.core.crawler.AbstractCrawler
 
createCrawlDataStore(ICrawlerConfig, boolean) - Method in interface com.norconex.collector.core.data.store.ICrawlDataStoreFactory
Creates a new crawl data store.
createCrawlDataStore(ICrawlerConfig, boolean) - Method in class com.norconex.collector.core.data.store.impl.jdbc.BasicJDBCCrawlDataStoreFactory
 
createCrawlDataStore(ICrawlerConfig, boolean) - Method in class com.norconex.collector.core.data.store.impl.mongo.AbstractMongoCrawlDataStoreFactory
 
createCrawlDataStore(ICrawlerConfig, boolean) - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStoreFactory
 
createCrawler(ICrawlerConfig) - Method in class com.norconex.collector.core.AbstractCollector
Creates a new crawler instance.
createDocumentChecksum(ImporterDocument) - Method in class com.norconex.collector.core.checksum.AbstractDocumentChecksummer
 
createDocumentChecksum(ImporterDocument) - Method in interface com.norconex.collector.core.checksum.IDocumentChecksummer
Creates a document checksum.
createEmbeddedCrawlData(String, ICrawlData) - Method in class com.norconex.collector.core.crawler.AbstractCrawler
 
createIndices(MongoCollection<Document>, MongoCollection<Document>) - Method in class com.norconex.collector.core.data.store.impl.mongo.BaseMongoSerializer
 
createIndices(MongoCollection<Document>, MongoCollection<Document>) - Method in interface com.norconex.collector.core.data.store.impl.mongo.IMongoSerializer
Creates Mongo indices for the given collections.
createJDBCSerializer() - Method in class com.norconex.collector.core.data.store.impl.jdbc.BasicJDBCCrawlDataStoreFactory
 
createJobSuite() - Method in class com.norconex.collector.core.AbstractCollector
 
createMetadataChecksum(Properties) - Method in class com.norconex.collector.core.checksum.AbstractMetadataChecksummer
 
createMetadataChecksum(Properties) - Method in interface com.norconex.collector.core.checksum.IMetadataChecksummer
Creates a metadata checksum.
createMongoSerializer() - Method in class com.norconex.collector.core.data.store.impl.mongo.AbstractMongoCrawlDataStoreFactory
 
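The checksumMD5(String) entry in ChecksumUtil carries no description in this index. As an illustration of what an MD5 checksum of a string can look like, here is a sketch built on the JDK's MessageDigest. The class name Md5ChecksumSketch, the UTF-8 encoding, the lowercase hexadecimal output and the null handling are all assumptions made for this example; the library's actual behaviour may differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; only the method name mirrors the index entry.
final class Md5ChecksumSketch {

    static String checksumMD5(String text) {
        if (text == null) {
            return null; // assumption: no checksum for missing text
        }
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            // Assumption: the text is encoded as UTF-8 before hashing.
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to an unsigned value so each byte yields two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on standard Java platforms.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(checksumMD5("abc")); // 900150983cd24fb0d6963f7d28e17f72
    }
}
```

MD5 is adequate for detecting content changes between crawls but should not be relied on for security purposes.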

D

DEFAULT_CACHED_COL_NAME - Static variable in class com.norconex.collector.core.data.store.impl.mongo.MongoCrawlDataStore
 
DEFAULT_FALLBACK_STRATEGY - Static variable in class com.norconex.collector.core.spoil.impl.GenericSpoiledReferenceStrategizer
 
DEFAULT_LOGS_DIR - Static variable in class com.norconex.collector.core.AbstractCollectorConfig
Default relative directory where logs from Log4j are stored.
DEFAULT_PROGRESS_DIR - Static variable in class com.norconex.collector.core.AbstractCollectorConfig
Default relative directory where progress files are stored.
DEFAULT_REFERENCES_COL_NAME - Static variable in class com.norconex.collector.core.data.store.impl.mongo.MongoCrawlDataStore
 
defaultIfEmpty(T[], T[]) - Method in class com.norconex.collector.core.AbstractCollectorConfig
 
defaultIfEmpty(T[], T[]) - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
deleteCacheOrphans(ICrawlDataStore, JobStatusUpdater, JobSuite) - Method in class com.norconex.collector.core.crawler.AbstractCrawler
 
DELETED - Static variable in class com.norconex.collector.core.data.CrawlState
 
deleteReferences(String...) - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoCrawlDataStore
 
doCreateDocumentChecksum(ImporterDocument) - Method in class com.norconex.collector.core.checksum.AbstractDocumentChecksummer
 
doCreateDocumentChecksum(ImporterDocument) - Method in class com.norconex.collector.core.checksum.impl.MD5DocumentChecksummer
 
doCreateMetaChecksum(Properties) - Method in class com.norconex.collector.core.checksum.AbstractMetadataChecksummer
 
doCreateMetaChecksum(Properties) - Method in class com.norconex.collector.core.checksum.impl.GenericMetadataChecksummer
 
DOCUMENT_COMMITTED_ADD - Static variable in class com.norconex.collector.core.crawler.event.CrawlerEvent
 
DOCUMENT_COMMITTED_REMOVE - Static variable in class com.norconex.collector.core.crawler.event.CrawlerEvent
 
DOCUMENT_FETCHED - Static variable in class com.norconex.collector.core.crawler.event.CrawlerEvent
 
DOCUMENT_IMPORTED - Static variable in class com.norconex.collector.core.crawler.event.CrawlerEvent
 
DOCUMENT_METADATA_FETCHED - Static variable in class com.norconex.collector.core.crawler.event.CrawlerEvent
 
DOCUMENT_POSTIMPORTED - Static variable in class com.norconex.collector.core.crawler.event.CrawlerEvent
 
DOCUMENT_PREIMPORTED - Static variable in class com.norconex.collector.core.crawler.event.CrawlerEvent
 
DOCUMENT_SAVED - Static variable in class com.norconex.collector.core.crawler.event.CrawlerEvent
 
DocumentChecksumStage - Class in com.norconex.collector.core.pipeline.committer
Common pipeline stage for creating a document checksum.
DocumentChecksumStage() - Constructor for class com.norconex.collector.core.pipeline.committer.DocumentChecksumStage
 
DocumentFiltersStage - Class in com.norconex.collector.core.pipeline.importer
 
DocumentFiltersStage() - Constructor for class com.norconex.collector.core.pipeline.importer.DocumentFiltersStage
 
DocumentPipelineContext - Class in com.norconex.collector.core.pipeline
IPipelineStage context for collector Pipelines dealing with an ImporterDocument.
DocumentPipelineContext(ICrawler, ICrawlDataStore) - Constructor for class com.norconex.collector.core.pipeline.DocumentPipelineContext
Constructor.
DocumentPipelineContext(ICrawler, ICrawlDataStore, BaseCrawlData) - Constructor for class com.norconex.collector.core.pipeline.DocumentPipelineContext
Constructor.
DocumentPipelineContext(ICrawler, ICrawlDataStore, BaseCrawlData, BaseCrawlData, ImporterDocument) - Constructor for class com.norconex.collector.core.pipeline.DocumentPipelineContext
 
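The defaultIfEmpty(T[], T[]) entries above are listed without a description. Judging from the name alone, the method presumably returns a fallback array when the first one has no elements. The sketch below illustrates that assumed behaviour; DefaultIfEmptySketch is a hypothetical class, the method is written as static here for convenience (the index lists it as an instance method), and treating null like an empty array is an assumption.

```java
import java.util.Arrays;

// Hypothetical illustration of the assumed "default if empty" semantics.
final class DefaultIfEmptySketch {

    static <T> T[] defaultIfEmpty(T[] array, T[] defaultArray) {
        // Assumption: a null array is treated the same as an empty one.
        return (array == null || array.length == 0) ? defaultArray : array;
    }

    public static void main(String[] args) {
        String[] defaults = {"fallback"};
        System.out.println(Arrays.toString(defaultIfEmpty(new String[0], defaults)));
        System.out.println(Arrays.toString(defaultIfEmpty(new String[] {"a"}, defaults)));
    }
}
```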

E

ensureIndex(MongoCollection<Document>, boolean, String...) - Method in class com.norconex.collector.core.data.store.impl.mongo.BaseMongoSerializer
 
equals(Object) - Method in class com.norconex.collector.core.AbstractCollectorConfig
 
equals(Object) - Method in class com.norconex.collector.core.checksum.AbstractDocumentChecksummer
 
equals(Object) - Method in class com.norconex.collector.core.checksum.AbstractMetadataChecksummer
 
equals(Object) - Method in class com.norconex.collector.core.checksum.impl.GenericMetadataChecksummer
 
equals(Object) - Method in class com.norconex.collector.core.checksum.impl.MD5DocumentChecksummer
 
equals(Object) - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
equals(Object) - Method in class com.norconex.collector.core.crawler.event.CrawlerEvent
 
equals(Object) - Method in class com.norconex.collector.core.data.BaseCrawlData
 
equals(Object) - Method in class com.norconex.collector.core.data.CrawlState
 
equals(Object) - Method in class com.norconex.collector.core.data.store.impl.jdbc.BasicJDBCCrawlDataStoreFactory
 
equals(Object) - Method in class com.norconex.collector.core.data.store.impl.mongo.AbstractMongoCrawlDataStoreFactory
 
equals(Object) - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoConnectionDetails
 
equals(Object) - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreConfig
 
equals(Object) - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStoreFactory
 
equals(Object) - Method in class com.norconex.collector.core.filter.impl.ExtensionReferenceFilter
 
equals(Object) - Method in class com.norconex.collector.core.filter.impl.RegexMetadataFilter
 
equals(Object) - Method in class com.norconex.collector.core.filter.impl.RegexReferenceFilter
 
equals(Object) - Method in class com.norconex.collector.core.pipeline.BasePipelineContext
 
equals(Object) - Method in class com.norconex.collector.core.spoil.impl.GenericSpoiledReferenceStrategizer
 
ERROR - Static variable in class com.norconex.collector.core.data.CrawlState
 
execute(JobStatusUpdater, JobSuite, ICrawlDataStore) - Method in class com.norconex.collector.core.crawler.AbstractCrawler
 
execute(DocumentPipelineContext) - Method in class com.norconex.collector.core.pipeline.committer.CommitModuleStage
 
execute(DocumentPipelineContext) - Method in class com.norconex.collector.core.pipeline.committer.DocumentChecksumStage
 
execute(ImporterPipelineContext) - Method in class com.norconex.collector.core.pipeline.importer.DocumentFiltersStage
 
execute(ImporterPipelineContext) - Method in class com.norconex.collector.core.pipeline.importer.ImportModuleStage
 
execute(ImporterPipelineContext) - Method in class com.norconex.collector.core.pipeline.importer.SaveDocumentStage
 
execute(BasePipelineContext) - Method in class com.norconex.collector.core.pipeline.queue.QueueReferenceStage
 
execute(BasePipelineContext) - Method in class com.norconex.collector.core.pipeline.queue.ReferenceFiltersStage
 
executeCommitterPipeline(ICrawler, ImporterDocument, ICrawlDataStore, BaseCrawlData, BaseCrawlData) - Method in class com.norconex.collector.core.crawler.AbstractCrawler
 
executeImporterPipeline(ImporterPipelineContext) - Method in class com.norconex.collector.core.crawler.AbstractCrawler
 
executeQueuePipeline(ICrawlData, ICrawlDataStore) - Method in class com.norconex.collector.core.crawler.AbstractCrawler
 
ExtensionReferenceFilter - Class in com.norconex.collector.core.filter.impl
Filters a reference based on a comma-separated list of extensions.
ExtensionReferenceFilter() - Constructor for class com.norconex.collector.core.filter.impl.ExtensionReferenceFilter
 
ExtensionReferenceFilter(String) - Constructor for class com.norconex.collector.core.filter.impl.ExtensionReferenceFilter
 
ExtensionReferenceFilter(String, OnMatch) - Constructor for class com.norconex.collector.core.filter.impl.ExtensionReferenceFilter
 
ExtensionReferenceFilter(String, OnMatch, boolean) - Constructor for class com.norconex.collector.core.filter.impl.ExtensionReferenceFilter
 
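ExtensionReferenceFilter is described above as filtering a reference based on a comma-separated list of extensions. The following sketch shows one way such matching could work. ExtensionFilterSketch is a hypothetical class, not the library's implementation: it ignores the OnMatch and boolean constructor arguments listed in the index, assumes case-insensitive matching, and does not strip query strings or fragments.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical extension-based filter for illustration only.
final class ExtensionFilterSketch {
    private final Set<String> extensions = new HashSet<>();

    ExtensionFilterSketch(String commaSeparatedExtensions) {
        for (String ext : commaSeparatedExtensions.split(",")) {
            String trimmed = ext.trim().toLowerCase(Locale.ROOT);
            if (!trimmed.isEmpty()) {
                extensions.add(trimmed);
            }
        }
    }

    boolean acceptReference(String reference) {
        if (reference == null) {
            return false;
        }
        int slash = reference.lastIndexOf('/');
        int dot = reference.lastIndexOf('.');
        // No extension if the last dot precedes the last path separator
        // or is the final character of the reference.
        if (dot <= slash || dot == reference.length() - 1) {
            return false;
        }
        String ext = reference.substring(dot + 1).toLowerCase(Locale.ROOT);
        return extensions.contains(ext);
    }

    public static void main(String[] args) {
        ExtensionFilterSketch filter = new ExtensionFilterSketch("pdf, html");
        System.out.println(filter.acceptReference("http://example.com/report.PDF"));
        System.out.println(filter.acceptReference("http://example.com/image.png"));
    }
}
```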

F

FIELD_CONTENT_CHECKSUM - Static variable in interface com.norconex.collector.core.data.store.impl.mongo.IMongoSerializer
 
FIELD_CONTENT_TYPE - Static variable in interface com.norconex.collector.core.data.store.impl.mongo.IMongoSerializer
 
FIELD_CRAWL_DATE - Static variable in interface com.norconex.collector.core.data.store.impl.mongo.IMongoSerializer
 
FIELD_CRAWL_STATE - Static variable in interface com.norconex.collector.core.data.store.impl.mongo.IMongoSerializer
 
FIELD_DEPTH - Static variable in interface com.norconex.collector.core.data.store.impl.mongo.IMongoSerializer
 
FIELD_IS_ROOT_PARENT_REFERENCE - Static variable in interface com.norconex.collector.core.data.store.impl.mongo.IMongoSerializer
 
FIELD_IS_VALID - Static variable in interface com.norconex.collector.core.data.store.impl.mongo.IMongoSerializer
 
FIELD_META_CHECKSUM - Static variable in interface com.norconex.collector.core.data.store.impl.mongo.IMongoSerializer
 
FIELD_PARENT_ROOT_REFERENCE - Static variable in interface com.norconex.collector.core.data.store.impl.mongo.IMongoSerializer
 
FIELD_REFERENCE - Static variable in interface com.norconex.collector.core.data.store.impl.mongo.IMongoSerializer
 
FIELD_REFERENCE_EXCESSIVE - Static variable in interface com.norconex.collector.core.data.store.impl.mongo.IMongoSerializer
 
FIELD_STAGE - Static variable in interface com.norconex.collector.core.data.store.impl.mongo.IMongoSerializer
 
fireCrawlerEvent(String, ICrawlData, Object) - Method in class com.norconex.collector.core.crawler.AbstractCrawler
 
fireCrawlerEvent(CrawlerEvent) - Method in class com.norconex.collector.core.crawler.event.CrawlerEventManager
 
fireCrawlerEvent(String, ICrawlData, Object) - Method in class com.norconex.collector.core.pipeline.BasePipelineContext
 
fromDocument(Document) - Method in class com.norconex.collector.core.data.store.impl.mongo.BaseMongoSerializer
 
fromDocument(Document) - Method in interface com.norconex.collector.core.data.store.impl.mongo.IMongoSerializer
Converts a Mongo Document to an ICrawlData.

G

GenericMetadataChecksummer - Class in com.norconex.collector.core.checksum.impl
Generic implementation of IMetadataChecksummer that uses specified source field names and their values for the checksum.
GenericMetadataChecksummer() - Constructor for class com.norconex.collector.core.checksum.impl.GenericMetadataChecksummer
 
GenericSpoiledReferenceStrategizer - Class in com.norconex.collector.core.spoil.impl
Generic implementation of ISpoiledReferenceStrategizer that offers a simple mapping between the crawl state of references that have turned "bad" and the strategy to adopt for each.
GenericSpoiledReferenceStrategizer() - Constructor for class com.norconex.collector.core.spoil.impl.GenericSpoiledReferenceStrategizer
 
getActiveCount() - Method in interface com.norconex.collector.core.data.store.ICrawlDataStore
Gets the number of active references (currently being processed).
getActiveCount() - Method in class com.norconex.collector.core.data.store.impl.jdbc.JDBCCrawlDataStore
 
getActiveCount() - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoCrawlDataStore
 
getActiveCount() - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStore
 
getAutoCommitBufferSize() - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreConfig
 
getAutoCommitDelay() - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreConfig
 
getAutoCompactFillRate() - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreConfig
 
getBaseDownloadDir() - Method in class com.norconex.collector.core.crawler.AbstractCrawler
 
getCacheConcurrency() - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreConfig
 
getCached(String) - Method in interface com.norconex.collector.core.data.store.ICrawlDataStore
Gets the cached reference from the previous time the crawler was run.
getCached(String) - Method in class com.norconex.collector.core.data.store.impl.jdbc.JDBCCrawlDataStore
 
getCached(String) - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoCrawlDataStore
 
getCached(String) - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStore
 
getCachedCollectionName() - Method in class com.norconex.collector.core.data.store.impl.mongo.AbstractMongoCrawlDataStoreFactory
Gets the cached collection name.
getCachedCollectionName() - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoCrawlDataStore
Gets the cached collection name.
getCachedCrawlData() - Method in class com.norconex.collector.core.pipeline.DocumentPipelineContext
Gets cached crawl data.
getCachedCrawlDataSQL() - Method in class com.norconex.collector.core.data.store.impl.jdbc.BasicJDBCSerializer
 
getCachedCrawlDataSQL() - Method in interface com.norconex.collector.core.data.store.impl.jdbc.IJDBCSerializer
Gets the SQL to obtain all ICrawlData from the cache table.
getCachedCrawlDataValues(String) - Method in class com.norconex.collector.core.data.store.impl.jdbc.BasicJDBCSerializer
 
getCachedCrawlDataValues(String) - Method in interface com.norconex.collector.core.data.store.impl.jdbc.IJDBCSerializer
Gets the PreparedStatement values (if any) necessary to execute the SQL obtained with IJDBCSerializer.getCachedCrawlDataSQL().
getCacheIterator() - Method in interface com.norconex.collector.core.data.store.ICrawlDataStore
Gets the cache iterator.
getCacheIterator() - Method in class com.norconex.collector.core.data.store.impl.jdbc.JDBCCrawlDataStore
 
getCacheIterator() - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoCrawlDataStore
 
getCacheIterator() - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStore
 
getCacheSize() - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreConfig
 
getCollectorConfig() - Method in class com.norconex.collector.core.AbstractCollector
Gets the collector configuration.
getCollectorConfig() - Method in interface com.norconex.collector.core.ICollector
Gets the collector configuration.
getCollectorConfigClass() - Method in class com.norconex.collector.core.AbstractCollectorLauncher
 
getCollectorListeners() - Method in class com.norconex.collector.core.AbstractCollectorConfig
 
getCollectorListeners() - Method in interface com.norconex.collector.core.ICollectorConfig
Gets collector life cycle listeners.
getCommitter() - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
getCommitter() - Method in interface com.norconex.collector.core.crawler.ICrawlerConfig
Gets the Committer module configuration.
getCompress() - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreConfig
 
getConfig() - Method in class com.norconex.collector.core.pipeline.BasePipelineContext
 
getConnectionDetails() - Method in class com.norconex.collector.core.data.store.impl.mongo.AbstractMongoCrawlDataStoreFactory
 
getContent() - Method in class com.norconex.collector.core.pipeline.DocumentPipelineContext
 
getContentChecksum() - Method in class com.norconex.collector.core.data.BaseCrawlData
Gets the content checksum.
getContentChecksum() - Method in interface com.norconex.collector.core.data.ICrawlData
 
getContentReader() - Method in class com.norconex.collector.core.pipeline.DocumentPipelineContext
 
getContentType() - Method in class com.norconex.collector.core.data.BaseCrawlData
Gets the content type.
getContentType() - Method in interface com.norconex.collector.core.data.ICrawlData
Gets the content type.
getCrawlData() - Method in class com.norconex.collector.core.crawler.event.CrawlerEvent
Gets the crawl data holding contextual information about the crawled reference.
getCrawlData() - Method in class com.norconex.collector.core.pipeline.BasePipelineContext
 
getCrawlDataStore() - Method in class com.norconex.collector.core.pipeline.BasePipelineContext
 
getCrawlDataStoreFactory() - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
getCrawlDataStoreFactory() - Method in interface com.norconex.collector.core.crawler.ICrawlerConfig
Gets the crawl data store factory a crawler should use.
getCrawlDate() - Method in class com.norconex.collector.core.data.BaseCrawlData
Gets the crawl date.
getCrawlDate() - Method in interface com.norconex.collector.core.data.ICrawlData
Gets the crawl date.
getCrawler() - Method in class com.norconex.collector.core.pipeline.BasePipelineContext
 
getCrawlerConfig() - Method in class com.norconex.collector.core.crawler.AbstractCrawler
Gets the crawler configuration.
getCrawlerConfig() - Method in interface com.norconex.collector.core.crawler.ICrawler
Gets the crawler configuration.
getCrawlerConfigs() - Method in class com.norconex.collector.core.AbstractCollectorConfig
 
getCrawlerConfigs() - Method in interface com.norconex.collector.core.ICollectorConfig
Gets all crawler configurations.
getCrawlerDownloadDir() - Method in class com.norconex.collector.core.crawler.AbstractCrawler
 
getCrawlerEventManager() - Method in class com.norconex.collector.core.crawler.AbstractCrawler
 
getCrawlerEventManager() - Method in interface com.norconex.collector.core.crawler.ICrawler
Gets the crawler events manager.
getCrawlerListeners() - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
getCrawlerListeners() - Method in interface com.norconex.collector.core.crawler.ICrawlerConfig
Gets crawler event listeners.
getCrawlers() - Method in class com.norconex.collector.core.AbstractCollector
Gets all crawler instances in this collector.
getCreateTableSQLs(String) - Method in class com.norconex.collector.core.data.store.impl.jdbc.BasicJDBCSerializer
 
getCreateTableSQLs(String) - Method in interface com.norconex.collector.core.data.store.impl.jdbc.IJDBCSerializer
Gets the SQLs used to create a data store table.
getDatabaseName() - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoConnectionDetails
 
getDeleteCrawlDataSQL(String) - Method in class com.norconex.collector.core.data.store.impl.jdbc.BasicJDBCSerializer
 
getDeleteCrawlDataSQL(String) - Method in interface com.norconex.collector.core.data.store.impl.jdbc.IJDBCSerializer
Gets the SQL to delete an ICrawlData from the given table.
getDeleteCrawlDataValues(String, ICrawlData) - Method in class com.norconex.collector.core.data.store.impl.jdbc.BasicJDBCSerializer
 
getDeleteCrawlDataValues(String, ICrawlData) - Method in interface com.norconex.collector.core.data.store.impl.jdbc.IJDBCSerializer
Gets the PreparedStatement values (if any) necessary to execute the SQL obtained with IJDBCSerializer.getDeleteCrawlDataSQL(String).
getDocument() - Method in class com.norconex.collector.core.pipeline.DocumentPipelineContext
 
getDocumentChecksummer() - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
getDocumentChecksummer() - Method in interface com.norconex.collector.core.crawler.ICrawlerConfig
Gets the document checksummer.
getDocumentFilters() - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
getDocumentFilters() - Method in interface com.norconex.collector.core.crawler.ICrawlerConfig
Gets the document filters.
getEventType() - Method in class com.norconex.collector.core.crawler.event.CrawlerEvent
Gets the event type.
getExtensionParts() - Method in class com.norconex.collector.core.filter.impl.ExtensionReferenceFilter
 
getExtensions() - Method in class com.norconex.collector.core.filter.impl.ExtensionReferenceFilter
 
getFallbackStrategy() - Method in class com.norconex.collector.core.spoil.impl.GenericSpoiledReferenceStrategizer
 
getField() - Method in class com.norconex.collector.core.filter.impl.RegexMetadataFilter
 
getHost() - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoConnectionDetails
 
getId() - Method in class com.norconex.collector.core.AbstractCollector
 
getId() - Method in class com.norconex.collector.core.AbstractCollectorConfig
Gets this collector's unique identifier.
getId() - Method in class com.norconex.collector.core.crawler.AbstractCrawler
 
getId() - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
Gets this crawler's unique identifier.
getId() - Method in interface com.norconex.collector.core.crawler.ICrawlerConfig
Gets this crawler's unique identifier.
getId() - Method in interface com.norconex.collector.core.ICollector
 
getId() - Method in interface com.norconex.collector.core.ICollectorConfig
Gets this collector's unique identifier.
getImporter() - Method in class com.norconex.collector.core.crawler.AbstractCrawler
 
getImporter() - Method in interface com.norconex.collector.core.crawler.ICrawler
Gets the crawler Importer module.
getImporterConfig() - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
getImporterConfig() - Method in interface com.norconex.collector.core.crawler.ICrawlerConfig
Gets the Importer module configuration.
getImporterResponse() - Method in class com.norconex.collector.core.pipeline.importer.ImporterPipelineContext
 
getInsertCrawlDataSQL(String) - Method in class com.norconex.collector.core.data.store.impl.jdbc.BasicJDBCSerializer
 
getInsertCrawlDataSQL(String) - Method in interface com.norconex.collector.core.data.store.impl.jdbc.IJDBCSerializer
Gets the SQL to insert a new ICrawlData in the given table.
getInsertCrawlDataValues(String, ICrawlData) - Method in class com.norconex.collector.core.data.store.impl.jdbc.BasicJDBCSerializer
 
getInsertCrawlDataValues(String, ICrawlData) - Method in interface com.norconex.collector.core.data.store.impl.jdbc.IJDBCSerializer
Gets the PreparedStatement values (if any) necessary to execute the SQL obtained with IJDBCSerializer.getInsertCrawlDataSQL(String).
getJobErrorListeners() - Method in class com.norconex.collector.core.AbstractCollectorConfig
 
getJobErrorListeners() - Method in interface com.norconex.collector.core.ICollectorConfig
Gets JEF error listeners.
getJobLifeCycleListeners() - Method in class com.norconex.collector.core.AbstractCollectorConfig
 
getJobLifeCycleListeners() - Method in interface com.norconex.collector.core.ICollectorConfig
Gets JEF job life cycle listeners.
getJobSuite() - Method in class com.norconex.collector.core.AbstractCollector
Gets the job suite, or null if the collector was not yet started or is no longer running.
getJobSuite() - Method in interface com.norconex.collector.core.ICollector
 
getLogsDir() - Method in class com.norconex.collector.core.AbstractCollectorConfig
Gets the directory location of generated log files.
getLogsDir() - Method in interface com.norconex.collector.core.ICollectorConfig
Gets the directory location of generated log files.
getMaxDocuments() - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
getMaxDocuments() - Method in interface com.norconex.collector.core.crawler.ICrawlerConfig
Gets the maximum number of documents that can be processed.
getMaxParallelCrawlers() - Method in class com.norconex.collector.core.AbstractCollectorConfig
Gets the maximum number of crawlers that can be executed in parallel at any given time.
getMaxParallelCrawlers() - Method in interface com.norconex.collector.core.ICollectorConfig
Gets the maximum number of crawlers that can be executed in parallel at any given time.
getMechanism() - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoConnectionDetails
Gets the authentication mechanism to use (MONGODB-CR, SCRAM-SHA-1 or null to use default).
getMetaChecksum() - Method in class com.norconex.collector.core.data.BaseCrawlData
 
getMetaChecksum() - Method in interface com.norconex.collector.core.data.ICrawlData
 
getMetadataFilters() - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
getMetadataFilters() - Method in interface com.norconex.collector.core.crawler.ICrawlerConfig
Gets the metadata filters.
getMVStoreConfig() - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStoreFactory
 
getNextQueued(MongoCollection<Document>) - Method in class com.norconex.collector.core.data.store.impl.mongo.BaseMongoSerializer
 
getNextQueued(MongoCollection<Document>) - Method in interface com.norconex.collector.core.data.store.impl.mongo.IMongoSerializer
Gets the next queued DB document from the given collection.
getNextQueuedCrawlDataSQL() - Method in class com.norconex.collector.core.data.store.impl.jdbc.BasicJDBCSerializer
 
getNextQueuedCrawlDataSQL() - Method in interface com.norconex.collector.core.data.store.impl.jdbc.IJDBCSerializer
Gets the SQL to obtain the next ICrawlData from the queue table.
getNextQueuedCrawlDataValues() - Method in class com.norconex.collector.core.data.store.impl.jdbc.BasicJDBCSerializer
 
getNextQueuedCrawlDataValues() - Method in interface com.norconex.collector.core.data.store.impl.jdbc.IJDBCSerializer
Gets the PreparedStatement values (if any) necessary to execute the SQL obtained with IJDBCSerializer.getNextQueuedCrawlDataSQL().
getNumThreads() - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
getNumThreads() - Method in interface com.norconex.collector.core.crawler.ICrawlerConfig
Gets the number of threads (maximum) a crawler should use.
getOrphansStrategy() - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
getOrphansStrategy() - Method in interface com.norconex.collector.core.crawler.ICrawlerConfig
Gets the strategy to adopt when there are orphans.
getPageSplitSize() - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreConfig
 
getParentRootReference() - Method in class com.norconex.collector.core.data.BaseCrawlData
 
getParentRootReference() - Method in interface com.norconex.collector.core.data.ICrawlData
 
getPassword() - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoConnectionDetails
 
getPasswordKey() - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoConnectionDetails
Gets the password encryption key.
getPort() - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoConnectionDetails
 
getProcessed(String) - Method in interface com.norconex.collector.core.data.store.ICrawlDataStore
Gets an already processed reference from the current crawl session.
getProcessed(String) - Method in class com.norconex.collector.core.data.store.impl.jdbc.JDBCCrawlDataStore
 
getProcessed(String) - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoCrawlDataStore
 
getProcessed(String) - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStore
 
getProcessedCount() - Method in interface com.norconex.collector.core.data.store.ICrawlDataStore
Gets the number of references processed.
getProcessedCount() - Method in class com.norconex.collector.core.data.store.impl.jdbc.JDBCCrawlDataStore
 
getProcessedCount() - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoCrawlDataStore
 
getProcessedCount() - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStore
 
getProcessedURLCount() - Method in class com.norconex.collector.core.jmx.Monitoring
 
getProcessedURLCount() - Method in interface com.norconex.collector.core.jmx.MonitoringMBean
 
getProgressDir() - Method in class com.norconex.collector.core.AbstractCollectorConfig
Gets the directory location where progress files (from JEF API) are stored.
getProgressDir() - Method in interface com.norconex.collector.core.ICollectorConfig
Gets the directory location where progress files (from JEF API) are stored.
getQueueSize() - Method in interface com.norconex.collector.core.data.store.ICrawlDataStore
Gets the size of the reference queue (number of references left to process).
getQueueSize() - Method in class com.norconex.collector.core.data.store.impl.jdbc.JDBCCrawlDataStore
 
getQueueSize() - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoCrawlDataStore
 
getQueueSize() - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStore
 
getReference() - Method in class com.norconex.collector.core.data.BaseCrawlData
 
getReference() - Method in interface com.norconex.collector.core.data.ICrawlData
Gets the unique identifier of this reference (e.g.
getReferenceExistsSQL(String) - Method in class com.norconex.collector.core.data.store.impl.jdbc.BasicJDBCSerializer
 
getReferenceExistsSQL(String) - Method in interface com.norconex.collector.core.data.store.impl.jdbc.IJDBCSerializer
Gets the SQL to find if an ICrawlData exists in the given table.
getReferenceExistsValues(String, String) - Method in class com.norconex.collector.core.data.store.impl.jdbc.BasicJDBCSerializer
 
getReferenceExistsValues(String, String) - Method in interface com.norconex.collector.core.data.store.impl.jdbc.IJDBCSerializer
Gets the PreparedStatement values (if any) necessary to execute the SQL obtained with IJDBCSerializer.getReferenceExistsSQL(String).
getReferenceFilters() - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
Gets the reference filters.
getReferenceFilters() - Method in interface com.norconex.collector.core.crawler.ICrawlerConfig
Gets the reference filters.
getReferencesCollectionName() - Method in class com.norconex.collector.core.data.store.impl.mongo.AbstractMongoCrawlDataStoreFactory
Gets the references collection name.
getReferencesCollectionName() - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoCrawlDataStore
Gets the references collection name.
getReferencesCount(IMongoSerializer.Stage) - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoCrawlDataStore
 
getRegex() - Method in class com.norconex.collector.core.filter.impl.RegexMetadataFilter
 
getRegex() - Method in class com.norconex.collector.core.filter.impl.RegexReferenceFilter
 
getSafeDatabaseName(String) - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoConnectionDetails
Gets a safe database name using MongoUtil, and treating a crawlerId as the default.
getSafeDBName(String, String) - Static method in class com.norconex.collector.core.data.store.impl.mongo.MongoUtil
Returns or generates a DB name. If a valid dbName is provided, it is returned as is.
getSelectCrawlDataSQL(String) - Method in class com.norconex.collector.core.data.store.impl.jdbc.BasicJDBCSerializer
 
getSelectCrawlDataSQL(String) - Method in interface com.norconex.collector.core.data.store.impl.jdbc.IJDBCSerializer
Gets the SQL to obtain all ICrawlData entries in the given table.
getSourceFields() - Method in class com.norconex.collector.core.checksum.impl.GenericMetadataChecksummer
Gets the metadata fields used to construct a checksum.
getSourceFields() - Method in class com.norconex.collector.core.checksum.impl.MD5DocumentChecksummer
Gets the fields used to construct an MD5 checksum.
getSourceFieldsRegex() - Method in class com.norconex.collector.core.checksum.impl.GenericMetadataChecksummer
Gets the regular expression matching metadata fields used to construct a checksum.
getSourceFieldsRegex() - Method in class com.norconex.collector.core.checksum.impl.MD5DocumentChecksummer
Gets the regular expression matching metadata fields used to construct a checksum.
getSpoiledReferenceStrategizer() - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
getSpoiledReferenceStrategizer() - Method in interface com.norconex.collector.core.crawler.ICrawlerConfig
Gets the spoiled state strategy resolver.
getState() - Method in class com.norconex.collector.core.AbstractCollector
Gets the state of this collector.
getState() - Method in class com.norconex.collector.core.data.BaseCrawlData
 
getState() - Method in interface com.norconex.collector.core.data.ICrawlData
Gets this reference's state.
getStopOnExceptions() - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
getStopOnExceptions() - Method in interface com.norconex.collector.core.crawler.ICrawlerConfig
Gets the exceptions we want to stop the crawler on.
getStreamFactory() - Method in class com.norconex.collector.core.crawler.AbstractCrawler
 
getSubject() - Method in class com.norconex.collector.core.crawler.event.CrawlerEvent
Gets the subject of this event.
getSuiteLifeCycleListeners() - Method in class com.norconex.collector.core.AbstractCollectorConfig
 
getSuiteLifeCycleListeners() - Method in interface com.norconex.collector.core.ICollectorConfig
Gets JEF job suite life cycle listeners.
getTargetField() - Method in class com.norconex.collector.core.checksum.AbstractDocumentChecksummer
Gets the metadata field to use to store the checksum value.
getTargetField() - Method in class com.norconex.collector.core.checksum.AbstractMetadataChecksummer
Gets the metadata field to use to store the checksum value.
getURLQueueSize() - Method in class com.norconex.collector.core.jmx.Monitoring
 
getURLQueueSize() - Method in interface com.norconex.collector.core.jmx.MonitoringMBean
 
getUsername() - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoConnectionDetails
 
getWorkDir() - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
getWorkDir() - Method in interface com.norconex.collector.core.crawler.ICrawlerConfig
Gets the crawler working directory where many files created at execution time are stored.
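
The IJDBCSerializer entries above pair each get...SQL method with a get...Values method that supplies the PreparedStatement values for that SQL. The following is a minimal sketch of how such a pair could be consumed with plain JDBC. It assumes the SQL uses `?` placeholders and the values arrive as an Object array (this index does not show return types). The class name, the sample SQL, and the recording statement are hypothetical and exist only so the sketch runs without a database.

```java
import java.lang.reflect.Proxy;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not part of the Norconex Collector Core API.
final class SerializerBindingSketch {

    // Binds each value to the matching positional parameter (JDBC is 1-based).
    static void bind(PreparedStatement stmt, Object... values) throws SQLException {
        if (values == null) {
            return; // the "(if any)" case: nothing to bind
        }
        for (int i = 0; i < values.length; i++) {
            stmt.setObject(i + 1, values[i]);
        }
    }

    // Stand-in statement that only records setObject calls (no database needed).
    static PreparedStatement recordingStatement(List<String> log) {
        return (PreparedStatement) Proxy.newProxyInstance(
                SerializerBindingSketch.class.getClassLoader(),
                new Class<?>[] {PreparedStatement.class},
                (proxy, method, args) -> {
                    if ("setObject".equals(method.getName())) {
                        log.add(args[0] + "=" + args[1]);
                    }
                    return null;
                });
    }

    // Binds the values against the recording statement and returns what was bound.
    static List<String> bindAndRecord(Object... values) {
        List<String> log = new ArrayList<>();
        try {
            bind(recordingStatement(log), values);
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        // Assumed SQL shape, as a serializer might return for an insert.
        String sql = "INSERT INTO queue (reference, depth) VALUES (?, ?)";
        System.out.println(sql + " <- " + bindAndRecord("http://example.com/a", 0));
    }
}
```

With a real connection, the statement would come from `Connection.prepareStatement(sql)` instead of the recording stand-in.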

H

handleOrphans(ICrawlDataStore, JobStatusUpdater, JobSuite) - Method in class com.norconex.collector.core.crawler.AbstractCrawler
 
hashCode() - Method in class com.norconex.collector.core.AbstractCollectorConfig
 
hashCode() - Method in class com.norconex.collector.core.checksum.AbstractDocumentChecksummer
 
hashCode() - Method in class com.norconex.collector.core.checksum.AbstractMetadataChecksummer
 
hashCode() - Method in class com.norconex.collector.core.checksum.impl.GenericMetadataChecksummer
 
hashCode() - Method in class com.norconex.collector.core.checksum.impl.MD5DocumentChecksummer
 
hashCode() - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
hashCode() - Method in class com.norconex.collector.core.crawler.event.CrawlerEvent
 
hashCode() - Method in class com.norconex.collector.core.data.BaseCrawlData
 
hashCode() - Method in class com.norconex.collector.core.data.CrawlState
 
hashCode() - Method in class com.norconex.collector.core.data.store.impl.jdbc.BasicJDBCCrawlDataStoreFactory
 
hashCode() - Method in class com.norconex.collector.core.data.store.impl.mongo.AbstractMongoCrawlDataStoreFactory
 
hashCode() - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoConnectionDetails
 
hashCode() - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreConfig
 
hashCode() - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStoreFactory
 
hashCode() - Method in class com.norconex.collector.core.filter.impl.ExtensionReferenceFilter
 
hashCode() - Method in class com.norconex.collector.core.filter.impl.RegexMetadataFilter
 
hashCode() - Method in class com.norconex.collector.core.filter.impl.RegexReferenceFilter
 
hashCode() - Method in class com.norconex.collector.core.pipeline.BasePipelineContext
 
hashCode() - Method in class com.norconex.collector.core.spoil.impl.GenericSpoiledReferenceStrategizer
 

I

ICollector - Interface in com.norconex.collector.core
 
ICollectorConfig - Interface in com.norconex.collector.core
 
ICollectorLifeCycleListener - Interface in com.norconex.collector.core
Listens to collector life-cycle events.
ICrawlData - Interface in com.norconex.collector.core.data
A pointer that uniquely identifies a resource being processed (e.g.
ICrawlDataStore - Interface in com.norconex.collector.core.data.store
Holds necessary information about all references (e.g.
ICrawlDataStoreFactory - Interface in com.norconex.collector.core.data.store
Factory responsible for creating new crawl data stores.
ICrawler - Interface in com.norconex.collector.core.crawler
A document crawler.
ICrawlerConfig - Interface in com.norconex.collector.core.crawler
Crawler configuration.
ICrawlerConfig.OrphansStrategy - Enum in com.norconex.collector.core.crawler
 
ICrawlerEventListener - Interface in com.norconex.collector.core.crawler.event
Allows implementers to react to any crawler-specific events.
IDocumentChecksummer - Interface in com.norconex.collector.core.checksum
Creates a checksum representing a document.
IDocumentFilter - Interface in com.norconex.collector.core.filter
Filter a document after the document content is fetched, downloaded, or otherwise read or acquired.
IJDBCSerializer - Interface in com.norconex.collector.core.data.store.impl.jdbc
Serializer holding necessary information to insert, load, delete and create document reference information specific to each database table.
IMetadataChecksummer - Interface in com.norconex.collector.core.checksum
Creates a checksum representing a document based on document metadata values obtained prior to fetching that document (e.g.
IMetadataFilter - Interface in com.norconex.collector.core.filter
Filter a reference based on the metadata that could be obtained for a document, before it was fetched, downloaded, or otherwise read or acquired (e.g.
IMongoSerializer - Interface in com.norconex.collector.core.data.store.impl.mongo
Mongo serializer.
IMongoSerializer.Stage - Enum in com.norconex.collector.core.data.store.impl.mongo
 
ImporterPipelineContext - Class in com.norconex.collector.core.pipeline.importer
IPipelineStage context for collector Pipelines dealing with ImporterResponse.
ImporterPipelineContext(ImporterPipelineContext) - Constructor for class com.norconex.collector.core.pipeline.importer.ImporterPipelineContext
Constructor creating a copy of supplied context.
ImporterPipelineContext(ICrawler, ICrawlDataStore) - Constructor for class com.norconex.collector.core.pipeline.importer.ImporterPipelineContext
Constructor.
ImporterPipelineContext(ICrawler, ICrawlDataStore, BaseCrawlData) - Constructor for class com.norconex.collector.core.pipeline.importer.ImporterPipelineContext
Constructor.
ImporterPipelineContext(ICrawler, ICrawlDataStore, BaseCrawlData, BaseCrawlData, ImporterDocument) - Constructor for class com.norconex.collector.core.pipeline.importer.ImporterPipelineContext
 
ImporterPipelineUtil - Class in com.norconex.collector.core.pipeline.importer
 
ImportModuleStage - Class in com.norconex.collector.core.pipeline.importer
Common pipeline stage for importing documents.
ImportModuleStage() - Constructor for class com.norconex.collector.core.pipeline.importer.ImportModuleStage
 
initCrawlData(ICrawlData, ICrawlData, ImporterDocument) - Method in class com.norconex.collector.core.crawler.AbstractCrawler
 
IReferenceFilter - Interface in com.norconex.collector.core.filter
Filter a document based on its reference, before its properties or content gets read or otherwise acquired.
isActive(String) - Method in interface com.norconex.collector.core.data.store.ICrawlDataStore
Whether the given reference is currently being processed (i.e.
isActive(String) - Method in class com.norconex.collector.core.data.store.impl.jdbc.JDBCCrawlDataStore
 
isActive(String) - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoCrawlDataStore
 
isActive(String) - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStore
 
isCacheEmpty() - Method in interface com.norconex.collector.core.data.store.ICrawlDataStore
Whether there are any references in the cache from a previous crawler run.
isCacheEmpty() - Method in class com.norconex.collector.core.data.store.impl.jdbc.JDBCCrawlDataStore
 
isCacheEmpty() - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoCrawlDataStore
 
isCacheEmpty() - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStore
 
isCaseSensitive() - Method in class com.norconex.collector.core.filter.impl.ExtensionReferenceFilter
 
isCaseSensitive() - Method in class com.norconex.collector.core.filter.impl.RegexMetadataFilter
 
isCaseSensitive() - Method in class com.norconex.collector.core.filter.impl.RegexReferenceFilter
 
isCombineFieldsAndContent() - Method in class com.norconex.collector.core.checksum.impl.MD5DocumentChecksummer
Gets whether we are combining the fields and content checksums.
isDelete() - Method in class com.norconex.collector.core.pipeline.importer.ImporterPipelineContext
Gets whether the document should be deleted.
isDisabled() - Method in class com.norconex.collector.core.checksum.impl.GenericMetadataChecksummer
Whether this checksummer is disabled or not.
isDisabled() - Method in class com.norconex.collector.core.checksum.impl.MD5DocumentChecksummer
Whether this checksummer is disabled or not.
isGoodState() - Method in class com.norconex.collector.core.data.CrawlState
Returns whether a reference should be considered "good" (the corresponding document is not in a "bad" state, such as being rejected or having produced an error).
isHeadersRejected(ImporterPipelineContext) - Static method in class com.norconex.collector.core.pipeline.importer.ImporterPipelineUtil
 
isKeep() - Method in class com.norconex.collector.core.checksum.AbstractDocumentChecksummer
Whether to keep the document checksum value as a new field in the document metadata.
isKeep() - Method in class com.norconex.collector.core.checksum.AbstractMetadataChecksummer
Whether to keep the metadata checksum value as a new metadata field.
isLogsUnmanaged() - Method in class com.norconex.collector.core.AbstractCollectorConfig
Gets whether written logs are managed by the collector.
isLogsUnmanaged() - Method in interface com.norconex.collector.core.ICollectorConfig
Gets whether written logs are managed by the collector.
isMaxDocuments() - Method in class com.norconex.collector.core.crawler.AbstractCrawler
 
isNewOrModified() - Method in class com.norconex.collector.core.data.CrawlState
Returns whether a state indicates new or modified.
isOneOf(CrawlState...) - Method in class com.norconex.collector.core.data.CrawlState
 
isOrphan() - Method in class com.norconex.collector.core.pipeline.importer.ImporterPipelineContext
Gets whether the document is an orphan (no longer referenced).
ISpoiledReferenceStrategizer - Interface in com.norconex.collector.core.spoil
Decides which strategy to adopt for a given reference with a bad state.
isProcessed(String) - Method in interface com.norconex.collector.core.data.store.ICrawlDataStore
Whether the given reference has been processed.
isProcessed(String) - Method in class com.norconex.collector.core.data.store.impl.jdbc.JDBCCrawlDataStore
 
isProcessed(String) - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoCrawlDataStore
 
isProcessed(String) - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStore
 
isQueued(String) - Method in interface com.norconex.collector.core.data.store.ICrawlDataStore
Whether the given reference is in the queue or not (waiting to be processed).
isQueued(String) - Method in class com.norconex.collector.core.data.store.impl.jdbc.JDBCCrawlDataStore
 
isQueued(String) - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoCrawlDataStore
 
isQueued(String) - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStore
 
isQueueEmpty() - Method in interface com.norconex.collector.core.data.store.ICrawlDataStore
Whether there are any references to process in the queue.
isQueueEmpty() - Method in class com.norconex.collector.core.data.store.impl.jdbc.JDBCCrawlDataStore
 
isQueueEmpty() - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoCrawlDataStore
 
isQueueEmpty() - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStore
 
isRootParentReference() - Method in class com.norconex.collector.core.data.BaseCrawlData
 
isRootParentReference() - Method in interface com.norconex.collector.core.data.ICrawlData
 
isSkipped() - Method in class com.norconex.collector.core.data.CrawlState
Returns whether a state indicates the document is to be skipped (CrawlState.UNMODIFIED or CrawlState.PREMATURE).
isSslEnabled() - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoConnectionDetails
Gets whether to use SSL.
isSslInvalidHostNameAllowed() - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoConnectionDetails
Gets whether invalid host names should be allowed if SSL is enabled.
isStage(String, IMongoSerializer.Stage) - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoCrawlDataStore
 
isStopped() - Method in class com.norconex.collector.core.crawler.AbstractCrawler
Whether the crawler job was stopped.
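
The ICrawlDataStore entries in this index (isQueued, isActive, isProcessed, isQueueEmpty, isCacheEmpty, getQueueSize, getProcessedCount) suggest that a reference moves from a queue, to being actively processed, to processed, with a cache holding references from a previous run. The sketch below models that reading with in-memory collections. It is an illustration under that assumption, not one of the library's stores: the class name and the transition methods `queue`, `nextQueued` and `processed` are hypothetical here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory model; not an ICrawlDataStore implementation.
final class ReferenceLifecycleSketch {
    private final Deque<String> queue = new ArrayDeque<>();
    private final Set<String> active = new HashSet<>();
    private final Set<String> processed = new HashSet<>();
    private final Set<String> cache; // references kept from a previous run

    ReferenceLifecycleSketch(Set<String> previousRun) {
        this.cache = new HashSet<>(previousRun);
    }

    // Adds a reference to the queue unless it is already tracked in this run.
    void queue(String ref) {
        if (!isQueued(ref) && !isActive(ref) && !isProcessed(ref)) {
            queue.add(ref);
        }
    }

    // Takes the next reference off the queue and marks it active (null if none).
    String nextQueued() {
        String ref = queue.poll();
        if (ref != null) {
            active.add(ref);
        }
        return ref;
    }

    // Marks a reference as done for the current run.
    void processed(String ref) {
        active.remove(ref);
        processed.add(ref);
    }

    boolean isQueued(String ref) { return queue.contains(ref); }
    boolean isActive(String ref) { return active.contains(ref); }
    boolean isProcessed(String ref) { return processed.contains(ref); }
    boolean isQueueEmpty() { return queue.isEmpty(); }
    boolean isCacheEmpty() { return cache.isEmpty(); }
    int getQueueSize() { return queue.size(); }
    int getProcessedCount() { return processed.size(); }

    public static void main(String[] args) {
        ReferenceLifecycleSketch store = new ReferenceLifecycleSketch(new HashSet<>());
        store.queue("http://example.com/a");
        String ref = store.nextQueued();
        store.processed(ref);
        System.out.println("processed: " + store.getProcessedCount());
    }
}
```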

J

JDBCCrawlDataStore - Class in com.norconex.collector.core.data.store.impl.jdbc
 
JDBCCrawlDataStore(String, boolean, IJDBCSerializer) - Constructor for class com.norconex.collector.core.data.store.impl.jdbc.JDBCCrawlDataStore
 

L

launch(String[]) - Method in class com.norconex.collector.core.AbstractCollectorLauncher
 
loadChecksummerFromXML(XMLConfiguration) - Method in class com.norconex.collector.core.checksum.AbstractDocumentChecksummer
 
loadChecksummerFromXML(XMLConfiguration) - Method in class com.norconex.collector.core.checksum.AbstractMetadataChecksummer
 
loadChecksummerFromXML(XMLConfiguration) - Method in class com.norconex.collector.core.checksum.impl.GenericMetadataChecksummer
 
loadChecksummerFromXML(XMLConfiguration) - Method in class com.norconex.collector.core.checksum.impl.MD5DocumentChecksummer
 
loadCollectorConfig(File) - Method in class com.norconex.collector.core.CollectorConfigLoader
Loads a collector configuration from file.
loadCollectorConfig(File, File) - Method in class com.norconex.collector.core.CollectorConfigLoader
Loads a collector configuration from file.
loadCollectorConfigFromXML(XMLConfiguration) - Method in class com.norconex.collector.core.AbstractCollectorConfig
 
loadCrawlerConfig(ICrawlerConfig, XMLConfiguration) - Method in class com.norconex.collector.core.crawler.CrawlerConfigLoader
Loads a crawler configuration, which can be either the default crawler or real crawler configuration instances (keeping defaults).
loadCrawlerConfigFromXML(XMLConfiguration) - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
loadCrawlerConfigs(File) - Method in class com.norconex.collector.core.crawler.CrawlerConfigLoader
 
loadCrawlerConfigs(File, File) - Method in class com.norconex.collector.core.crawler.CrawlerConfigLoader
 
loadCrawlerConfigs(HierarchicalConfiguration) - Method in class com.norconex.collector.core.crawler.CrawlerConfigLoader
 
loadFromXML(Reader) - Method in class com.norconex.collector.core.AbstractCollectorConfig
 
loadFromXML(Reader) - Method in class com.norconex.collector.core.checksum.AbstractDocumentChecksummer
 
loadFromXML(Reader) - Method in class com.norconex.collector.core.checksum.AbstractMetadataChecksummer
 
loadFromXML(Reader) - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
loadFromXML(Reader) - Method in class com.norconex.collector.core.data.store.impl.jdbc.BasicJDBCCrawlDataStoreFactory
 
loadFromXML(Reader) - Method in class com.norconex.collector.core.data.store.impl.mongo.AbstractMongoCrawlDataStoreFactory
 
loadFromXML(Reader) - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStoreFactory
 
loadFromXML(Reader) - Method in class com.norconex.collector.core.filter.impl.ExtensionReferenceFilter
 
loadFromXML(Reader) - Method in class com.norconex.collector.core.filter.impl.RegexMetadataFilter
 
loadFromXML(Reader) - Method in class com.norconex.collector.core.filter.impl.RegexReferenceFilter
 
loadFromXML(Reader) - Method in class com.norconex.collector.core.spoil.impl.GenericSpoiledReferenceStrategizer
 

M

markReferenceVariationsAsProcessed(BaseCrawlData, ICrawlDataStore) - Method in class com.norconex.collector.core.crawler.AbstractCrawler
 
MD5DocumentChecksummer - Class in com.norconex.collector.core.checksum.impl
Implementation of IDocumentChecksummer which returns an MD5 checksum value of the extracted document content unless one or more source fields are specified, in which case the MD5 checksum value is constructed from those fields.
MD5DocumentChecksummer() - Constructor for class com.norconex.collector.core.checksum.impl.MD5DocumentChecksummer
 
metadataChecksumMD5(Properties, String, String...) - Static method in class com.norconex.collector.core.checksum.ChecksumUtil
 
metadataChecksumPlain(Properties, String, String...) - Static method in class com.norconex.collector.core.checksum.ChecksumUtil
 
MODIFIED - Static variable in class com.norconex.collector.core.data.CrawlState
 
MONGO_INVALID_DBNAME_CHARACTERS - Static variable in class com.norconex.collector.core.data.store.impl.mongo.MongoUtil
 
MongoConnectionDetails - Class in com.norconex.collector.core.data.store.impl.mongo
Holds Mongo connection details.
MongoConnectionDetails() - Constructor for class com.norconex.collector.core.data.store.impl.mongo.MongoConnectionDetails
 
MongoCrawlDataStore - Class in com.norconex.collector.core.data.store.impl.mongo
Mongo implementation of ICrawlDataStore.
MongoCrawlDataStore(String, boolean, MongoConnectionDetails, IMongoSerializer) - Constructor for class com.norconex.collector.core.data.store.impl.mongo.MongoCrawlDataStore
Constructor.
MongoCrawlDataStore(String, boolean, MongoConnectionDetails, IMongoSerializer, String, String) - Constructor for class com.norconex.collector.core.data.store.impl.mongo.MongoCrawlDataStore
Constructor.
MongoCrawlDataStore(boolean, MongoClient, String, IMongoSerializer) - Constructor for class com.norconex.collector.core.data.store.impl.mongo.MongoCrawlDataStore
Constructor.
MongoCrawlDataStore(boolean, MongoClient, String, IMongoSerializer, String, String) - Constructor for class com.norconex.collector.core.data.store.impl.mongo.MongoCrawlDataStore
Constructor.
MongoUtil - Class in com.norconex.collector.core.data.store.impl.mongo
Utility methods for Mongo operations.
Monitoring - Class in com.norconex.collector.core.jmx
 
Monitoring(ICrawlDataStore) - Constructor for class com.norconex.collector.core.jmx.Monitoring
 
MonitoringMBean - Interface in com.norconex.collector.core.jmx
 
MVStoreConfig - Class in com.norconex.collector.core.data.store.impl.mvstore
MVStore configuration parameters.
MVStoreConfig() - Constructor for class com.norconex.collector.core.data.store.impl.mvstore.MVStoreConfig
 
MVStoreCrawlDataStore - Class in com.norconex.collector.core.data.store.impl.mvstore
H2 MVStore ICrawlDataStore implementation.
MVStoreCrawlDataStore(String, boolean) - Constructor for class com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStore
 
MVStoreCrawlDataStore(String, boolean, MVStoreConfig) - Constructor for class com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStore
 
MVStoreCrawlDataStoreFactory - Class in com.norconex.collector.core.data.store.impl.mvstore
H2 MVStore crawl data store factory (http://h2database.com/html/mvstore.html).
MVStoreCrawlDataStoreFactory() - Constructor for class com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStoreFactory
 

N

NEW - Static variable in class com.norconex.collector.core.data.CrawlState
 
nextQueued() - Method in interface com.norconex.collector.core.data.store.ICrawlDataStore
Returns the next reference to be processed from the queue and marks it as being "active".
nextQueued() - Method in class com.norconex.collector.core.data.store.impl.jdbc.JDBCCrawlDataStore
 
nextQueued() - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoCrawlDataStore
 
nextQueued() - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStore
 
NOT_FOUND - Static variable in class com.norconex.collector.core.data.CrawlState
 

O

onCollectorFinish(ICollector) - Method in interface com.norconex.collector.core.ICollectorLifeCycleListener
Invoked when the collector is finishing its execution.
onCollectorStart(ICollector) - Method in interface com.norconex.collector.core.ICollectorLifeCycleListener
Invoked when the collector has been created and is just about to start.

P

parseCommandLineArguments(String[]) - Method in class com.norconex.collector.core.AbstractCollectorLauncher
 
PREMATURE - Static variable in class com.norconex.collector.core.data.CrawlState
For collectors that support it, this state indicates a previously crawled document is not yet ready to be re-crawled.
prepareExecution(JobStatusUpdater, JobSuite, ICrawlDataStore, boolean) - Method in class com.norconex.collector.core.crawler.AbstractCrawler
 
processed(ICrawlData) - Method in interface com.norconex.collector.core.data.store.ICrawlDataStore
Marks this reference as processed.
processed(ICrawlData) - Method in class com.norconex.collector.core.data.store.impl.jdbc.JDBCCrawlDataStore
 
processed(ICrawlData) - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoCrawlDataStore
 
processed(ICrawlData) - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStore
 
processNextReference(JobStatusUpdater, ImporterPipelineContext) - Method in class com.norconex.collector.core.crawler.AbstractCrawler
 
processReferences(JobStatusUpdater, JobSuite, ImporterPipelineContext) - Method in class com.norconex.collector.core.crawler.AbstractCrawler
 

Q

queue(ICrawlData) - Method in interface com.norconex.collector.core.data.store.ICrawlDataStore
Queues a reference for future processing.
queue(ICrawlData) - Method in class com.norconex.collector.core.data.store.impl.jdbc.JDBCCrawlDataStore
 
queue(ICrawlData) - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoCrawlDataStore
 
queue(ICrawlData) - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStore
 
QueueReferenceStage - Class in com.norconex.collector.core.pipeline.queue
Common pipeline stage for queuing documents.
QueueReferenceStage() - Constructor for class com.norconex.collector.core.pipeline.queue.QueueReferenceStage
Constructor.

R

ReferenceFiltersStage - Class in com.norconex.collector.core.pipeline.queue
Common pipeline stage for filtering references.
ReferenceFiltersStage() - Constructor for class com.norconex.collector.core.pipeline.queue.ReferenceFiltersStage
 
ReferenceFiltersStage(String) - Constructor for class com.norconex.collector.core.pipeline.queue.ReferenceFiltersStage
 
ReferenceFiltersStageUtil - Class in com.norconex.collector.core.pipeline.queue
Reference-filtering stage utility methods.
RegexMetadataFilter - Class in com.norconex.collector.core.filter.impl
Accepts or rejects a reference using a regular expression to match a metadata field value.
RegexMetadataFilter() - Constructor for class com.norconex.collector.core.filter.impl.RegexMetadataFilter
 
RegexMetadataFilter(String, String) - Constructor for class com.norconex.collector.core.filter.impl.RegexMetadataFilter
 
RegexMetadataFilter(String, String, OnMatch) - Constructor for class com.norconex.collector.core.filter.impl.RegexMetadataFilter
 
RegexMetadataFilter(String, String, OnMatch, boolean) - Constructor for class com.norconex.collector.core.filter.impl.RegexMetadataFilter
 
RegexReferenceFilter - Class in com.norconex.collector.core.filter.impl
Filters URLs based on a regular expression.
RegexReferenceFilter() - Constructor for class com.norconex.collector.core.filter.impl.RegexReferenceFilter
 
RegexReferenceFilter(String) - Constructor for class com.norconex.collector.core.filter.impl.RegexReferenceFilter
 
RegexReferenceFilter(String, OnMatch) - Constructor for class com.norconex.collector.core.filter.impl.RegexReferenceFilter
 
RegexReferenceFilter(String, OnMatch, boolean) - Constructor for class com.norconex.collector.core.filter.impl.RegexReferenceFilter
 
REJECTED - Static variable in class com.norconex.collector.core.data.CrawlState
 
REJECTED_BAD_STATUS - Static variable in class com.norconex.collector.core.crawler.event.CrawlerEvent
 
REJECTED_ERROR - Static variable in class com.norconex.collector.core.crawler.event.CrawlerEvent
 
REJECTED_FILTER - Static variable in class com.norconex.collector.core.crawler.event.CrawlerEvent
 
REJECTED_IMPORT - Static variable in class com.norconex.collector.core.crawler.event.CrawlerEvent
 
REJECTED_NOTFOUND - Static variable in class com.norconex.collector.core.crawler.event.CrawlerEvent
 
REJECTED_PREMATURE - Static variable in class com.norconex.collector.core.crawler.event.CrawlerEvent
A document could not be re-crawled because it is not yet ready to be re-crawled.
REJECTED_UNMODIFIED - Static variable in class com.norconex.collector.core.crawler.event.CrawlerEvent
 
reprocessCacheOrphans(ICrawlDataStore, JobStatusUpdater, JobSuite) - Method in class com.norconex.collector.core.crawler.AbstractCrawler
 
resolveDocumentChecksum(String, DocumentPipelineContext, Object) - Static method in class com.norconex.collector.core.pipeline.ChecksumStageUtil
 
resolveMetaChecksum(String, DocumentPipelineContext, Object) - Static method in class com.norconex.collector.core.pipeline.ChecksumStageUtil
 
resolveReferenceFilters(IReferenceFilter[], BasePipelineContext, String) - Static method in class com.norconex.collector.core.pipeline.queue.ReferenceFiltersStageUtil
 
resolveSpoiledReferenceStrategy(String, CrawlState) - Method in class com.norconex.collector.core.spoil.impl.GenericSpoiledReferenceStrategizer
 
resolveSpoiledReferenceStrategy(String, CrawlState) - Method in interface com.norconex.collector.core.spoil.ISpoiledReferenceStrategizer
Establishes which spoiled reference strategy to adopt.
resumeExecution(JobStatusUpdater, JobSuite) - Method in class com.norconex.collector.core.crawler.AbstractCrawler
 

S

saveChecksummerToXML(EnhancedXMLStreamWriter) - Method in class com.norconex.collector.core.checksum.AbstractDocumentChecksummer
 
saveChecksummerToXML(EnhancedXMLStreamWriter) - Method in class com.norconex.collector.core.checksum.AbstractMetadataChecksummer
 
saveChecksummerToXML(EnhancedXMLStreamWriter) - Method in class com.norconex.collector.core.checksum.impl.GenericMetadataChecksummer
 
saveChecksummerToXML(EnhancedXMLStreamWriter) - Method in class com.norconex.collector.core.checksum.impl.MD5DocumentChecksummer
 
saveCollectorConfigToXML(Writer) - Method in class com.norconex.collector.core.AbstractCollectorConfig
 
saveCrawlerConfigToXML(Writer) - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
SaveDocumentStage - Class in com.norconex.collector.core.pipeline.importer
Common pipeline stage for saving documents.
SaveDocumentStage() - Constructor for class com.norconex.collector.core.pipeline.importer.SaveDocumentStage
 
saveToXML(Writer) - Method in class com.norconex.collector.core.AbstractCollectorConfig
 
saveToXML(Writer) - Method in class com.norconex.collector.core.checksum.AbstractDocumentChecksummer
 
saveToXML(Writer) - Method in class com.norconex.collector.core.checksum.AbstractMetadataChecksummer
 
saveToXML(Writer) - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
saveToXML(Writer) - Method in class com.norconex.collector.core.data.store.impl.jdbc.BasicJDBCCrawlDataStoreFactory
 
saveToXML(Writer) - Method in class com.norconex.collector.core.data.store.impl.mongo.AbstractMongoCrawlDataStoreFactory
 
saveToXML(Writer) - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStoreFactory
 
saveToXML(Writer) - Method in class com.norconex.collector.core.filter.impl.ExtensionReferenceFilter
 
saveToXML(Writer) - Method in class com.norconex.collector.core.filter.impl.RegexMetadataFilter
 
saveToXML(Writer) - Method in class com.norconex.collector.core.filter.impl.RegexReferenceFilter
 
saveToXML(Writer) - Method in class com.norconex.collector.core.spoil.impl.GenericSpoiledReferenceStrategizer
 
setAutoCommitBufferSize(Integer) - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreConfig
 
setAutoCommitDelay(Integer) - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreConfig
 
setAutoCompactFillRate(Integer) - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreConfig
 
setCacheConcurrency(Integer) - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreConfig
 
setCachedCollectionName(String) - Method in class com.norconex.collector.core.data.store.impl.mongo.AbstractMongoCrawlDataStoreFactory
Sets the cached collection name.
setCachedCrawlData(BaseCrawlData) - Method in class com.norconex.collector.core.pipeline.DocumentPipelineContext
Sets cached crawl data.
setCacheSize(Integer) - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreConfig
 
setCaseSensitive(boolean) - Method in class com.norconex.collector.core.filter.impl.ExtensionReferenceFilter
 
setCaseSensitive(boolean) - Method in class com.norconex.collector.core.filter.impl.RegexMetadataFilter
 
setCaseSensitive(boolean) - Method in class com.norconex.collector.core.filter.impl.RegexReferenceFilter
 
setCollectorListeners(ICollectorLifeCycleListener...) - Method in class com.norconex.collector.core.AbstractCollectorConfig
Sets collector life cycle listeners.
setCombineFieldsAndContent(boolean) - Method in class com.norconex.collector.core.checksum.impl.MD5DocumentChecksummer
Sets whether to combine the fields and content checksums.
setCommitter(ICommitter) - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
setCompress(Integer) - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreConfig
 
setContentChecksum(String) - Method in class com.norconex.collector.core.data.BaseCrawlData
Sets the content checksum.
setContentType(ContentType) - Method in class com.norconex.collector.core.data.BaseCrawlData
Sets the content type.
setCrawlData(BaseCrawlData) - Method in class com.norconex.collector.core.pipeline.BasePipelineContext
Sets the current crawl data.
setCrawlDataStoreFactory(ICrawlDataStoreFactory) - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
setCrawlDate(Date) - Method in class com.norconex.collector.core.data.BaseCrawlData
Sets the crawl date.
setCrawlerConfigs(ICrawlerConfig...) - Method in class com.norconex.collector.core.AbstractCollectorConfig
Sets crawler configurations.
setCrawlerListeners(ICrawlerEventListener...) - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
setCrawlers(ICrawler[]) - Method in class com.norconex.collector.core.AbstractCollector
Adds the provided crawlers to this collector.
setDatabaseName(String) - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoConnectionDetails
 
setDelete(boolean) - Method in class com.norconex.collector.core.pipeline.importer.ImporterPipelineContext
Sets whether the document should be deleted.
setDisabled(boolean) - Method in class com.norconex.collector.core.checksum.impl.GenericMetadataChecksummer
Sets whether this checksummer is disabled or not.
setDisabled(boolean) - Method in class com.norconex.collector.core.checksum.impl.MD5DocumentChecksummer
Sets whether this checksummer is disabled or not.
setDocument(ImporterDocument) - Method in class com.norconex.collector.core.pipeline.DocumentPipelineContext
Sets the document.
setDocumentChecksummer(IDocumentChecksummer) - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
setDocumentFilters(IDocumentFilter...) - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
setExtensions(String) - Method in class com.norconex.collector.core.filter.impl.ExtensionReferenceFilter
 
setFallbackStrategy(SpoiledReferenceStrategy) - Method in class com.norconex.collector.core.spoil.impl.GenericSpoiledReferenceStrategizer
 
setField(String) - Method in class com.norconex.collector.core.filter.impl.RegexMetadataFilter
 
setHost(String) - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoConnectionDetails
 
setId(String) - Method in class com.norconex.collector.core.AbstractCollectorConfig
Sets this collector's unique identifier.
setId(String) - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
Sets this crawler's unique identifier.
setImporterConfig(ImporterConfig) - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
setImporterResponse(ImporterResponse) - Method in class com.norconex.collector.core.pipeline.importer.ImporterPipelineContext
 
setJobErrorListeners(IJobErrorListener...) - Method in class com.norconex.collector.core.AbstractCollectorConfig
Sets JEF error listeners.
setJobLifeCycleListeners(IJobLifeCycleListener...) - Method in class com.norconex.collector.core.AbstractCollectorConfig
Sets JEF job life cycle listeners.
setKeep(boolean) - Method in class com.norconex.collector.core.checksum.AbstractDocumentChecksummer
Sets whether to keep the document checksum value as a new field in the document metadata.
setKeep(boolean) - Method in class com.norconex.collector.core.checksum.AbstractMetadataChecksummer
Sets whether to keep the metadata checksum value as a new metadata field.
setLogsDir(String) - Method in class com.norconex.collector.core.AbstractCollectorConfig
Sets the directory location of generated log files.
setLogsUnmanaged(boolean) - Method in class com.norconex.collector.core.AbstractCollectorConfig
Sets whether written logs are managed by the collector.
setMaxDocuments(int) - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
setMaxParallelCrawlers(int) - Method in class com.norconex.collector.core.AbstractCollectorConfig
Sets the maximum number of crawlers that can be executed in parallel at any given time.
setMechanism(String) - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoConnectionDetails
Sets the authentication mechanism to use (MONGODB-CR, SCRAM-SHA-1 or null to use default).
setMetaChecksum(String) - Method in class com.norconex.collector.core.data.BaseCrawlData
 
setMetadataFilters(IMetadataFilter...) - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
setNumThreads(int) - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
setOrphan(boolean) - Method in class com.norconex.collector.core.pipeline.importer.ImporterPipelineContext
Sets whether the document is an orphan (no longer referenced).
setOrphansStrategy(ICrawlerConfig.OrphansStrategy) - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
setPageSplitSize(Integer) - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreConfig
 
setParentRootReference(String) - Method in class com.norconex.collector.core.data.BaseCrawlData
 
setPassword(String) - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoConnectionDetails
 
setPasswordKey(EncryptionKey) - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoConnectionDetails
Sets the password encryption key.
setPort(int) - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoConnectionDetails
 
setProgressDir(String) - Method in class com.norconex.collector.core.AbstractCollectorConfig
Sets the directory location where progress files (from JEF API) are stored.
setReference(String) - Method in class com.norconex.collector.core.data.BaseCrawlData
 
setReferenceFilters(IReferenceFilter...) - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
Sets the reference filters.
setReferencesCollectionName(String) - Method in class com.norconex.collector.core.data.store.impl.mongo.AbstractMongoCrawlDataStoreFactory
Sets the references collection name.
setRegex(String) - Method in class com.norconex.collector.core.filter.impl.RegexMetadataFilter
 
setRegex(String) - Method in class com.norconex.collector.core.filter.impl.RegexReferenceFilter
 
setRootParentReference(boolean) - Method in class com.norconex.collector.core.data.BaseCrawlData
 
setSourceFields(String...) - Method in class com.norconex.collector.core.checksum.impl.GenericMetadataChecksummer
Sets the metadata header fields used to construct a checksum.
setSourceFields(String...) - Method in class com.norconex.collector.core.checksum.impl.MD5DocumentChecksummer
Sets the fields used to construct an MD5 checksum.
setSourceFieldsRegex(String) - Method in class com.norconex.collector.core.checksum.impl.GenericMetadataChecksummer
Sets the regular expression matching metadata fields used to construct a checksum.
setSourceFieldsRegex(String) - Method in class com.norconex.collector.core.checksum.impl.MD5DocumentChecksummer
Sets the regular expression matching metadata fields used to construct a checksum.
setSpoiledReferenceStrategizer(ISpoiledReferenceStrategizer) - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
setSslEnabled(boolean) - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoConnectionDetails
Sets whether to use SSL.
setSslInvalidHostNameAllowed(boolean) - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoConnectionDetails
Sets whether invalid host names should be allowed if SSL is enabled.
setState(CrawlState) - Method in class com.norconex.collector.core.data.BaseCrawlData
 
setStopOnExceptions(Class<? extends Exception>...) - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
Sets the exceptions we want to stop the crawler on.
setSuiteLifeCycleListeners(ISuiteLifeCycleListener...) - Method in class com.norconex.collector.core.AbstractCollectorConfig
Sets JEF job suite life cycle listeners.
setTargetField(String) - Method in class com.norconex.collector.core.checksum.AbstractDocumentChecksummer
Sets the metadata field name to use to store the checksum value.
setTargetField(String) - Method in class com.norconex.collector.core.checksum.AbstractMetadataChecksummer
Sets the metadata field name to use to store the checksum value.
setUsername(String) - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoConnectionDetails
 
setWorkDir(File) - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
SpoiledReferenceStrategy - Enum in com.norconex.collector.core.spoil
Markers indicating what to do with references that were once processed properly but failed to get a good processing state on a subsequent attempt.
start(boolean) - Method in class com.norconex.collector.core.AbstractCollector
Starts all crawlers defined in the configuration.
start(boolean) - Method in interface com.norconex.collector.core.ICollector
Launches all crawlers defined in the configuration.
startExecution(JobStatusUpdater, JobSuite) - Method in class com.norconex.collector.core.crawler.AbstractCrawler
 
stop() - Method in class com.norconex.collector.core.AbstractCollector
Stops a running instance of this Collector.
stop(IJobStatus, JobSuite) - Method in class com.norconex.collector.core.crawler.AbstractCrawler
 
stop() - Method in interface com.norconex.collector.core.ICollector
Stops a running instance of this Collector.

T

TABLE_ACTIVE - Static variable in class com.norconex.collector.core.data.store.impl.jdbc.JDBCCrawlDataStore
 
TABLE_CACHE - Static variable in class com.norconex.collector.core.data.store.impl.jdbc.JDBCCrawlDataStore
 
TABLE_PROCESSED_INVALID - Static variable in class com.norconex.collector.core.data.store.impl.jdbc.JDBCCrawlDataStore
 
TABLE_PROCESSED_VALID - Static variable in class com.norconex.collector.core.data.store.impl.jdbc.JDBCCrawlDataStore
 
TABLE_QUEUE - Static variable in class com.norconex.collector.core.data.store.impl.jdbc.JDBCCrawlDataStore
 
toCrawlData(String, ResultSet) - Method in class com.norconex.collector.core.data.store.impl.jdbc.BasicJDBCSerializer
 
toCrawlData(String, ResultSet) - Method in interface com.norconex.collector.core.data.store.impl.jdbc.IJDBCSerializer
Converts a database entry to an ICrawlData instance.
toDocument(IMongoSerializer.Stage, ICrawlData) - Method in class com.norconex.collector.core.data.store.impl.mongo.BaseMongoSerializer
 
toDocument(IMongoSerializer.Stage, ICrawlData) - Method in interface com.norconex.collector.core.data.store.impl.mongo.IMongoSerializer
Converts an ICrawlData to a Mongo Document.
toString() - Method in class com.norconex.collector.core.AbstractCollectorConfig
 
toString() - Method in class com.norconex.collector.core.checksum.AbstractDocumentChecksummer
 
toString() - Method in class com.norconex.collector.core.checksum.AbstractMetadataChecksummer
 
toString() - Method in class com.norconex.collector.core.checksum.impl.GenericMetadataChecksummer
 
toString() - Method in class com.norconex.collector.core.checksum.impl.MD5DocumentChecksummer
 
toString() - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
toString() - Method in class com.norconex.collector.core.crawler.event.CrawlerEvent
 
toString() - Method in class com.norconex.collector.core.data.BaseCrawlData
 
toString() - Method in class com.norconex.collector.core.data.CrawlState
 
toString() - Method in class com.norconex.collector.core.data.store.impl.jdbc.BasicJDBCCrawlDataStoreFactory
 
toString() - Method in class com.norconex.collector.core.data.store.impl.mongo.AbstractMongoCrawlDataStoreFactory
 
toString() - Method in class com.norconex.collector.core.data.store.impl.mongo.MongoConnectionDetails
 
toString() - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreConfig
 
toString() - Method in class com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStoreFactory
 
toString() - Method in class com.norconex.collector.core.filter.impl.ExtensionReferenceFilter
 
toString() - Method in class com.norconex.collector.core.filter.impl.RegexMetadataFilter
 
toString() - Method in class com.norconex.collector.core.filter.impl.RegexReferenceFilter
 
toString() - Method in class com.norconex.collector.core.pipeline.BasePipelineContext
 
toString() - Method in class com.norconex.collector.core.spoil.impl.GenericSpoiledReferenceStrategizer
 

U

UNMODIFIED - Static variable in class com.norconex.collector.core.data.CrawlState
 
urlToPath(String) - Static method in class com.norconex.collector.core.pipeline.importer.SaveDocumentStage
 

V

valueOf(String) - Static method in enum com.norconex.collector.core.crawler.ICrawlerConfig.OrphansStrategy
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in class com.norconex.collector.core.data.CrawlState
 
valueOf(String) - Static method in enum com.norconex.collector.core.data.store.impl.mongo.IMongoSerializer.Stage
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.norconex.collector.core.spoil.SpoiledReferenceStrategy
Returns the enum constant of this type with the specified name.
values() - Static method in enum com.norconex.collector.core.crawler.ICrawlerConfig.OrphansStrategy
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.norconex.collector.core.data.store.impl.mongo.IMongoSerializer.Stage
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.norconex.collector.core.spoil.SpoiledReferenceStrategy
Returns an array containing the constants of this enum type, in the order they are declared.

W

wrapDocument(ICrawlData, ImporterDocument) - Method in class com.norconex.collector.core.crawler.AbstractCrawler
 
writeArray(Writer, String, String, Object[]) - Method in class com.norconex.collector.core.AbstractCollectorConfig
 
writeArray(Writer, String, String, Object[]) - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
writeObject(Writer, String, Object) - Method in class com.norconex.collector.core.AbstractCollectorConfig
 
writeObject(Writer, String, Object, boolean) - Method in class com.norconex.collector.core.AbstractCollectorConfig
 
writeObject(Writer, String, Object) - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 
writeObject(Writer, String, Object, boolean) - Method in class com.norconex.collector.core.crawler.AbstractCrawlerConfig
 

Copyright © 2014–2021 Norconex Inc. All rights reserved.