All Classes (Norconex Collector Core 2.1.0 API)

Class	Description
AbstractDocumentChecksummer	Abstract implementation of `IDocumentChecksummer` giving the option to keep the generated checksum in a metadata field.
AbstractMetadataChecksummer	Abstract implementation of `IMetadataChecksummer` giving the option to keep the generated checksum.
AbstractPipelineContext	Base `IPipelineStage` context for collector `Pipeline`s.
AbstractSubCommand	Base class for subcommands.
ChecksumStageUtil	Checksum stage utility methods.
ChecksumUtil	Checksum utility methods.
CleanCommand	Clean the Collector crawling history.
Collector	Base implementation of a Collector.
CollectorCommand	Encapsulates command line arguments when running the Collector from a command prompt.
CollectorCommandLauncher	Launches a collector implementation from a string array representing command line arguments.
CollectorConfig	Base Collector configuration.
CollectorEvent	A collector event.
CollectorEvent.Builder
CollectorException	Runtime exception for most unrecoverable issues thrown by Collector classes.
CollectorLifeCycleListener	Collector event listener adapter for collector startup/shutdown.
CollectorStopperException	Exception thrown when a problem occurred while trying to stop a collector.
CommitModuleStage	Common pipeline stage for committing documents.
ConfigCheckCommand	Validate configuration file format and quit.
ConfigRenderCommand	Resolve all includes and variables substitution and print the resulting configuration to facilitate sharing.
CrawlDoc	A crawl document, which holds an additional `DocInfo` from cache (if any).
CrawlDocInfo
CrawlDocInfo.Stage
CrawlDocInfoService
CrawlDocMetadata	Metadata constants for common metadata field names typically set by a collector crawler.
Crawler	Abstract crawler implementation providing a common base to building crawlers.
Crawler.ReferenceProcessStatus
CrawlerCommitterService	Wrapper around multiple Committers so they can all be handled as one.
CrawlerConfig	Base Crawler configuration.
CrawlerConfig.OrphansStrategy
CrawlerConfigLoader	HTTP Crawler configuration loader.
CrawlerEvent	A crawler event.
CrawlerEvent.Builder
CrawlerLifeCycleListener	Listener adapter for crawler events.
CrawlerMonitor
CrawlerMonitorJMX
CrawlerMonitorMXBean
CrawlState	Reference processing status.
DataStoreException	Crawl data store runtime exception.
DataStoreExporter	Exports data stores to a format that can be imported back to the same or different store implementation.
DataStoreImporter	Imports from a previously exported data store.
DeleteRejectedEventListener	Provides the ability to send deletion requests to your configured committer(s) whenever a reference is rejected, regardless of whether it was encountered in a previous crawling session.
DocInfoPipelineContext	An `IPipelineStage` context for collector `Pipeline`s dealing with a `CrawlDocInfo` (e.g. document queuing).
DocumentChecksumStage	Common pipeline stage for creating a document checksum.
DocumentFiltersStage
DocumentPipelineContext	`IPipelineStage` context for collector `Pipeline`s dealing with a `Doc`.
ExtensionReferenceFilter	Filters a reference based on a comma-separated list of extensions.
FileBasedStopper	Listens for STOP requests using a stop file.
GenericMetadataChecksummer	Generic implementation of `IMetadataChecksummer` that uses specified field names and their values to create a checksum.
GenericSpoiledReferenceStrategizer	Generic implementation of `ISpoiledReferenceStrategizer` that offers a simple mapping between the crawl state of references that have turned "bad" and the strategy to adopt for each.
ICollectorStopper	Responsible for shutting down a Collector upon explicit invocation of `ICollectorStopper.fireStopRequest(Collector)` or when specific conditions are met.
IDataStore<T>
IDataStoreEngine
IDocumentChecksummer	Creates a checksum representing a document.
IDocumentFilter	Filter a document after the document content is fetched, downloaded, or otherwise read or acquired.
IMetadataChecksummer	Creates a checksum representing a document based on document metadata values obtained prior to fetching that document (e.g.
IMetadataFilter	Filter a reference based on the metadata that could be obtained for a document, before it was fetched, downloaded, or otherwise read or acquired (e.g.
ImporterPipelineContext	`IPipelineStage` context for collector `Pipeline`s dealing with `ImporterResponse`.
ImportModuleStage	Common pipeline stage for importing documents.
IReferenceFilter	Filter a document based on its reference, before its properties or content gets read or otherwise acquired.
ISpoiledReferenceStrategizer	Decides which strategy to adopt for a given reference with a bad state.
JdbcDataStore<T>
JdbcDataStoreEngine	Data store engine using a JDBC-compatible database for storing crawl data.
MD5DocumentChecksummer	Implementation of `IDocumentChecksummer` which returns an MD5 checksum value of the extracted document content unless one or more given source fields are specified, in which case the MD5 checksum value is constructed from those fields.
MdcUtil	Utility methods to simplify adding Mapped Diagnostic Context (MDC) to logging in a consistent way for crawlers and collectors, as well as offering a filename-friendly version.
MetadataFilter	Accepts or rejects a reference based on whether one or more metadata field values are matching.
MongoDataStore<T>
MongoDataStoreEngine	Data store engine using MongoDB for storing crawl data.
MVStoreDataStore<T>
MVStoreDataStoreConfig	MVStore configuration parameters.
MVStoreDataStoreEngine
QueueReferenceStage	Common pipeline stage for queuing documents.
ReferenceFilter	Filters URLs based on a regular expression.
ReferenceFiltersStage	Common pipeline stage for filtering references.
ReferenceFiltersStageUtil	Reference-filtering stage utility methods.
RegexMetadataFilter	Deprecated. Since 2.0.0, use `MetadataFilter` instead.
RegexReferenceFilter	Deprecated. Since 2.0.0, use `ReferenceFilter` instead.
SaveDocumentStage	Common pipeline stage for saving documents.
SpoiledReferenceStrategy	Markers indicating what to do with references that were once processed properly, but failed to get a good processing state a subsequent time around.
StartCommand	Start the Collector.
StopCommand	Stop the Collector.
StopCrawlerOnMaxEventListener	Alternative to `CrawlerConfig.setMaxDocuments(int)` for stopping the crawler upon reaching specific event counts.
StopCrawlerOnMaxEventListener.OnMultiple
StoreExportCommand	Export crawl store to specified file.
StoreImportCommand	Import crawl store from specified file.
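
Two of the descriptions above, `ExtensionReferenceFilter` (matching a reference against a comma-separated list of extensions) and `MD5DocumentChecksummer` (an MD5 checksum of document content), can be illustrated with plain JDK code. The following is a minimal sketch of the underlying ideas only, not the library's implementation: the class `IndexSketch` and the methods `hasListedExtension` and `md5Hex` are hypothetical names introduced for illustration, and the matching rules (case-insensitive, query string and fragment ignored) are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration class; it does not use any Norconex types.
public class IndexSketch {

    // Returns true when the last path segment of the reference ends with
    // one of the comma-separated extensions (assumed case-insensitive).
    static boolean hasListedExtension(String reference, String extensions) {
        Set<String> allowed = Arrays.stream(extensions.split(","))
                .map(s -> s.trim().toLowerCase(Locale.ROOT))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
        // Ignore any query string or fragment (an assumption of this sketch).
        String path = reference.split("[?#]", 2)[0];
        String name = path.substring(path.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return false; // no extension present
        }
        String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        return allowed.contains(ext);
    }

    // Computes the MD5 digest of the UTF-8 bytes of the content and
    // returns it as a lowercase hexadecimal string.
    static String md5Hex(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hasListedExtension("http://example.com/report.PDF?v=2", "pdf, docx"));
        System.out.println(md5Hex("abc"));
    }
}
```

Comparing the checksum of newly fetched content with the one stored from a previous crawl is one way a crawler can detect unchanged documents. How the actual classes store and compare checksums is not shown here.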