All Classes

Class - Description
AbstractDocumentChecksummer - Abstract implementation of IDocumentChecksummer giving the option to keep the generated checksum in a metadata field.
AbstractMetadataChecksummer
AbstractPipelineContext
AbstractSubCommand - Base class for subcommands.
ChecksumStageUtil - Checksum stage utility methods.
ChecksumUtil - Checksum utility methods.
CleanCommand - Clean the Collector crawling history.
Collector - Base implementation of a Collector.
CollectorCommand - Encapsulates command line arguments when running the Collector from a command prompt.
CollectorCommandLauncher - Launches a collector implementation from a string array representing command line arguments.
CollectorConfig - Base Collector configuration.
CollectorEvent - A collector event.
CollectorEvent.Builder
CollectorException - Runtime exception for most unrecoverable issues thrown by Collector classes.
CollectorLifeCycleListener - Collector event listener adapter for collector startup/shutdown.
CollectorStopperException - Exception thrown when a problem occurs while trying to stop a collector.
CommitModuleStage - Common pipeline stage for committing documents.
ConfigCheckCommand - Validate the configuration file format and quit.
ConfigRenderCommand - Resolve all includes and variable substitutions and print the resulting configuration to facilitate sharing.
CrawlDoc - A crawl document, which holds an additional DocInfo from cache (if any).
CrawlDocInfo
CrawlDocInfo.Stage
CrawlDocInfoService
CrawlDocMetadata - Metadata constants for common metadata field names typically set by a collector crawler.
Crawler - Abstract crawler implementation providing a common base for building crawlers.
Crawler.ReferenceProcessStatus
CrawlerCommitterService - Wrapper around multiple Committers so they can all be handled as one.
CrawlerConfig - Base Crawler configuration.
CrawlerConfig.OrphansStrategy
CrawlerConfigLoader - Crawler configuration loader.
CrawlerEvent - A crawler event.
CrawlerEvent.Builder
CrawlerLifeCycleListener - Listener adapter for crawler events.
CrawlerMonitor
CrawlerMonitorJMX
CrawlerMonitorMXBean
CrawlState - Reference processing status.
DataStoreException - Crawl data store runtime exception.
DataStoreExporter - Exports data stores to a format that can be imported back to the same or a different store implementation.
DataStoreImporter - Imports from a previously exported data store.
DeleteRejectedEventListener - Provides the ability to send deletion requests to your configured committer(s) whenever a reference is rejected, regardless of whether it was encountered in a previous crawling session.
DocInfoPipelineContext
DocumentChecksumStage - Common pipeline stage for creating a document checksum.
DocumentFiltersStage
DocumentPipelineContext
ExtensionReferenceFilter - Filters a reference based on a comma-separated list of extensions.
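To illustrate the idea behind extension-based reference filtering, here is a minimal sketch. This is not the Norconex ExtensionReferenceFilter API; the class and method names below are hypothetical, assuming only that the filter takes a comma-separated list of extensions and accepts or rejects each reference by its file extension.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: accept a reference only if its path ends with
// one of a comma-separated list of extensions (case-insensitive).
class ExtensionFilterSketch {
    private final List<String> extensions;

    ExtensionFilterSketch(String commaSeparatedExtensions) {
        this.extensions = Arrays.asList(
                commaSeparatedExtensions.toLowerCase().split("\\s*,\\s*"));
    }

    boolean acceptReference(String reference) {
        String ref = reference.toLowerCase();
        // Ignore any query string when checking the extension.
        int q = ref.indexOf('?');
        if (q >= 0) {
            ref = ref.substring(0, q);
        }
        for (String ext : extensions) {
            if (ref.endsWith("." + ext)) {
                return true;
            }
        }
        return false;
    }
}
```

A filter configured with "html, pdf" would then accept "http://example.com/doc.pdf" but reject "http://example.com/logo.png".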
FileBasedStopper - Listens for STOP requests using a stop file.
GenericMetadataChecksummer - Generic implementation of IMetadataChecksummer that uses specified field names and their values to create a checksum.
GenericSpoiledReferenceStrategizer - Generic implementation of ISpoiledReferenceStrategizer that offers a simple mapping between the crawl state of references that have turned "bad" and the strategy to adopt for each.
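The mapping a spoiled-reference strategizer performs can be sketched as a simple state-to-strategy lookup with a fallback. This is a hypothetical illustration, not the Norconex class: the class name, method names, and the use of plain strings for crawl states are all assumptions made for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: map the crawl state of a reference that turned
// "bad" to the strategy to adopt for it, with a fallback strategy for
// unmapped states.
class SpoiledStrategizerSketch {

    enum Strategy { GRACE_ONCE, IGNORE, DELETE }

    private final Map<String, Strategy> mappings = new HashMap<>();
    private final Strategy fallback;

    SpoiledStrategizerSketch(Strategy fallback) {
        this.fallback = fallback;
    }

    void setMapping(String crawlState, Strategy strategy) {
        mappings.put(crawlState, strategy);
    }

    Strategy resolve(String crawlState) {
        return mappings.getOrDefault(crawlState, fallback);
    }
}
```

A "grace once" style strategy would let a reference fail one session before acting, while "delete" would send a deletion request to the committer(s) right away.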
ICollectorStopper
IDataStore<T>
IDataStoreEngine
IDocumentChecksummer - Creates a checksum representing a document.
IDocumentFilter - Filters a document after the document content is fetched, downloaded, or otherwise read or acquired.
IMetadataChecksummer - Creates a checksum representing a document based on document metadata values obtained prior to fetching that document (e.g. …).
IMetadataFilter - Filters a reference based on the metadata that could be obtained for a document, before it was fetched, downloaded, or otherwise read or acquired (e.g. …).
ImporterPipelineContext
ImportModuleStage - Common pipeline stage for importing documents.
IReferenceFilter - Filters a document based on its reference, before its properties or content get read or otherwise acquired.
ISpoiledReferenceStrategizer - Decides which strategy to adopt for a given reference with a bad state.
JdbcDataStore<T>
JdbcDataStoreEngine - Data store engine using a JDBC-compatible database for storing crawl data.
MD5DocumentChecksummer - Implementation of IDocumentChecksummer which returns an MD5 checksum value of the extracted document content unless one or more source fields are specified, in which case the MD5 checksum value is constructed from those fields.
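The core of an MD5-based document checksummer can be sketched with the standard JDK MessageDigest API. This is not the MD5DocumentChecksummer implementation itself, just a minimal illustration of producing the hex checksum a crawler could compare between sessions to detect changed documents.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: compute an MD5 hex checksum of document content.
class Md5ChecksumSketch {
    static String md5Hex(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is guaranteed to be present in every JDK.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Two crawl sessions producing the same checksum indicate unmodified content; a differing checksum flags the document as modified.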
MdcUtil - Utility methods to simplify adding Mapped Diagnostic Context (MDC) to logging in a consistent way for crawlers and collectors, also offering filename-friendly versions.
MetadataFilter - Accepts or rejects a reference based on whether one or more metadata field values match.
MongoDataStore<T>
MongoDataStoreEngine - Data store engine using MongoDB for storing crawl data.
MVStoreDataStore<T>
MVStoreDataStoreConfig - MVStore configuration parameters.
MVStoreDataStoreEngine
QueueReferenceStage - Common pipeline stage for queuing documents.
ReferenceFilter - Filters URLs based on a regular expression.
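Regex-based reference filtering boils down to matching the URL against a pattern and then either including or excluding matches. The sketch below is hypothetical (not the Norconex ReferenceFilter API), assuming only an include/exclude toggle on match, using the JDK's java.util.regex package.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: accept or reject a reference (URL) based on
// whether it matches a regular expression.
class RegexReferenceFilterSketch {
    private final Pattern pattern;
    private final boolean onMatchAccept;

    RegexReferenceFilterSketch(String regex, boolean onMatchAccept) {
        this.pattern = Pattern.compile(regex);
        this.onMatchAccept = onMatchAccept;
    }

    boolean acceptReference(String reference) {
        boolean matches = pattern.matcher(reference).matches();
        // When onMatchAccept is true, matching references are kept;
        // otherwise, matching references are rejected.
        return onMatchAccept == matches;
    }
}
```

For instance, an include filter on "https://example\.com/.*" keeps references to that site only, while an exclude filter on ".*\.gif" drops GIF references.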
ReferenceFiltersStage - Common pipeline stage for filtering references.
ReferenceFiltersStageUtil - Reference-filtering stage utility methods.
RegexMetadataFilter - Deprecated.
RegexReferenceFilter - Deprecated.
SaveDocumentStage - Common pipeline stage for saving documents.
SpoiledReferenceStrategy - Markers indicating what to do with references that were once processed properly but failed to reach a good processing state in a subsequent crawl.
StartCommand - Start the Collector.
StopCommand - Stop the Collector.
StopCrawlerOnMaxEventListener
StopCrawlerOnMaxEventListener.OnMultiple
StoreExportCommand - Export the crawl store to a specified file.
StoreImportCommand - Import the crawl store from a specified file.