All Classes
| Class |
Description |
| AbstractDocumentChecksummer |
Abstract implementation of IDocumentChecksummer giving the option
to keep the generated checksum in a metadata field.
|
| AbstractMetadataChecksummer |
|
| AbstractPipelineContext |
|
| AbstractSubCommand |
Base class for subcommands.
|
| ChecksumStageUtil |
Checksum stage utility methods.
|
| ChecksumUtil |
Checksum utility methods.
|
| CleanCommand |
Clean the Collector crawling history.
|
| Collector |
Base implementation of a Collector.
|
| CollectorCommand |
Encapsulates command line arguments when running the Collector from
a command prompt.
|
| CollectorCommandLauncher |
Launches a collector implementation from a string array representing
command line arguments.
|
| CollectorConfig |
Base Collector configuration.
|
| CollectorEvent |
A collector event.
|
| CollectorEvent.Builder |
|
| CollectorException |
Runtime exception for most unrecoverable issues thrown by Collector
classes.
|
| CollectorLifeCycleListener |
Collector event listener adapter for collector startup/shutdown.
|
| CollectorStopperException |
Exception thrown when a problem occurred while trying to stop
a collector.
|
| CommitModuleStage |
Common pipeline stage for committing documents.
|
| ConfigCheckCommand |
Validate configuration file format and quit.
|
| ConfigRenderCommand |
Resolve all includes and variable substitutions, then print the
resulting configuration to facilitate sharing.
|
| CrawlDoc |
A crawl document, which holds an additional DocInfo from cache
(if any).
|
| CrawlDocInfo |
|
| CrawlDocInfo.Stage |
|
| CrawlDocInfoService |
|
| CrawlDocMetadata |
Metadata constants for common metadata field
names typically set by a collector crawler.
|
| Crawler |
Abstract crawler implementation providing a common base for building
crawlers.
|
| Crawler.ReferenceProcessStatus |
|
| CrawlerCommitterService |
Wrapper around multiple Committers so they can all be handled as one.
|
| CrawlerConfig |
Base Crawler configuration.
|
| CrawlerConfig.OrphansStrategy |
|
| CrawlerConfigLoader |
HTTP Crawler configuration loader.
|
| CrawlerEvent |
A crawler event.
|
| CrawlerEvent.Builder |
|
| CrawlerLifeCycleListener |
Listener adapter for crawler events.
|
| CrawlerMonitor |
|
| CrawlerMonitorJMX |
|
| CrawlerMonitorMXBean |
|
| CrawlState |
Reference processing status.
|
| DataStoreException |
Crawl data store runtime exception.
|
| DataStoreExporter |
Exports data stores to a format that can be imported back to the same
or different store implementation.
|
| DataStoreImporter |
Imports from a previously exported data store.
|
| DeleteRejectedEventListener |
Provides the ability to send deletion requests to your configured
committer(s) whenever a reference is rejected, regardless of whether it was
encountered in a previous crawling session.
|
| DocInfoPipelineContext |
|
| DocumentChecksumStage |
Common pipeline stage for creating a document checksum.
|
| DocumentFiltersStage |
|
| DocumentPipelineContext |
|
| ExtensionReferenceFilter |
Filters a reference based on a comma-separated list of extensions.
|
| FileBasedStopper |
Listens for STOP requests using a stop file.
|
| GenericMetadataChecksummer |
Generic implementation of IMetadataChecksummer that uses
specified field names and their values to create a checksum.
|
| GenericSpoiledReferenceStrategizer |
Generic implementation of ISpoiledReferenceStrategizer that
offers a simple mapping between the crawl state of references that have
turned "bad" and the strategy to adopt for each.
|
| ICollectorStopper |
|
| IDataStore<T> |
|
| IDataStoreEngine |
|
| IDocumentChecksummer |
Creates a checksum representing a document.
|
| IDocumentFilter |
Filter a document after the document content is fetched, downloaded,
or otherwise read or acquired.
|
| IMetadataChecksummer |
Creates a checksum representing a document based on document metadata
values obtained prior to fetching that document.
|
| IMetadataFilter |
Filter a reference based on the metadata that could be obtained for a
document, before it was fetched, downloaded, or otherwise read or acquired.
|
| ImporterPipelineContext |
|
| ImportModuleStage |
Common pipeline stage for importing documents.
|
| IReferenceFilter |
Filter a document based on its reference, before its properties or content
gets read or otherwise acquired.
|
| ISpoiledReferenceStrategizer |
Decides which strategy to adopt for a given reference with a bad state.
|
| JdbcDataStore<T> |
|
| JdbcDataStoreEngine |
Data store engine using a JDBC-compatible database for storing
crawl data.
|
| MD5DocumentChecksummer |
Implementation of IDocumentChecksummer that
returns an MD5 checksum value of the extracted document content unless
one or more given source fields are specified, in which case the MD5
checksum value is constructed from those fields.
|
| MdcUtil |
Utility methods to simplify adding Mapped Diagnostic Context (MDC) to
logging in a consistent way for crawlers and collectors, and to
offer filename-friendly versions.
|
| MetadataFilter |
Accepts or rejects a reference based on whether one or more
metadata field values match.
|
| MongoDataStore<T> |
|
| MongoDataStoreEngine |
Data store engine using MongoDB for storing crawl data.
|
| MVStoreDataStore<T> |
|
| MVStoreDataStoreConfig |
MVStore configuration parameters.
|
| MVStoreDataStoreEngine |
|
| QueueReferenceStage |
Common pipeline stage for queuing documents.
|
| ReferenceFilter |
Filters URLs based on a regular expression.
|
| ReferenceFiltersStage |
Common pipeline stage for filtering references.
|
| ReferenceFiltersStageUtil |
Reference-filtering stage utility methods.
|
| RegexMetadataFilter |
Deprecated.
|
| RegexReferenceFilter |
Deprecated.
|
| SaveDocumentStage |
Common pipeline stage for saving documents.
|
| SpoiledReferenceStrategy |
Markers indicating what to do with references that were once processed
properly, but failed to get a good processing state on a subsequent attempt.
|
| StartCommand |
Start the Collector.
|
| StopCommand |
Stop the Collector.
|
| StopCrawlerOnMaxEventListener |
|
| StopCrawlerOnMaxEventListener.OnMultiple |
|
| StoreExportCommand |
Export crawl store to specified file.
|
| StoreImportCommand |
Import crawl store from specified file.
|
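The MD5DocumentChecksummer entry above describes a two-way rule: hash the extracted content, unless source fields are given, in which case hash those fields instead. The following is a minimal sketch of that rule using only the Java standard library. It is not the library's implementation: the class name `ChecksumSketch`, the methods `md5Hex` and `checksum`, and the `field=value;` serialization of the selected fields are all hypothetical choices made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the Collector API.
class ChecksumSketch {

    /** Returns the lowercase hexadecimal MD5 digest of the UTF-8 bytes of the text. */
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /**
     * Hashes the content when no source fields are given; otherwise hashes
     * the values of the named metadata fields (assumed serialization:
     * "field=v1,v2;" per field, in the order the fields are listed).
     */
    static String checksum(String content,
                           Map<String, List<String>> metadata,
                           List<String> sourceFields) {
        if (sourceFields == null || sourceFields.isEmpty()) {
            return md5Hex(content);
        }
        StringBuilder sb = new StringBuilder();
        for (String field : sourceFields) {
            List<String> values =
                    metadata.getOrDefault(field, Collections.emptyList());
            sb.append(field).append('=')
              .append(String.join(",", values)).append(';');
        }
        return md5Hex(sb.toString());
    }

    public static void main(String[] args) {
        Map<String, List<String>> meta = new HashMap<>();
        meta.put("title", Arrays.asList("Hello"));

        // No source fields: the checksum depends on the content only.
        System.out.println(checksum("hello", meta, Collections.emptyList()));

        // With source fields: the content is ignored, so changing it
        // leaves the checksum unchanged.
        System.out.println(checksum("hello", meta, Arrays.asList("title")));
        System.out.println(checksum("changed", meta, Arrays.asList("title")));
    }
}
```

With source fields configured this way, a document whose body changes but whose selected fields stay the same produces the same checksum, which is the trade-off to keep in mind when choosing between the two modes.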