Class CrawlerEvent

java.lang.Object
java.util.EventObject
com.norconex.commons.lang.event.Event
com.norconex.collector.core.crawler.CrawlerEvent
All Implemented Interfaces:
Serializable

public class CrawlerEvent extends Event
A crawler event.
Since:
2.0.0
Author:
Pascal Essiembre
  • Field Details

    • CRAWLER_INIT_BEGIN

      public static final String CRAWLER_INIT_BEGIN
      The crawler began its initialization.
    • CRAWLER_INIT_END

      public static final String CRAWLER_INIT_END
      The crawler has been initialized.
    • CRAWLER_RUN_BEGIN

      public static final String CRAWLER_RUN_BEGIN
      The crawler is about to begin crawling.
    • CRAWLER_RUN_END

      public static final String CRAWLER_RUN_END
      The crawler completed crawling execution normally (without being stopped). This event is triggered before the crawler resources are released.
    • CRAWLER_RUN_THREAD_BEGIN

      public static final String CRAWLER_RUN_THREAD_BEGIN
      The crawler just started a new crawling thread.
    • CRAWLER_RUN_THREAD_END

      public static final String CRAWLER_RUN_THREAD_END
      The crawler completed execution of a crawling thread.
    • CRAWLER_STOP_BEGIN

      public static final String CRAWLER_STOP_BEGIN
      Issued when a request to stop the crawler has been received.
    • CRAWLER_STOP_END

      public static final String CRAWLER_STOP_END
      Issued when a request to stop the crawler has been fully executed (crawler stopped).
    • CRAWLER_CLEAN_BEGIN

      public static final String CRAWLER_CLEAN_BEGIN
    • CRAWLER_CLEAN_END

      public static final String CRAWLER_CLEAN_END
    • REJECTED_FILTER

      public static final String REJECTED_FILTER
      A document was rejected by a filter.
    • REJECTED_UNMODIFIED

      public static final String REJECTED_UNMODIFIED
      A document was rejected because it has not been modified since the last time it was crawled.
    • REJECTED_DUPLICATE

      public static final String REJECTED_DUPLICATE
      A document was rejected because another document with a different reference was already processed with the same digital signature (checksum).
      Since:
      2.0.0
    • REJECTED_PREMATURE

      public static final String REJECTED_PREMATURE
      A document was not re-crawled because it is not yet ready to be re-crawled.
    • REJECTED_NOTFOUND

      public static final String REJECTED_NOTFOUND
      A document was rejected because it could not be found (e.g., no longer exists at a given location).
    • REJECTED_BAD_STATUS

      public static final String REJECTED_BAD_STATUS
      A document was rejected because the status returned when trying to fetch it was not accepted (e.g., a 500 HTTP error code).
    • REJECTED_IMPORT

      public static final String REJECTED_IMPORT
      A document was rejected by the Importer module.
    • REJECTED_ERROR

      public static final String REJECTED_ERROR
      A document was rejected because an error occurred when processing it.
    • DOCUMENT_PREIMPORTED

      public static final String DOCUMENT_PREIMPORTED
      A document pre-import processor was executed properly.
    • DOCUMENT_IMPORTED

      public static final String DOCUMENT_IMPORTED
      A document was imported.
    • DOCUMENT_POSTIMPORTED

      public static final String DOCUMENT_POSTIMPORTED
      A document post-import processor was executed properly.
    • DOCUMENT_COMMITTED_UPSERT

      public static final String DOCUMENT_COMMITTED_UPSERT
      A document was submitted to a committer for upsert.
    • DOCUMENT_COMMITTED_DELETE

      public static final String DOCUMENT_COMMITTED_DELETE
      A document was submitted to a committer for removal.
    • DOCUMENT_METADATA_FETCHED

      public static final String DOCUMENT_METADATA_FETCHED
      A document's metadata fields were successfully retrieved.
    • DOCUMENT_FETCHED

      public static final String DOCUMENT_FETCHED
      A document was successfully retrieved for processing.
    • DOCUMENT_QUEUED

      public static final String DOCUMENT_QUEUED
      A document reference was queued in the data store for processing.
    • DOCUMENT_PROCESSED

      public static final String DOCUMENT_PROCESSED
      A document was processed (successfully or not).
    • DOCUMENT_SAVED

      public static final String DOCUMENT_SAVED
      A document was saved.
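
The constants above fall into three families by name prefix: CRAWLER_* (crawler lifecycle), REJECTED_* (document rejections) and DOCUMENT_* (document processing steps). The following is a minimal sketch of how a listener might tally rejections per event name. It works on plain strings so that it runs without the Norconex libraries. It assumes that each constant's value equals its field name, which this page does not confirm, and `EventTally`, `family` and `accept` are hypothetical names that are not part of the API.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not part of Norconex): counts REJECTED_* event names. */
class EventTally {

    // Rejection counts keyed by event name, kept sorted for stable output.
    private final Map<String, Integer> rejections = new TreeMap<>();

    /** Returns the text before the first underscore, e.g. "REJECTED". */
    static String family(String eventName) {
        int idx = eventName.indexOf('_');
        return idx < 0 ? eventName : eventName.substring(0, idx);
    }

    /** Records the event name if it belongs to the REJECTED_* family. */
    void accept(String eventName) {
        if ("REJECTED".equals(family(eventName))) {
            rejections.merge(eventName, 1, Integer::sum);
        }
    }

    Map<String, Integer> rejections() {
        return rejections;
    }

    public static void main(String[] args) {
        // Assumed event names: string values taken to match the field names.
        String[] names = {
            "CRAWLER_RUN_BEGIN", "REJECTED_FILTER", "DOCUMENT_FETCHED",
            "REJECTED_FILTER", "REJECTED_DUPLICATE", "CRAWLER_RUN_END"
        };
        EventTally tally = new EventTally();
        for (String name : names) {
            tally.accept(name);
        }
        // prints {REJECTED_DUPLICATE=1, REJECTED_FILTER=2}
        System.out.println(tally.rejections());
    }
}
```

In a real listener, the name would presumably come from the received event rather than from a hard-coded array, and comparing against the CrawlerEvent constants directly avoids depending on their string values.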
  • Method Details

    • getCrawlDocInfo

      public CrawlDocInfo getCrawlDocInfo()
      Gets the crawl data holding contextual information about the crawled reference. CRAWLER_* events return null.
      Returns:
      crawl data
    • getSubject

      public Object getSubject()
      Gets the subject, that is, another relevant source related to the event.
      Returns:
      the subject
    • getSource

      public Crawler getSource()
      Overrides:
      getSource in class Event
    • isCrawlerShutdown

      public boolean isCrawlerShutdown()
    • equals

      public boolean equals(Object other)
      Overrides:
      equals in class Event
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Event
    • toString

      public String toString()
      Overrides:
      toString in class Event
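
Because getCrawlDocInfo() returns null for CRAWLER_* events, callers need a null check before reading document details. The sketch below shows one null-safe pattern using `Optional`. To stay runnable without the Norconex libraries it uses hypothetical stand-in classes (`EventStub`, `DocInfoStub`) in place of CrawlerEvent and CrawlDocInfo. The `getName()` and `getReference()` accessors are assumptions that do not appear on this page.

```java
import java.util.Optional;

/** Hypothetical sketch: null-safe access to an event's crawl data. */
class CrawlDocInfoSketch {

    /** Stand-in for CrawlDocInfo; getReference() is an assumed accessor. */
    static class DocInfoStub {
        private final String reference;
        DocInfoStub(String reference) { this.reference = reference; }
        String getReference() { return reference; }
    }

    /** Stand-in for CrawlerEvent; getName() is an assumed accessor. */
    static class EventStub {
        private final String name;
        private final DocInfoStub docInfo; // null for CRAWLER_* events
        EventStub(String name, DocInfoStub docInfo) {
            this.name = name;
            this.docInfo = docInfo;
        }
        String getName() { return name; }
        DocInfoStub getCrawlDocInfo() { return docInfo; }
    }

    /** Describes an event, falling back when no crawl data is attached. */
    static String describe(EventStub event) {
        return Optional.ofNullable(event.getCrawlDocInfo())
                .map(info -> event.getName() + " for " + info.getReference())
                .orElse(event.getName() + " (no document)");
    }

    public static void main(String[] args) {
        // prints "CRAWLER_RUN_BEGIN (no document)"
        System.out.println(describe(new EventStub("CRAWLER_RUN_BEGIN", null)));
        // prints "DOCUMENT_FETCHED for https://example.com/"
        System.out.println(describe(new EventStub(
                "DOCUMENT_FETCHED", new DocInfoStub("https://example.com/"))));
    }
}
```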