com.norconex.collector.core.crawler.CrawlerEvent

All Implemented Interfaces: Serializable

public class CrawlerEvent extends Event

A crawler event.

Since:

2.0.0

Author:

Pascal Essiembre

See Also:

Serialized Form

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

static class

CrawlerEvent.Builder

Field Summary

Fields

Modifier and Type

Field

Description

static final String

CRAWLER_CLEAN_BEGIN

static final String

CRAWLER_CLEAN_END

static final String

CRAWLER_INIT_BEGIN

The crawler began its initialization.

static final String

CRAWLER_INIT_END

The crawler has been initialized.

static final String

CRAWLER_RUN_BEGIN

The crawler is about to begin crawling.

static final String

CRAWLER_RUN_END

The crawler completed crawling execution normally (without being stopped).

static final String

CRAWLER_RUN_THREAD_BEGIN

The crawler just started a new crawling thread.

static final String

CRAWLER_RUN_THREAD_END

The crawler completed execution of a crawling thread.

static final String

CRAWLER_STOP_BEGIN

Issued when a request to stop the crawler has been received.

static final String

CRAWLER_STOP_END

Issued when a request to stop the crawler has been fully executed (crawler stopped).

static final String

DOCUMENT_COMMITTED_DELETE

A document was submitted to a committer for removal.

static final String

DOCUMENT_COMMITTED_UPSERT

A document was submitted to a committer for upsert.

static final String

DOCUMENT_FETCHED

A document was successfully retrieved for processing.

static final String

DOCUMENT_IMPORTED

A document was imported.

static final String

DOCUMENT_METADATA_FETCHED

A document's metadata fields were successfully retrieved.

static final String

DOCUMENT_POSTIMPORTED

A document post-import processor was executed properly.

static final String

DOCUMENT_PREIMPORTED

A document pre-import processor was executed properly.

static final String

DOCUMENT_PROCESSED

A document was processed (successfully or not).

static final String

DOCUMENT_QUEUED

A document reference was queued in the data store for processing.

static final String

DOCUMENT_SAVED

A document was saved.

static final String

REJECTED_BAD_STATUS

A document was rejected because the status returned when trying to fetch it was not accepted (e.g., an HTTP 500 error code).

static final String

REJECTED_DUPLICATE

A document was rejected because another document with a different reference was already processed with the same digital signature (checksum).

static final String

REJECTED_ERROR

A document was rejected because an error occurred when processing it.

static final String

REJECTED_FILTER

A document was rejected by a filter.

static final String

REJECTED_IMPORT

A document was rejected by the Importer module.

static final String

REJECTED_NOTFOUND

A document was rejected because it could not be found (e.g., no longer exists at a given location).

static final String

REJECTED_PREMATURE

A document could not be re-crawled because it is not yet ready to be re-crawled.

static final String

REJECTED_UNMODIFIED

A document was rejected because it was not modified since the last time it was crawled.

Fields inherited from class java.util.EventObject
source

Method Summary

Modifier and Type

Method

Description

boolean

equals(Object other)

CrawlDocInfo

getCrawlDocInfo()

Gets the crawl data holding contextual information about the crawled reference.

Crawler

getSource()

Object

getSubject()

Gets the subject.

int

hashCode()

boolean

isCrawlerShutdown()

String

toString()

Methods inherited from class com.norconex.commons.lang.event.Event
getException, getMessage, getName, is, is

Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

Field Details
- CRAWLER_INIT_BEGIN
  
  public static final String CRAWLER_INIT_BEGIN
  
  The crawler began its initialization.
  See Also:
  
  Constant Field Values
- CRAWLER_INIT_END
  
  public static final String CRAWLER_INIT_END
  
  The crawler has been initialized.
  See Also:
  
  Constant Field Values
- CRAWLER_RUN_BEGIN
  
  public static final String CRAWLER_RUN_BEGIN
  
  The crawler is about to begin crawling.
  See Also:
  
  Constant Field Values
- CRAWLER_RUN_END
  
  public static final String CRAWLER_RUN_END
  
  The crawler completed crawling execution normally (without being stopped). This event is triggered before the crawler resources are released.
  See Also:
  
  Constant Field Values
- CRAWLER_RUN_THREAD_BEGIN
  
  public static final String CRAWLER_RUN_THREAD_BEGIN
  
  The crawler just started a new crawling thread.
  See Also:
  
  Constant Field Values
- CRAWLER_RUN_THREAD_END
  
  public static final String CRAWLER_RUN_THREAD_END
  
  The crawler completed execution of a crawling thread.
  See Also:
  
  Constant Field Values
- CRAWLER_STOP_BEGIN
  
  public static final String CRAWLER_STOP_BEGIN
  
  Issued when a request to stop the crawler has been received.
  See Also:
  
  Constant Field Values
- CRAWLER_STOP_END
  
  public static final String CRAWLER_STOP_END
  
  Issued when a request to stop the crawler has been fully executed (crawler stopped).
  See Also:
  
  Constant Field Values
- CRAWLER_CLEAN_BEGIN
  
  public static final String CRAWLER_CLEAN_BEGIN
  See Also:
  
  Constant Field Values
- CRAWLER_CLEAN_END
  
  public static final String CRAWLER_CLEAN_END
  See Also:
  
  Constant Field Values
- REJECTED_FILTER
  
  public static final String REJECTED_FILTER
  
  A document was rejected by a filter.
  See Also:
  
  Constant Field Values
- REJECTED_UNMODIFIED
  
  public static final String REJECTED_UNMODIFIED
  
  A document was rejected because it was not modified since the last time it was crawled.
  See Also:
  
  Constant Field Values
- REJECTED_DUPLICATE
  
  public static final String REJECTED_DUPLICATE
  
  A document was rejected because another document with a different reference was already processed with the same digital signature (checksum).
  Since:
  
  2.0.0
  
  See Also:
  
  Constant Field Values
- REJECTED_PREMATURE
  
  public static final String REJECTED_PREMATURE
  
  A document could not be re-crawled because it is not yet ready to be re-crawled.
  See Also:
  
  Constant Field Values
- REJECTED_NOTFOUND
  
  public static final String REJECTED_NOTFOUND
  
  A document was rejected because it could not be found (e.g., no longer exists at a given location).
  See Also:
  
  Constant Field Values
- REJECTED_BAD_STATUS
  
  public static final String REJECTED_BAD_STATUS
  
  A document was rejected because the status returned when trying to fetch it was not accepted (e.g., an HTTP 500 error code).
  See Also:
  
  Constant Field Values
- REJECTED_IMPORT
  
  public static final String REJECTED_IMPORT
  
  A document was rejected by the Importer module.
  See Also:
  
  Constant Field Values
- REJECTED_ERROR
  
  public static final String REJECTED_ERROR
  
  A document was rejected because an error occurred when processing it.
  See Also:
  
  Constant Field Values
- DOCUMENT_PREIMPORTED
  
  public static final String DOCUMENT_PREIMPORTED
  
  A document pre-import processor was executed properly.
  See Also:
  
  Constant Field Values
- DOCUMENT_IMPORTED
  
  public static final String DOCUMENT_IMPORTED
  
  A document was imported.
  See Also:
  
  Constant Field Values
- DOCUMENT_POSTIMPORTED
  
  public static final String DOCUMENT_POSTIMPORTED
  
  A document post-import processor was executed properly.
  See Also:
  
  Constant Field Values
- DOCUMENT_COMMITTED_UPSERT
  
  public static final String DOCUMENT_COMMITTED_UPSERT
  
  A document was submitted to a committer for upsert.
  See Also:
  
  Constant Field Values
- DOCUMENT_COMMITTED_DELETE
  
  public static final String DOCUMENT_COMMITTED_DELETE
  
  A document was submitted to a committer for removal.
  See Also:
  
  Constant Field Values
- DOCUMENT_METADATA_FETCHED
  
  public static final String DOCUMENT_METADATA_FETCHED
  
  A document's metadata fields were successfully retrieved.
  See Also:
  
  Constant Field Values
- DOCUMENT_FETCHED
  
  public static final String DOCUMENT_FETCHED
  
  A document was successfully retrieved for processing.
  See Also:
  
  Constant Field Values
- DOCUMENT_QUEUED
  
  public static final String DOCUMENT_QUEUED
  
  A document reference was queued in the data store for processing.
  See Also:
  
  Constant Field Values
- DOCUMENT_PROCESSED
  
  public static final String DOCUMENT_PROCESSED
  
  A document was processed (successfully or not).
  See Also:
  
  Constant Field Values
- DOCUMENT_SAVED
  
  public static final String DOCUMENT_SAVED
  
  A document was saved.
  See Also:
  
  Constant Field Values
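
The field names listed above share three prefixes: CRAWLER_, REJECTED_, and DOCUMENT_. A listener may want to branch on that family rather than on each individual name. The following is a minimal sketch only. It assumes each constant's string value matches its field name, which this page does not state (the values are behind the "Constant Field Values" links). `EventNameGroups`, `Group`, and `groupOf` are hypothetical names introduced for illustration and are not part of the library.

```java
public class EventNameGroups {

    /** Coarse families suggested by the constant name prefixes (hypothetical). */
    enum Group { CRAWLER, REJECTED, DOCUMENT, OTHER }

    /**
     * Maps an event name to a family based on its prefix.
     * Assumption: the event name string equals the constant's field name.
     */
    static Group groupOf(String eventName) {
        if (eventName == null) {
            return Group.OTHER;
        }
        if (eventName.startsWith("CRAWLER_")) {
            return Group.CRAWLER;
        }
        if (eventName.startsWith("REJECTED_")) {
            return Group.REJECTED;
        }
        if (eventName.startsWith("DOCUMENT_")) {
            return Group.DOCUMENT;
        }
        return Group.OTHER;
    }

    public static void main(String[] args) {
        String[] names = {
            "CRAWLER_RUN_BEGIN", "REJECTED_NOTFOUND",
            "DOCUMENT_FETCHED", "SOMETHING_ELSE"
        };
        // Print the family each sample name falls into.
        for (String name : names) {
            System.out.println(name + " -> " + groupOf(name));
        }
    }
}
```

In real code, the string passed to `groupOf` would presumably come from the inherited `getName()` method, and comparisons against a single event would use the constants themselves rather than string literals.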
Method Details
- getCrawlDocInfo
  
  public CrawlDocInfo getCrawlDocInfo()
  
  Gets the crawl data holding contextual information about the crawled reference. CRAWLER_* events return null crawl data.
  
  Returns:
  
  crawl data
- getSubject
  
  public Object getSubject()
  
  Gets the subject, that is, any other relevant source related to the event.
  
  Returns:
  
  the subject
- getSource
  
  public Crawler getSource()
  
  Overrides:
  
  getSource in class Event
- isCrawlerShutdown
  
  public boolean isCrawlerShutdown()
- equals
  
  public boolean equals(Object other)
  
  Overrides:
  
  equals in class Event
- hashCode
  
  public int hashCode()
  
  Overrides:
  
  hashCode in class Event
- toString
  
  public String toString()
  
  Overrides:
  
  toString in class Event
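
Because getCrawlDocInfo() returns null for CRAWLER_* events, code that handles both crawler-level and document-level events needs a null guard before reading the crawl data. The sketch below illustrates one way to do that. `StubEvent` and `StubDocInfo` are hypothetical stand-ins so the example compiles without the collector library; they mirror only the `getName()` and `getCrawlDocInfo()` accessors named on this page, and the `reference` field is an assumption made for illustration.

```java
import java.util.Optional;

public class CrawlDocInfoGuard {

    /** Hypothetical stand-in for CrawlDocInfo; carries only a reference. */
    static final class StubDocInfo {
        final String reference;

        StubDocInfo(String reference) {
            this.reference = reference;
        }
    }

    /** Hypothetical stand-in for CrawlerEvent (not the real class). */
    static final class StubEvent {
        private final String name;
        private final StubDocInfo docInfo; // null for CRAWLER_* events

        StubEvent(String name, StubDocInfo docInfo) {
            this.name = name;
            this.docInfo = docInfo;
        }

        String getName() {
            return name;
        }

        StubDocInfo getCrawlDocInfo() {
            return docInfo;
        }
    }

    /** Builds a log line for an event, tolerating null crawl data. */
    static String describe(StubEvent event) {
        return Optional.ofNullable(event.getCrawlDocInfo())
                .map(info -> event.getName() + " -> " + info.reference)
                .orElse(event.getName() + " (no crawl data)");
    }

    public static void main(String[] args) {
        System.out.println(describe(new StubEvent("CRAWLER_RUN_BEGIN", null)));
        System.out.println(describe(new StubEvent(
                "DOCUMENT_FETCHED", new StubDocInfo("https://example.com/page"))));
    }
}
```

Wrapping the possibly-null value in `Optional` keeps the guard in one place; a plain `if (info != null)` check would work equally well.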
