Package com.norconex.collector.http
Class HttpCollector
- java.lang.Object
  - com.norconex.collector.core.Collector
    - com.norconex.collector.http.HttpCollector
public class HttpCollector extends Collector
Main application class. An instance of this class can hold several crawlers running at once. This is convenient when configuration settings are to be shared among crawlers. When you have many crawler jobs defined that have nothing in common, it may be best to configure and run them separately to facilitate troubleshooting. There are no set rules for this; experimenting with your target sites will help you decide.
- Author:
  Pascal Essiembre
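The class description notes that one collector can run several crawlers at once while they share configuration settings. The following minimal sketch models that idea in plain JDK code. SharedSettings, CrawlerJob and SharedCrawlersSketch are hypothetical names invented for illustration, not Norconex classes. The one-thread-per-crawler scheduling is also an assumption, not a description of how HttpCollector actually runs its crawlers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical model of "one collector, several crawlers, shared settings".
// None of these types are part of the Norconex API.
public class SharedCrawlersSketch {

    /** Settings shared by every crawler the collector holds (hypothetical). */
    record SharedSettings(String userAgent, int maxDepth) {}

    /** One crawler job: its own id and start URL, plus the shared settings. */
    record CrawlerJob(String id, String startUrl, SharedSettings shared) {
        String run() {
            // A real crawler would fetch and process documents here;
            // this sketch only reports which settings it would use.
            return id + " crawling " + startUrl + " as " + shared.userAgent()
                    + " (max depth " + shared.maxDepth() + ")";
        }
    }

    /** Runs all crawler jobs at once, one thread per job, and waits for all of them. */
    static List<String> runAll(List<CrawlerJob> jobs) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, jobs.size()));
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (CrawlerJob job : jobs) {
                Callable<String> task = job::run;
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // blocks until that crawler finishes
            }
            return results; // same order as the input jobs
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for crawlers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A crawler failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        SharedSettings shared = new SharedSettings("MyBot/1.0", 3);
        List<CrawlerJob> jobs = List.of(
                new CrawlerJob("news", "https://example.com/news", shared),
                new CrawlerJob("blog", "https://example.com/blog", shared));
        runAll(jobs).forEach(System.out::println);
    }
}
```

Passing the same SharedSettings instance to every job is what makes the settings "shared": changing the user agent in one place affects all crawlers. Jobs with nothing in common would simply get separate settings, or, as the description suggests, a separate collector altogether.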
Field Summary
Fields inherited from class com.norconex.collector.core.Collector
NORCONEX_ASCII
Constructor Summary
Constructors (Constructor, Description):
- HttpCollector()
  Creates a non-configured HTTP collector.
- HttpCollector(HttpCollectorConfig collectorConfig)
  Creates and configures an HTTP Collector with the provided configuration.
Method Summary
Methods (Modifier and Type, Method, Description):
- protected Crawler createCrawler(CrawlerConfig config)
- HttpCollectorConfig getCollectorConfig()
- static void main(String[] args)
  Invokes the HTTP Collector from the command line.
Methods inherited from class com.norconex.collector.core.Collector
clean, destroyCollector, exportDataStore, fireStopRequest, get, getCrawlers, getEventManager, getId, getReleaseVersions, getStreamFactory, getTempDir, getVersion, getWorkDir, importDataStore, initCollector, isRunning, lock, start, stop, toString, unlock
Constructor Detail
HttpCollector
public HttpCollector()
Creates a non-configured HTTP collector.
HttpCollector
public HttpCollector(HttpCollectorConfig collectorConfig)
Creates and configures an HTTP Collector with the provided configuration.
- Parameters:
  collectorConfig - HTTP Collector configuration
Method Detail
main
public static void main(String[] args)
Invokes the HTTP Collector from the command line.
- Parameters:
  args - command-line arguments; invoke it once without any arguments to get a list of the available command-line options.
getCollectorConfig
public HttpCollectorConfig getCollectorConfig()
- Overrides:
  getCollectorConfig in class Collector
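getCollectorConfig() is documented as overriding the Collector method while returning HttpCollectorConfig. Assuming the inherited method returns a more general configuration type, this is Java's covariant return type: the subclass narrows the return type so callers get the HTTP-specific configuration without a cast. The minimal sketch below shows the pattern together with a constructor that hands the subtype configuration to the superclass. BaseConfig, HttpConfig, BaseCollector and HttpCollectorSketch are hypothetical stand-ins, not the Norconex classes.

```java
// Hypothetical stand-ins illustrating a covariant getter; NOT the Norconex classes.
public class CovariantConfigSketch {

    /** General configuration shared by all collector types (hypothetical). */
    static class BaseConfig {
        private final String id;
        BaseConfig(String id) { this.id = id; }
        String getId() { return id; }
    }

    /** HTTP-specific configuration adding one illustrative setting (hypothetical). */
    static class HttpConfig extends BaseConfig {
        private final String userAgent;
        HttpConfig(String id, String userAgent) {
            super(id);
            this.userAgent = userAgent;
        }
        String getUserAgent() { return userAgent; }
    }

    /** Generic collector: stores whatever configuration it is given. */
    static class BaseCollector {
        private final BaseConfig config;
        BaseCollector(BaseConfig config) { this.config = config; }
        BaseConfig getCollectorConfig() { return config; }
    }

    /** HTTP collector: its constructor only accepts an HttpConfig, so the cast below is safe. */
    static class HttpCollectorSketch extends BaseCollector {
        HttpCollectorSketch(HttpConfig config) { super(config); }

        @Override
        HttpConfig getCollectorConfig() {
            // Covariant return: narrows BaseConfig to HttpConfig for callers.
            return (HttpConfig) super.getCollectorConfig();
        }
    }

    public static void main(String[] args) {
        HttpCollectorSketch collector =
                new HttpCollectorSketch(new HttpConfig("my-collector", "MyBot/1.0"));
        // No cast needed at the call site thanks to the covariant override.
        System.out.println(collector.getCollectorConfig().getUserAgent());
    }
}
```

Code that only knows the base type still works: calling getCollectorConfig() through a BaseCollector reference dispatches to the override and returns the same HttpConfig object.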
createCrawler
protected Crawler createCrawler(CrawlerConfig config)
- Specified by:
  createCrawler in class Collector
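"Specified by" means createCrawler is declared abstract in Collector and implemented here, which is the factory method pattern: the base class decides when crawlers are built, and the subclass decides which kind. The minimal sketch below illustrates the pattern under the assumption that the base class builds one crawler per crawler configuration; BaseCollector, HttpCollectorSketch, CrawlerConfig, Crawler and buildCrawlers are hypothetical stand-ins, not the Norconex types or methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins illustrating a factory method; NOT the Norconex classes.
public class FactoryMethodSketch {

    /** Minimal crawler configuration (hypothetical). */
    static class CrawlerConfig {
        final String id;
        CrawlerConfig(String id) { this.id = id; }
    }

    /** Minimal crawler abstraction (hypothetical). */
    interface Crawler {
        String describe();
    }

    /** Base collector: knows when to build crawlers, not which concrete kind. */
    abstract static class BaseCollector {
        private final List<CrawlerConfig> crawlerConfigs;

        BaseCollector(List<CrawlerConfig> crawlerConfigs) {
            this.crawlerConfigs = List.copyOf(crawlerConfigs);
        }

        /** Factory method: each subclass decides which crawler to create. */
        protected abstract Crawler createCrawler(CrawlerConfig config);

        /** Builds one crawler per configuration by calling the factory method. */
        List<Crawler> buildCrawlers() {
            List<Crawler> crawlers = new ArrayList<>();
            for (CrawlerConfig cfg : crawlerConfigs) {
                crawlers.add(createCrawler(cfg));
            }
            return crawlers;
        }
    }

    /** HTTP flavour: supplies an HTTP-specific crawler for each configuration. */
    static class HttpCollectorSketch extends BaseCollector {
        HttpCollectorSketch(List<CrawlerConfig> configs) { super(configs); }

        @Override
        protected Crawler createCrawler(CrawlerConfig config) {
            return () -> "HTTP crawler " + config.id;
        }
    }

    public static void main(String[] args) {
        BaseCollector collector = new HttpCollectorSketch(
                List.of(new CrawlerConfig("news"), new CrawlerConfig("blog")));
        for (Crawler c : collector.buildCrawlers()) {
            System.out.println(c.describe());
        }
    }
}
```

Keeping createCrawler protected, as the documented signature does, limits it to subclasses and the base class itself, so outside callers cannot bypass whatever setup the base class performs around crawler creation.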