A

AbstractDelay - Class in com.norconex.collector.http.delay.impl
Convenience class to encapsulate various delay strategies.
AbstractDelay() - Constructor for class com.norconex.collector.http.delay.impl.AbstractDelay
 
AbstractDelayResolver - Class in com.norconex.collector.http.delay.impl
Base implementation for creating voluntary delays between URL downloads.
AbstractDelayResolver() - Constructor for class com.norconex.collector.http.delay.impl.AbstractDelayResolver
 
acceptDocument(ImporterDocument) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
 
acceptMetadata(String, Properties) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
 
acceptReference(String) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
 
accepts(String, ContentType) - Method in interface com.norconex.collector.http.url.ILinkExtractor
Whether this link extraction should be executed for the given URL and/or content type.
accepts(String, ContentType) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
 
accepts(String, ContentType) - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
 
accepts(String, ContentType) - Method in class com.norconex.collector.http.url.impl.TikaLinkExtractor
 
accepts(String, ContentType) - Method in class com.norconex.collector.http.url.impl.XMLFeedLinkExtractor
 
add(String, Long, SitemapChangeFrequency, Float) - Method in class com.norconex.collector.http.sitemap.SitemapURLAdder
 
add(HttpCrawlData) - Method in class com.norconex.collector.http.sitemap.SitemapURLAdder
 
addExtractBetween(String, String, boolean) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
Adds patterns delimiting a portion of a document to be considered for link extraction.
addExtractSelectors(String...) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
Adds selectors matching the portions of a document to be considered for link extraction.
addLinkTag(String, String) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
 
addNoExtractBetween(String, String, boolean) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
Adds patterns delimiting a portion of a document to be excluded from link extraction.
addNoExtractSelectors(String...) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
Adds selectors matching the portions of a document to be excluded from link extraction.
addNofollowPatterns(String) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
Adds a pattern for references for which link extraction is disabled.
addPattern(String) - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
 
addPattern(String, String) - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
Adds a URL pattern, with an optional replacement.
addPattern(String, int) - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
Deprecated.
ALL_FIELDS - Static variable in class com.norconex.collector.http.data.store.impl.jdbc.JDBCCrawlDataSerializer
 
AUTH_METHOD_BASIC - Static variable in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
BASIC authentication method.
AUTH_METHOD_DIGEST - Static variable in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
DIGEST authentication method.
AUTH_METHOD_FORM - Static variable in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Form-based authentication method.
AUTH_METHOD_KERBEROS - Static variable in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Experimental: Kerberos authentication method.
AUTH_METHOD_NTLM - Static variable in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
NTLM authentication method.
AUTH_METHOD_SPNEGO - Static variable in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Experimental: SPNEGO authentication method.
authenticateUsingForm(HttpClient) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
 

B

beforeFinalizeDocumentProcessing(BaseCrawlData, ICrawlDataStore, ImporterDocument, ICrawlData) - Method in class com.norconex.collector.http.crawler.HttpCrawler
 
buildCustomHttpClient(HttpClientBuilder) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
For implementors to subclass.

C

cleanupExecution(JobStatusUpdater, JobSuite, ICrawlDataStore) - Method in class com.norconex.collector.http.crawler.HttpCrawler
 
clearLinkTags() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
 
clearPatterns() - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
 
close() - Method in class com.norconex.collector.http.sitemap.impl.SitemapStore
 
COLLECTOR_DEPTH - Static variable in class com.norconex.collector.http.doc.HttpMetadata
 
COLLECTOR_FEATURED_IMAGE_INLINE - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
COLLECTOR_FEATURED_IMAGE_PATH - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
COLLECTOR_FEATURED_IMAGE_URL - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
COLLECTOR_PHANTOMJS_SCREENSHOT_INLINE - Static variable in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
COLLECTOR_PHANTOMJS_SCREENSHOT_PATH - Static variable in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
COLLECTOR_REDIRECT_TRAIL - Static variable in class com.norconex.collector.http.doc.HttpMetadata
 
COLLECTOR_REFERENCED_URLS - Static variable in class com.norconex.collector.http.doc.HttpMetadata
 
COLLECTOR_REFERENCED_URLS_OUT_OF_SCOPE - Static variable in class com.norconex.collector.http.doc.HttpMetadata
 
COLLECTOR_REFERRER_LINK_TAG - Static variable in class com.norconex.collector.http.doc.HttpMetadata
 
COLLECTOR_REFERRER_LINK_TEXT - Static variable in class com.norconex.collector.http.doc.HttpMetadata
 
COLLECTOR_REFERRER_LINK_TITLE - Static variable in class com.norconex.collector.http.doc.HttpMetadata
 
COLLECTOR_REFERRER_REFERENCE - Static variable in class com.norconex.collector.http.doc.HttpMetadata
 
COLLECTOR_SM_CHANGE_FREQ - Static variable in class com.norconex.collector.http.doc.HttpMetadata
 
COLLECTOR_SM_LASTMOD - Static variable in class com.norconex.collector.http.doc.HttpMetadata
 
COLLECTOR_SM_PRORITY - Static variable in class com.norconex.collector.http.doc.HttpMetadata
 
COLLECTOR_URL - Static variable in class com.norconex.collector.http.doc.HttpMetadata
 
com.norconex.collector.http - package com.norconex.collector.http
 
com.norconex.collector.http.checksum.impl - package com.norconex.collector.http.checksum.impl
 
com.norconex.collector.http.client - package com.norconex.collector.http.client
 
com.norconex.collector.http.client.impl - package com.norconex.collector.http.client.impl
 
com.norconex.collector.http.crawler - package com.norconex.collector.http.crawler
 
com.norconex.collector.http.crawler.event.impl - package com.norconex.collector.http.crawler.event.impl
 
com.norconex.collector.http.data - package com.norconex.collector.http.data
 
com.norconex.collector.http.data.store.impl.jdbc - package com.norconex.collector.http.data.store.impl.jdbc
 
com.norconex.collector.http.data.store.impl.mongo - package com.norconex.collector.http.data.store.impl.mongo
 
com.norconex.collector.http.delay - package com.norconex.collector.http.delay
 
com.norconex.collector.http.delay.impl - package com.norconex.collector.http.delay.impl
 
com.norconex.collector.http.doc - package com.norconex.collector.http.doc
 
com.norconex.collector.http.fetch - package com.norconex.collector.http.fetch
 
com.norconex.collector.http.fetch.impl - package com.norconex.collector.http.fetch.impl
 
com.norconex.collector.http.filter.impl - package com.norconex.collector.http.filter.impl
 
com.norconex.collector.http.pipeline.committer - package com.norconex.collector.http.pipeline.committer
 
com.norconex.collector.http.pipeline.importer - package com.norconex.collector.http.pipeline.importer
 
com.norconex.collector.http.pipeline.queue - package com.norconex.collector.http.pipeline.queue
 
com.norconex.collector.http.processor - package com.norconex.collector.http.processor
 
com.norconex.collector.http.processor.impl - package com.norconex.collector.http.processor.impl
 
com.norconex.collector.http.recrawl - package com.norconex.collector.http.recrawl
 
com.norconex.collector.http.recrawl.impl - package com.norconex.collector.http.recrawl.impl
 
com.norconex.collector.http.redirect - package com.norconex.collector.http.redirect
 
com.norconex.collector.http.redirect.impl - package com.norconex.collector.http.redirect.impl
 
com.norconex.collector.http.robot - package com.norconex.collector.http.robot
 
com.norconex.collector.http.robot.impl - package com.norconex.collector.http.robot.impl
 
com.norconex.collector.http.sitemap - package com.norconex.collector.http.sitemap
 
com.norconex.collector.http.sitemap.impl - package com.norconex.collector.http.sitemap.impl
 
com.norconex.collector.http.url - package com.norconex.collector.http.url
 
com.norconex.collector.http.url.impl - package com.norconex.collector.http.url.impl
 
compareTo(Link) - Method in class com.norconex.collector.http.url.Link
 
contains(ScaledImage) - Method in class com.norconex.collector.http.processor.impl.ScaledImage
 
contains(Dimension) - Method in class com.norconex.collector.http.processor.impl.ScaledImage
 
contains(int, int) - Method in class com.norconex.collector.http.processor.impl.ScaledImage
 
CrawlerDelay - Class in com.norconex.collector.http.delay.impl
It is assumed there will be one instance of this class per crawler defined.
CrawlerDelay() - Constructor for class com.norconex.collector.http.delay.impl.CrawlerDelay
 
crawlerEvent(ICrawler, CrawlerEvent) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
 
createConnectionConfig() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
 
createCrawler(ICrawlerConfig) - Method in class com.norconex.collector.http.HttpCollector
 
createCredentialsProvider() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
 
CREATED_ROBOTS_META - Static variable in class com.norconex.collector.http.crawler.HttpCrawlerEvent
 
createDefaultCookieStore() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Creates the default cookie store to be added to each request context.
createDefaultRequestHeaders() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Creates a list of HTTP headers previously set by GenericHttpClientFactory.setRequestHeader(String, String).
createEmbeddedCrawlData(String, ICrawlData) - Method in class com.norconex.collector.http.crawler.HttpCrawler
 
createHTTPClient(String) - Method in interface com.norconex.collector.http.client.IHttpClientFactory
Initializes the HTTP Client used for crawling.
createHTTPClient(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
 
createJDBCSerializer() - Method in class com.norconex.collector.http.data.store.impl.jdbc.JDBCCrawlDataStoreFactory
 
createMongoSerializer() - Method in class com.norconex.collector.http.data.store.impl.mongo.MongoCrawlDataStoreFactory
 
createProxy() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
 
createRedirectStrategy() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
 
createRequestConfig() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
 
createSchemePortResolver() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
 
createSitemapResolver(HttpCrawlerConfig, boolean) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
 
createSitemapResolver(HttpCrawlerConfig, boolean) - Method in interface com.norconex.collector.http.sitemap.ISitemapResolverFactory
 
createSSLContext() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
 
createSSLSocketFactory(SSLContext) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
 
createUriRequest(HttpDocument) - Method in class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
Creates the HTTP request to be executed.
createUriRequest(String) - Method in class com.norconex.collector.http.fetch.impl.GenericMetadataFetcher
Creates the HTTP request to be executed.

D

DEFAULT_CONTENT_TYPE_PATTERN - Static variable in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
DEFAULT_CONTENT_TYPE_PATTERN - Static variable in class com.norconex.collector.http.url.impl.RegexLinkExtractor
 
DEFAULT_CONTENT_TYPE_PATTERN - Static variable in class com.norconex.collector.http.url.impl.XMLFeedLinkExtractor
 
DEFAULT_DELAY - Static variable in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
Default delay is 3 seconds.
DEFAULT_FALLBACK_CHARSET - Static variable in class com.norconex.collector.http.redirect.impl.GenericRedirectURLProvider
 
DEFAULT_FILENAME_PREFIX - Static variable in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
 
DEFAULT_IMAGE_CACHE_DIR - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
DEFAULT_IMAGE_CACHE_SIZE - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
DEFAULT_IMAGE_FORMAT - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
DEFAULT_MAX_CONNECTIONS - Static variable in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
 
DEFAULT_MAX_CONNECTIONS_PER_ROUTE - Static variable in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
 
DEFAULT_MAX_IDLE_TIME - Static variable in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
 
DEFAULT_MAX_REDIRECT - Static variable in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
 
DEFAULT_MAX_URL_LENGTH - Static variable in class com.norconex.collector.http.url.impl.GenericLinkExtractor
Default maximum length a URL can have.
DEFAULT_MAX_URL_LENGTH - Static variable in class com.norconex.collector.http.url.impl.RegexLinkExtractor
Default maximum length a URL can have.
DEFAULT_MIN_SIZE - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
DEFAULT_PAGE_CONTENT_TYPE_PATTERN - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
DEFAULT_PRIORITY - Static variable in class com.norconex.collector.http.sitemap.SitemapURLAdder
 
DEFAULT_RENDER_WAIT_TIME - Static variable in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
DEFAULT_SCALE_SIZE - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
DEFAULT_SCREENSHOT_IMAGE_FORMAT - Static variable in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
DEFAULT_SCREENSHOT_SCALE_SIZE - Static variable in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
DEFAULT_SCREENSHOT_STORAGE - Static variable in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
DEFAULT_SCREENSHOT_STORAGE_DISK_DIR - Static variable in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
DEFAULT_SCREENSHOT_ZOOM_FACTOR - Static variable in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
DEFAULT_SCRIPT_PATH - Static variable in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
DEFAULT_SEGMENT_COUNT - Static variable in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
Default segment count.
DEFAULT_SEGMENT_SEPARATOR_PATTERN - Static variable in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
Default segment separator pattern.
DEFAULT_SITEMAP_PATHS - Static variable in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
 
DEFAULT_STORAGE - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
DEFAULT_STORAGE_DISK_DIR - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
DEFAULT_TIMEOUT - Static variable in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
 
delay(RobotsTxt, String) - Method in interface com.norconex.collector.http.delay.IDelayResolver
Delay crawling activities (if applicable).
delay(long, String) - Method in class com.norconex.collector.http.delay.impl.AbstractDelay
 
delay(long, long) - Method in class com.norconex.collector.http.delay.impl.AbstractDelay
 
delay(RobotsTxt, String) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
 
delay(long, String) - Method in class com.norconex.collector.http.delay.impl.CrawlerDelay
 
delay(long, String) - Method in class com.norconex.collector.http.delay.impl.SiteDelay
 
delay(long, String) - Method in class com.norconex.collector.http.delay.impl.ThreadDelay
 
DelayReferencePattern(String, long) - Constructor for class com.norconex.collector.http.delay.impl.ReferenceDelayResolver.DelayReferencePattern
 
DelaySchedule(String, String, String, long) - Constructor for class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
 
detectFromContent(String, InputStream, ContentType) - Method in interface com.norconex.collector.http.url.ICanonicalLinkDetector
Detects the presence of a canonical URL in a document's content.
detectFromContent(String, InputStream, ContentType) - Method in class com.norconex.collector.http.url.impl.GenericCanonicalLinkDetector
 
detectFromMetadata(String, HttpMetadata) - Method in interface com.norconex.collector.http.url.ICanonicalLinkDetector
Detects a canonical URL from the metadata gathered so far, which at the time of invocation is normally the HTTP header values.
detectFromMetadata(String, HttpMetadata) - Method in class com.norconex.collector.http.url.impl.GenericCanonicalLinkDetector
 
doCreateMetaChecksum(Properties) - Method in class com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer
 

E

equals(Object) - Method in class com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer
 
equals(Object) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
 
equals(Object) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
 
equals(Object) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
equals(Object) - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
 
equals(Object) - Method in class com.norconex.collector.http.data.HttpCrawlData
 
equals(Object) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
 
equals(Object) - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
 
equals(Object) - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver
 
equals(Object) - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver.DelayReferencePattern
 
equals(Object) - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver
 
equals(Object) - Method in class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
 
equals(Object) - Method in class com.norconex.collector.http.fetch.impl.GenericMetadataFetcher
 
equals(Object) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
equals(Object) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
 
equals(Object) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
equals(Object) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
 
equals(Object) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
 
equals(Object) - Method in class com.norconex.collector.http.recrawl.PreviousCrawlData
 
equals(Object) - Method in class com.norconex.collector.http.redirect.impl.GenericRedirectURLProvider
 
equals(Object) - Method in class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
 
equals(Object) - Method in class com.norconex.collector.http.robot.impl.StandardRobotsTxtProvider
 
equals(Object) - Method in class com.norconex.collector.http.robot.RobotsMeta
 
equals(Object) - Method in class com.norconex.collector.http.robot.RobotsTxt
 
equals(Object) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
 
equals(Object) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
 
equals(Object) - Method in class com.norconex.collector.http.url.impl.GenericCanonicalLinkDetector
 
equals(Object) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
 
equals(Object) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor.RegexPair
 
equals(Object) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
 
equals(Object) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer.Replace
 
equals(Object) - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
 
equals(Object) - Method in class com.norconex.collector.http.url.impl.TikaLinkExtractor
 
equals(Object) - Method in class com.norconex.collector.http.url.impl.XMLFeedLinkExtractor
 
equals(Object) - Method in class com.norconex.collector.http.url.Link
 
executeCommitterPipeline(ICrawler, ImporterDocument, ICrawlDataStore, BaseCrawlData, BaseCrawlData) - Method in class com.norconex.collector.http.crawler.HttpCrawler
 
executeImporterPipeline(ImporterPipelineContext) - Method in class com.norconex.collector.http.crawler.HttpCrawler
 
executeQueuePipeline(ICrawlData, ICrawlDataStore) - Method in class com.norconex.collector.http.crawler.HttpCrawler
 
extractLinks(InputStream, String, ContentType) - Method in interface com.norconex.collector.http.url.ILinkExtractor
Extracts links from a document.
extractLinks(InputStream, String, ContentType) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
 
extractLinks(InputStream, String, ContentType) - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
 
extractLinks(InputStream, String, ContentType) - Method in class com.norconex.collector.http.url.impl.TikaLinkExtractor
 
extractLinks(InputStream, String, ContentType) - Method in class com.norconex.collector.http.url.impl.XMLFeedLinkExtractor
 

F

FeaturedImageProcessor - Class in com.norconex.collector.http.processor.impl
Document processor that extracts the "main" image from HTML pages.
FeaturedImageProcessor() - Constructor for class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
FeaturedImageProcessor.Quality - Enum in com.norconex.collector.http.processor.impl
 
FeaturedImageProcessor.Storage - Enum in com.norconex.collector.http.processor.impl
 
FeaturedImageProcessor.StorageDiskStructure - Enum in com.norconex.collector.http.processor.impl
 
fetchDocument(HttpClient, HttpDocument) - Method in interface com.norconex.collector.http.fetch.IHttpDocumentFetcher
Fetches an HTTP document and saves it to a local file.
fetchDocument(HttpClient, HttpDocument) - Method in class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
 
fetchDocument(HttpClient, HttpDocument) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
fetchHTTPHeaders(HttpClient, String, Properties) - Method in interface com.norconex.collector.http.fetch.IHttpMetadataFetcher
Fetches the HTTP headers for a URL and stores them in the provided Properties.
fetchHTTPHeaders(HttpClient, String, Properties) - Method in class com.norconex.collector.http.fetch.impl.GenericMetadataFetcher
 
fits(ScaledImage) - Method in class com.norconex.collector.http.processor.impl.ScaledImage
 
fits(Dimension) - Method in class com.norconex.collector.http.processor.impl.ScaledImage
 
fits(int, int) - Method in class com.norconex.collector.http.processor.impl.ScaledImage
 
fromDocument(Document) - Method in class com.norconex.collector.http.data.store.impl.mongo.MongoCrawlDataSerializer
 

G

GenericCanonicalLinkDetector - Class in com.norconex.collector.http.url.impl
Generic canonical link detector.
GenericCanonicalLinkDetector() - Constructor for class com.norconex.collector.http.url.impl.GenericCanonicalLinkDetector
 
GenericDelayResolver - Class in com.norconex.collector.http.delay.impl
Default implementation for creating voluntary delays between URL downloads.
GenericDelayResolver() - Constructor for class com.norconex.collector.http.delay.impl.GenericDelayResolver
 
GenericDelayResolver.DelaySchedule - Class in com.norconex.collector.http.delay.impl
 
GenericDelayResolver.DelaySchedule.DOW - Enum in com.norconex.collector.http.delay.impl
 
GenericDocumentFetcher - Class in com.norconex.collector.http.fetch.impl
Default implementation of IHttpDocumentFetcher.
GenericDocumentFetcher() - Constructor for class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
 
GenericDocumentFetcher(int[]) - Constructor for class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
 
GenericHttpClientFactory - Class in com.norconex.collector.http.client.impl
Default implementation of IHttpClientFactory.
GenericHttpClientFactory() - Constructor for class com.norconex.collector.http.client.impl.GenericHttpClientFactory
 
GenericLinkExtractor - Class in com.norconex.collector.http.url.impl
Generic link extractor for URLs found in HTML and possibly other text files.
GenericLinkExtractor() - Constructor for class com.norconex.collector.http.url.impl.GenericLinkExtractor
 
GenericLinkExtractor.RegexPair - Class in com.norconex.collector.http.url.impl
 
GenericMetadataFetcher - Class in com.norconex.collector.http.fetch.impl
Basic implementation of IHttpMetadataFetcher.
GenericMetadataFetcher() - Constructor for class com.norconex.collector.http.fetch.impl.GenericMetadataFetcher
 
GenericMetadataFetcher(int[]) - Constructor for class com.norconex.collector.http.fetch.impl.GenericMetadataFetcher
 
GenericRecrawlableResolver - Class in com.norconex.collector.http.recrawl.impl
Relies on both sitemap directives and custom instructions for establishing the minimum frequency between each document recrawl.
GenericRecrawlableResolver() - Constructor for class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
 
GenericRecrawlableResolver.MinFrequency - Class in com.norconex.collector.http.recrawl.impl
 
GenericRecrawlableResolver.SitemapSupport - Enum in com.norconex.collector.http.recrawl.impl
 
GenericRedirectURLProvider - Class in com.norconex.collector.http.redirect.impl
Provides redirect URLs by grabbing them from the HTTP Response Location header value.
GenericRedirectURLProvider() - Constructor for class com.norconex.collector.http.redirect.impl.GenericRedirectURLProvider
 
GenericURLNormalizer - Class in com.norconex.collector.http.url.impl
Generic implementation of IURLNormalizer that should satisfy most URL normalization needs.
GenericURLNormalizer() - Constructor for class com.norconex.collector.http.url.impl.GenericURLNormalizer
 
GenericURLNormalizer.Normalization - Enum in com.norconex.collector.http.url.impl
 
GenericURLNormalizer.Replace - Class in com.norconex.collector.http.url.impl
 
getAllowFilters() - Method in class com.norconex.collector.http.robot.RobotsTxt
Gets "Allow" filters.
getApplyTo() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
 
getApplyToContentTypePattern() - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
 
getApplyToContentTypePattern() - Method in class com.norconex.collector.http.url.impl.XMLFeedLinkExtractor
 
getApplyToReferencePattern() - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
 
getApplyToReferencePattern() - Method in class com.norconex.collector.http.url.impl.XMLFeedLinkExtractor
 
getArea() - Method in class com.norconex.collector.http.processor.impl.ScaledImage
 
getAuthDomain() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the NTLM authentication domain.
getAuthFormCharset() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the authentication form character set.
getAuthFormParam(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets an authentication form parameter (equivalent to "input" or other fields in HTML forms).
getAuthFormParamNames() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets all authentication form parameter names.
getAuthHostname() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the host name for the current authentication scope.
getAuthMethod() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the authentication method.
getAuthPassword() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the authentication password.
getAuthPasswordField() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the name of the HTML field where the password is set.
getAuthPasswordKey() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the authentication password encryption key.
getAuthPort() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the port for the current authentication scope.
getAuthRealm() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the realm name for the current authentication scope.
getAuthURL() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the URL for "form" authentication.
getAuthUsername() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the username.
getAuthUsernameField() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the name of the HTML field where the username is set.
getAuthWorkstation() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the NTLM authentication workstation name.
getCachedCrawlData() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
 
getCachedCrawlDataSQL() - Method in class com.norconex.collector.http.data.store.impl.jdbc.JDBCCrawlDataSerializer
 
getCachedCrawlDataValues(String) - Method in class com.norconex.collector.http.data.store.impl.jdbc.JDBCCrawlDataSerializer
 
getCacheDirectory() - Method in class com.norconex.collector.http.processor.impl.ImageCache
 
getCanonicalLinkDetector() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Gets the canonical link detector.
getChangeFrequency(String) - Static method in enum com.norconex.collector.http.sitemap.SitemapChangeFrequency
Gets the sitemap change frequency matching the supplied string.
getCharset() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
Gets the character set of pages on which link extraction is performed.
getCharset() - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
Gets the character set of pages on which link extraction is performed.
getCollectorConfig() - Method in class com.norconex.collector.http.HttpCollector
 
getConfig() - Method in class com.norconex.collector.http.pipeline.committer.HttpCommitterPipelineContext
 
getConfig() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
 
getConfig() - Method in class com.norconex.collector.http.pipeline.queue.HttpQueuePipelineContext
 
getConnectionCharset() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the connection character set.
getConnectionRequestTimeout() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the timeout when requesting a connection, in milliseconds.
getConnectionTimeout() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the connection timeout until a connection is established, in milliseconds.
getContentType() - Method in class com.norconex.collector.http.recrawl.PreviousCrawlData
 
getContentTypePattern() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
getContentTypes() - Method in class com.norconex.collector.http.url.impl.GenericCanonicalLinkDetector
 
getContentTypes() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
 
getContentTypes() - Method in class com.norconex.collector.http.url.impl.TikaLinkExtractor
 
getCookieSpec() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
 
getCount() - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
 
getCrawlData() - Method in class com.norconex.collector.http.pipeline.committer.HttpCommitterPipelineContext
 
getCrawlData() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
 
getCrawlData() - Method in class com.norconex.collector.http.pipeline.queue.HttpQueuePipelineContext
 
getCrawlDate() - Method in class com.norconex.collector.http.recrawl.PreviousCrawlData
 
getCrawlDelay() - Method in class com.norconex.collector.http.robot.RobotsTxt
 
getCrawler() - Method in class com.norconex.collector.http.pipeline.committer.HttpCommitterPipelineContext
 
getCrawler() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
 
getCrawler() - Method in class com.norconex.collector.http.pipeline.queue.HttpQueuePipelineContext
 
getCrawlerConfig() - Method in class com.norconex.collector.http.crawler.HttpCrawler
 
getCrawlState() - Method in class com.norconex.collector.http.fetch.HttpFetchResponse
 
getCreateTableSQLs(String) - Method in class com.norconex.collector.http.data.store.impl.jdbc.JDBCCrawlDataSerializer
 
getDayOfMonthRange() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
 
getDayOfWeekRange() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
 
getDefaultDelay() - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
Gets the default delay in milliseconds.
getDelay() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
 
getDelay() - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver.DelayReferencePattern
 
getDelayReferencePatterns() - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver
 
getDelayResolver() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
getDeleteCrawlDataSQL(String) - Method in class com.norconex.collector.http.data.store.impl.jdbc.JDBCCrawlDataSerializer
 
getDeleteCrawlDataValues(String, ICrawlData) - Method in class com.norconex.collector.http.data.store.impl.jdbc.JDBCCrawlDataSerializer
 
getDepth() - Method in class com.norconex.collector.http.data.HttpCrawlData
Gets the URL depth.
getDisallowFilters() - Method in class com.norconex.collector.http.robot.RobotsTxt
Gets "Disallow" filters.
getDocument() - Method in class com.norconex.collector.http.pipeline.committer.HttpCommitterPipelineContext
 
getDocument() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
 
getDocumentFetcher() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
getDocumentOutOfScopeUrls() - Method in class com.norconex.collector.http.doc.HttpMetadata
 
getDocumentUrl() - Method in class com.norconex.collector.http.doc.HttpMetadata
 
getDocumentUrls() - Method in class com.norconex.collector.http.doc.HttpMetadata
 
getDomSelector() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
getEnd() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor.RegexPair
 
getExePath() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
getExtractBetweens() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
Gets the patterns delimiting the portions of a document to be considered for link extraction.
getExtractSelectors() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
Gets the selectors matching the portions of a document to be considered for link extraction.
getFallbackCharset() - Method in class com.norconex.collector.http.redirect.impl.GenericRedirectURLProvider
 
getFileNamePrefix() - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
Gets the generated report file name prefix.
getFilters() - Method in class com.norconex.collector.http.robot.RobotsTxt
Gets all filters.
getFromDate() - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
Gets the minimum EPOCH date (in milliseconds) a sitemap entry should have to be considered.
getFromDate() - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
Gets the minimum EPOCH date (in milliseconds) a sitemap entry should have to be considered.
getHeadersPrefix() - Method in class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
 
getHeadersPrefix() - Method in class com.norconex.collector.http.fetch.impl.GenericMetadataFetcher
 
getHeadersPrefix() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
getHeadersPrefix() - Method in class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
 
getHttpClient() - Method in class com.norconex.collector.http.crawler.HttpCrawler
 
getHttpClient() - Method in class com.norconex.collector.http.pipeline.committer.HttpCommitterPipelineContext
 
getHttpClient() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
 
getHttpClient() - Method in class com.norconex.collector.http.pipeline.queue.HttpQueuePipelineContext
 
getHttpClientFactory() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
getHttpHeadersFetcher() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
 
getImage(String) - Method in class com.norconex.collector.http.processor.impl.ImageCache
 
getImage() - Method in class com.norconex.collector.http.processor.impl.ScaledImage
 
getImageCacheDir() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
getImageCacheSize() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
getImageFormat() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
getImporter() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
 
getInsertCrawlDataSQL(String) - Method in class com.norconex.collector.http.data.store.impl.jdbc.JDBCCrawlDataSerializer
 
getInsertCrawlDataValues(String, ICrawlData) - Method in class com.norconex.collector.http.data.store.impl.jdbc.JDBCCrawlDataSerializer
 
getLinkExtractors() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
getLocalAddress() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the local address (IP or hostname).
getMatch() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer.Replace
 
getMaxConnectionIdleTime() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the period of time in milliseconds after which to evict idle connections from the connection pool.
getMaxConnectionInactiveTime() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the period of time in milliseconds a connection must be inactive to be checked in case it became stalled.
getMaxConnections() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the maximum number of connections that can be created.
getMaxConnectionsPerRoute() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the maximum number of connections that can be used per route.
getMaxDepth() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
getMaxRedirects() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the maximum number of redirects to be followed.
getMaxURLLength() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
Gets the maximum supported URL length.
getMaxURLLength() - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
Gets the maximum supported URL length.
getMetadata() - Method in class com.norconex.collector.http.doc.HttpDocument
 
getMetadata() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
 
getMetadataChecksummer() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Gets the metadata checksummer.
getMetadataFetcher() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
getMinDimensions() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
getMinFrequencies() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
 
getNextQueued(MongoCollection<Document>) - Method in class com.norconex.collector.http.data.store.impl.mongo.MongoCrawlDataSerializer
 
getNextQueuedCrawlDataSQL() - Method in class com.norconex.collector.http.data.store.impl.jdbc.JDBCCrawlDataSerializer
 
getNextQueuedCrawlDataValues() - Method in class com.norconex.collector.http.data.store.impl.jdbc.JDBCCrawlDataSerializer
 
getNoExtractBetweens() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
Gets the patterns delimiting the portions of a document to be excluded from link extraction.
getNoExtractSelectors() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
Gets the selectors matching the portions of a document to be excluded from link extraction.
getNofollowPatterns() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
Gets the patterns of references for which link extraction is disabled.
getNormalizations() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
 
getNotFoundStatusCodes() - Method in class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
Gets HTTP status codes to be considered as "Not found" state.
getNotFoundStatusCodes() - Method in class com.norconex.collector.http.fetch.impl.GenericMetadataFetcher
Gets HTTP status codes to be considered as "Not found" state.
getNotFoundStatusCodes() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
getOptions() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
getOriginalRedirectStrategy() - Method in class com.norconex.collector.http.redirect.RedirectStrategyWrapper
 
getOriginalReference() - Method in class com.norconex.collector.http.data.HttpCrawlData
 
getOriginalSize() - Method in class com.norconex.collector.http.processor.impl.ScaledImage
 
getOutputDir() - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
Gets the local directory where this listener report will be written.
getPageContentTypePattern() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
getPath() - Method in interface com.norconex.collector.http.robot.IRobotsTxtFilter
 
getPattern() - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver.DelayReferencePattern
 
getPattern() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
 
getPatternMatchGroup(String) - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
Deprecated.
Since 2.8.0, use getPatternReplacement(String) instead. This deprecated method returns the group id if the "replacement" value contains only a group replacement (e.g. $1); otherwise it always returns -1.
getPatternReplacement(String) - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
Gets a pattern replacement.
getPatterns() - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
 
getPort() - Method in class com.norconex.collector.http.fetch.impl.HttpClientProxyCollectorListener
 
getPostImportProcessors() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
getPreImportProcessors() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
getProxyHost() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the proxy host.
getProxyPassword() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the proxy password.
getProxyPasswordKey() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the proxy password encryption key.
getProxyPort() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the proxy port.
getProxyRealm() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the proxy realm.
getProxyScheme() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the proxy scheme.
getProxyUsername() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the proxy username.
getReasonPhrase() - Method in class com.norconex.collector.http.fetch.HttpFetchResponse
 
getRecrawlableResolver() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Gets the recrawlable resolver.
getRedirect(HttpRequest, HttpResponse, HttpContext) - Method in class com.norconex.collector.http.redirect.RedirectStrategyWrapper
 
getRedirectTrail() - Method in class com.norconex.collector.http.data.HttpCrawlData
Gets the trail of URLs that were redirected up to this one.
getRedirectURL() - Static method in class com.norconex.collector.http.redirect.RedirectStrategyWrapper
 
getRedirectURLProvider() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Gets the redirect URL provider.
getRedirectURLProvider() - Method in class com.norconex.collector.http.redirect.RedirectStrategyWrapper
Gets the redirect URL provider.
getReference() - Method in class com.norconex.collector.http.recrawl.PreviousCrawlData
 
getReferencedUrls() - Method in class com.norconex.collector.http.data.HttpCrawlData
Gets URLs referenced by this one.
getReferenceExistsSQL(String) - Method in class com.norconex.collector.http.data.store.impl.jdbc.JDBCCrawlDataSerializer
 
getReferenceExistsValues(String, String) - Method in class com.norconex.collector.http.data.store.impl.jdbc.JDBCCrawlDataSerializer
 
getReferencePattern() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
getReferrer() - Method in class com.norconex.collector.http.url.Link
 
getReferrerLinkTag() - Method in class com.norconex.collector.http.data.HttpCrawlData
 
getReferrerLinkText() - Method in class com.norconex.collector.http.data.HttpCrawlData
 
getReferrerLinkTitle() - Method in class com.norconex.collector.http.data.HttpCrawlData
 
getReferrerReference() - Method in class com.norconex.collector.http.data.HttpCrawlData
 
getRenderWaitTime() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
getReplacement() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer.Replace
 
getReplaces() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
 
getRequestHeader(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the HTTP request header value matching the given name, previously set with GenericHttpClientFactory.setRequestHeader(String, String).
getRequestHeaderNames() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets all HTTP request header names for headers previously set with GenericHttpClientFactory.setRequestHeader(String, String).
getRequestHeaders() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
 
getResourceTimeout() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Gets the timeout, in milliseconds, after which a requested resource is abandoned so that the rest of the page can proceed.
getRobotsMeta() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
 
getRobotsMeta(Reader, String, ContentType, Properties) - Method in class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
 
getRobotsMeta(Reader, String, ContentType, Properties) - Method in interface com.norconex.collector.http.robot.IRobotsMetaProvider
Extracts Robots meta information for a page, if any.
getRobotsMetaProvider() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
getRobotsTxt(HttpClient, String, String) - Method in class com.norconex.collector.http.robot.impl.StandardRobotsTxtProvider
 
getRobotsTxt(HttpClient, String, String) - Method in interface com.norconex.collector.http.robot.IRobotsTxtProvider
Gets robots.txt rules.
getRobotsTxtProvider() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
getScaleDimensions() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
getScaleQuality() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
getSchedules() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver
 
getSchemes() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
Gets the schemes to be extracted.
getScope() - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
Gets the delay scope.
getScreenshotDimensions() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
getScreenshotDir() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
getScreenshotImageFormat() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Gets the screenshot image format (jpg, png, gif, bmp, etc.).
getScreenshotScaleDimensions() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Gets the pixel dimensions we want the stored screenshot to have.
getScreenshotScaleQuality() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Gets the screenshot scaling quality to use when storage is "disk" or "inline".
getScreenshotStorage() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Gets the screenshot storage mechanisms.
getScreenshotStorageDiskDir() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Gets the directory where screenshots are saved when storage is "disk".
getScreenshotStorageDiskField() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Gets the target document metadata field in which to store the path to the screenshot image file when storage is "disk".
getScreenshotStorageDiskStructure() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Gets the screenshot directory structure to create when storage is "disk".
getScreenshotStorageInlineField() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Gets the target document metadata field in which to store the inline (Base64) screenshot image when storage is "inline".
getScreenshotZoomFactor() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
getScriptPath() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
getSelectCrawlDataSQL(String) - Method in class com.norconex.collector.http.data.store.impl.jdbc.JDBCCrawlDataSerializer
 
getSeparator() - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
Gets the segment separator pattern.
getSitemapChangeFreq() - Method in class com.norconex.collector.http.data.HttpCrawlData
Gets the sitemap change frequency.
getSitemapChangeFreq() - Method in class com.norconex.collector.http.recrawl.PreviousCrawlData
 
getSitemapLastMod() - Method in class com.norconex.collector.http.data.HttpCrawlData
Gets the sitemap last modified date in milliseconds (EPOCH date).
getSitemapLastMod() - Method in class com.norconex.collector.http.recrawl.PreviousCrawlData
 
getSitemapLocations() - Method in class com.norconex.collector.http.robot.RobotsTxt
 
getSitemapLocations() - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
Deprecated.
getSitemapLocations() - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
Deprecated.
getSitemapPaths() - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
Gets the URL paths, relative to the URL root, from which to try to locate and resolve sitemaps.
getSitemapPaths() - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
Gets the URL paths, relative to the URL root, from which to try to locate and resolve sitemaps.
getSitemapPriority() - Method in class com.norconex.collector.http.data.HttpCrawlData
Gets the sitemap priority.
getSitemapPriority() - Method in class com.norconex.collector.http.recrawl.PreviousCrawlData
 
getSitemapResolver() - Method in class com.norconex.collector.http.crawler.HttpCrawler
 
getSitemapResolver() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
 
getSitemapResolver() - Method in class com.norconex.collector.http.pipeline.queue.HttpQueuePipelineContext
 
getSitemapResolverFactory() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
getSitemapSupport() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
Gets the sitemap support strategy.
getSitemapSupport(String) - Static method in enum com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.SitemapSupport
 
getSocketTimeout() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the maximum period of inactivity between two consecutive data packets, in milliseconds.
getSSLProtocols() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets the supported SSL/TLS protocols.
getStart() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor.RegexPair
 
getStartSitemapURLs() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Gets sitemap URLs to be used as starting points for crawling.
getStartURLs() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
getStartURLsFiles() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Gets the file paths of seed files containing URLs to be used as "start URLs".
getStartURLsProviders() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Gets the providers of URLs used as starting points for crawling.
getStatusCode() - Method in class com.norconex.collector.http.fetch.HttpFetchResponse
 
getStatusCodes() - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
Gets the status codes to listen for.
getStorage() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
getStorageDiskDir() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
getStorageDiskField() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
getStorageDiskStructure() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
getStorageInlineField() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
getStorageUrlField() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
getTag() - Method in class com.norconex.collector.http.url.Link
 
getTempDir() - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
Gets the directory where temporary sitemap files are written.
getTempDir() - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
Gets the directory where sitemap files are temporarily stored before they are parsed.
getText() - Method in class com.norconex.collector.http.url.Link
 
getTimeRange() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
 
getTitle() - Method in class com.norconex.collector.http.url.Link
 
getUrl() - Method in class com.norconex.collector.http.processor.impl.ScaledImage
 
getUrl() - Method in class com.norconex.collector.http.url.Link
 
getURLCrawlScopeStrategy() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Gets the strategy to use to determine if a URL is in scope.
getUrlNormalizer() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
getUrlRoot() - Method in class com.norconex.collector.http.data.HttpCrawlData
Gets the URL root (protocol + domain).
getUserAgent() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
getValidExitCodes() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
getValidStatusCodes() - Method in class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
 
getValidStatusCodes() - Method in class com.norconex.collector.http.fetch.impl.GenericMetadataFetcher
 
getValidStatusCodes() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
getValue() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
 
GOOD_REDIRECTS - Static variable in class com.norconex.collector.http.pipeline.importer.HttpImporterPipeline
 

H

hashCode() - Method in class com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer
 
hashCode() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
 
hashCode() - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
 
hashCode() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
hashCode() - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
 
hashCode() - Method in class com.norconex.collector.http.data.HttpCrawlData
 
hashCode() - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
 
hashCode() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
 
hashCode() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver
 
hashCode() - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver.DelayReferencePattern
 
hashCode() - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver
 
hashCode() - Method in class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
 
hashCode() - Method in class com.norconex.collector.http.fetch.impl.GenericMetadataFetcher
 
hashCode() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
hashCode() - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
 
hashCode() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
hashCode() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
 
hashCode() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
 
hashCode() - Method in class com.norconex.collector.http.recrawl.PreviousCrawlData
 
hashCode() - Method in class com.norconex.collector.http.redirect.impl.GenericRedirectURLProvider
 
hashCode() - Method in class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
 
hashCode() - Method in class com.norconex.collector.http.robot.impl.StandardRobotsTxtProvider
 
hashCode() - Method in class com.norconex.collector.http.robot.RobotsMeta
 
hashCode() - Method in class com.norconex.collector.http.robot.RobotsTxt
 
hashCode() - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
 
hashCode() - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
 
hashCode() - Method in class com.norconex.collector.http.url.impl.GenericCanonicalLinkDetector
 
hashCode() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
 
hashCode() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor.RegexPair
 
hashCode() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
 
hashCode() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer.Replace
 
hashCode() - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
 
hashCode() - Method in class com.norconex.collector.http.url.impl.TikaLinkExtractor
 
hashCode() - Method in class com.norconex.collector.http.url.impl.XMLFeedLinkExtractor
 
hashCode() - Method in class com.norconex.collector.http.url.Link
 
HTTP_CONTENT_LENGTH - Static variable in class com.norconex.collector.http.doc.HttpMetadata
 
HTTP_CONTENT_TYPE - Static variable in class com.norconex.collector.http.doc.HttpMetadata
 
HttpClientProxyCollectorListener - Class in com.norconex.collector.http.fetch.impl
Starts and stops an HTTP proxy that uses Apache HttpClient to make HTTP requests.
HttpClientProxyCollectorListener() - Constructor for class com.norconex.collector.http.fetch.impl.HttpClientProxyCollectorListener
 
HttpCollector - Class in com.norconex.collector.http
Main application class.
HttpCollector() - Constructor for class com.norconex.collector.http.HttpCollector
Creates a non-configured HTTP collector.
HttpCollector(HttpCollectorConfig) - Constructor for class com.norconex.collector.http.HttpCollector
Creates and configures an HTTP Collector with the provided configuration.
HttpCollectorConfig - Class in com.norconex.collector.http
HTTP Collector configuration.
HttpCollectorConfig() - Constructor for class com.norconex.collector.http.HttpCollectorConfig
 
HttpCommitterPipeline - Class in com.norconex.collector.http.pipeline.committer
 
HttpCommitterPipeline() - Constructor for class com.norconex.collector.http.pipeline.committer.HttpCommitterPipeline
 
HttpCommitterPipelineContext - Class in com.norconex.collector.http.pipeline.committer
 
HttpCommitterPipelineContext(HttpCrawler, ICrawlDataStore, HttpDocument, HttpCrawlData, HttpCrawlData) - Constructor for class com.norconex.collector.http.pipeline.committer.HttpCommitterPipelineContext
 
HttpCrawlData - Class in com.norconex.collector.http.data
A URL being crawled holding relevant crawl information.
HttpCrawlData() - Constructor for class com.norconex.collector.http.data.HttpCrawlData
Constructor.
HttpCrawlData(ICrawlData) - Constructor for class com.norconex.collector.http.data.HttpCrawlData
Constructor.
HttpCrawlData(String, int) - Constructor for class com.norconex.collector.http.data.HttpCrawlData
Constructor.
HttpCrawler - Class in com.norconex.collector.http.crawler
The HTTP Crawler.
HttpCrawler(HttpCrawlerConfig) - Constructor for class com.norconex.collector.http.crawler.HttpCrawler
Constructor.
HttpCrawlerConfig - Class in com.norconex.collector.http.crawler
HTTP Crawler configuration.
HttpCrawlerConfig() - Constructor for class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
HttpCrawlerEvent - Class in com.norconex.collector.http.crawler
An HTTP Crawler Event.
HttpCrawlerEvent(String, ICrawlData, Object) - Constructor for class com.norconex.collector.http.crawler.HttpCrawlerEvent
 
HttpCrawlState - Class in com.norconex.collector.http.data
Represents a URL crawling status.
HttpCrawlState(String) - Constructor for class com.norconex.collector.http.data.HttpCrawlState
 
HttpDocument - Class in com.norconex.collector.http.doc
 
HttpDocument(String, CachedInputStream) - Constructor for class com.norconex.collector.http.doc.HttpDocument
 
HttpDocument(ImporterDocument) - Constructor for class com.norconex.collector.http.doc.HttpDocument
 
HttpFetchResponse - Class in com.norconex.collector.http.fetch
Holds HTTP response information obtained from fetching a document.
HttpFetchResponse(CrawlState, int, String) - Constructor for class com.norconex.collector.http.fetch.HttpFetchResponse
 
HttpImporterPipeline - Class in com.norconex.collector.http.pipeline.importer
 
HttpImporterPipeline(boolean, boolean) - Constructor for class com.norconex.collector.http.pipeline.importer.HttpImporterPipeline
 
HttpImporterPipelineContext - Class in com.norconex.collector.http.pipeline.importer
 
HttpImporterPipelineContext(ImporterPipelineContext) - Constructor for class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
Constructor creating a copy of supplied context.
HttpImporterPipelineContext(HttpCrawler, ICrawlDataStore, HttpCrawlData, HttpCrawlData, HttpDocument) - Constructor for class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
 
HttpMetadata - Class in com.norconex.collector.http.doc
 
HttpMetadata(String) - Constructor for class com.norconex.collector.http.doc.HttpMetadata
 
HttpMetadata(Properties) - Constructor for class com.norconex.collector.http.doc.HttpMetadata
 
HttpQueuePipeline - Class in com.norconex.collector.http.pipeline.queue
Performs URL handling logic before the actual processing of the document it represents takes place.
HttpQueuePipeline() - Constructor for class com.norconex.collector.http.pipeline.queue.HttpQueuePipeline
 
HttpQueuePipelineContext - Class in com.norconex.collector.http.pipeline.queue
 
HttpQueuePipelineContext(HttpCrawler, ICrawlDataStore, HttpCrawlData) - Constructor for class com.norconex.collector.http.pipeline.queue.HttpQueuePipelineContext
 

I

ICanonicalLinkDetector - Interface in com.norconex.collector.http.url
Detects and returns any canonical URL found in documents, whether from the HTTP headers (metadata), or from a page content (usually HTML).
IDelayResolver - Interface in com.norconex.collector.http.delay
Resolves and creates intentional "delays" to increase document download time intervals.
IHttpClientFactory - Interface in com.norconex.collector.http.client
Creates (and initializes) an Apache HttpClient to be used for all HTTP requests this crawler will make.
IHttpDocumentFetcher - Interface in com.norconex.collector.http.fetch
Fetches the HTTP document and its metadata (HTTP Headers).
IHttpDocumentProcessor - Interface in com.norconex.collector.http.doc
Deprecated.
Since 2.8.0, use com.norconex.collector.http.processor.IHttpDocumentProcessor
IHttpDocumentProcessor - Interface in com.norconex.collector.http.processor
Custom processing (optional) performed on a document.
IHttpMetadataFetcher - Interface in com.norconex.collector.http.fetch
Fetches the HTTP Header, typically via a HEAD request.
ILinkExtractor - Interface in com.norconex.collector.http.url
Responsible for finding links in documents.
ImageCache - Class in com.norconex.collector.http.processor.impl
Caches images.
ImageCache(int, File) - Constructor for class com.norconex.collector.http.processor.impl.ImageCache
 
initCrawlData(ICrawlData, ICrawlData, ImporterDocument) - Method in class com.norconex.collector.http.crawler.HttpCrawler
 
IRecrawlableResolver - Interface in com.norconex.collector.http.recrawl
Indicates whether a document that was successfully crawled on a previous crawling session should be recrawled or not.
IRedirectURLProvider - Interface in com.norconex.collector.http.redirect
Responsible for providing a target absolute URL each time an HTTP redirect is encountered when invoking a URL.
IRobotsMetaProvider - Interface in com.norconex.collector.http.robot
Responsible for extracting robot information from a page.
IRobotsTxtFilter - Interface in com.norconex.collector.http.robot
Holds a robots.txt rule.
IRobotsTxtProvider - Interface in com.norconex.collector.http.robot
Given a URL, extract any "robots.txt" rules.
isAuthPreemptive() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Gets whether to perform preemptive authentication (valid for "basic" authentication method).
isCaseSensitive() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
 
isCaseSensitive() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor.RegexPair
 
isCommentsEnabled() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
Gets whether links should be extracted from HTML/XML comments.
isCookiesDisabled() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Whether cookie support is disabled.
isCurrentTimeInSchedule() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
 
isDetectCharset() - Method in class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
Gets whether character encoding is detected instead of relying on HTTP response header.
isDetectCharset() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
isDetectContentType() - Method in class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
Gets whether content type is detected instead of relying on HTTP response header.
isDetectContentType() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
isDisabled() - Method in class com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer
Whether this checksummer is disabled or not.
isDisabled() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
Whether this URL Normalizer is disabled or not.
isDuplicate() - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
 
isEscalateErrors() - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
Gets whether errors should be thrown instead of logged.
isEscalateErrors() - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
Gets whether errors should be thrown instead of logged.
isExpectContinueEnabled() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Whether 'Expect: 100-continue' handshake is enabled.
isHttpHeadFetchEnabled() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
 
isHttpHeadSuccessful() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
Gets whether http headers were already fetched successfully.
isIgnoreCanonicalLinks() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Whether canonical links found in HTTP headers and in the <head> section of HTML files should be ignored or processed.
isIgnoreNofollow() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
 
isIgnoreNofollow() - Method in class com.norconex.collector.http.url.impl.TikaLinkExtractor
 
isIgnoreRobotsCrawlDelay() - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
Gets whether to ignore crawl delays specified in a site robots.txt file.
isIgnoreRobotsMeta() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
isIgnoreRobotsTxt() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
isIgnoreSitemap() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Whether to ignore sitemap detection and resolving for URLs processed.
isIncludeSubdomains() - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
Gets whether sub-domains are considered to be the same as a URL domain.
isInScope(String, String) - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
 
ISitemapResolver - Interface in com.norconex.collector.http.sitemap
Given a URL root, resolve the corresponding sitemap(s), if any, and only if it has not yet been resolved for a crawling session.
ISitemapResolverFactory - Interface in com.norconex.collector.http.sitemap
 
isKeepDownloads() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
isKeepMaxDepthLinks() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Gets whether to keep (and extract) links on pages having reached the configured maximum depth.
isKeepOutOfScopeLinks() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Whether links not in scope should be stored as metadata under HttpMetadata.COLLECTOR_REFERENCED_URLS_OUT_OF_SCOPE
isKeepReferrerData() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
Deprecated.
Since 2.6.0, referrer data is always kept
isKeepReferrerData() - Method in class com.norconex.collector.http.url.impl.TikaLinkExtractor
Deprecated.
Since 2.6.0, referrer data is always kept
isLargest() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
isLenient() - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
 
isLenient() - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
 
isNofollow() - Method in class com.norconex.collector.http.robot.RobotsMeta
 
isNoindex() - Method in class com.norconex.collector.http.robot.RobotsMeta
 
isRecrawlable(PreviousCrawlData) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
 
isRecrawlable(PreviousCrawlData) - Method in interface com.norconex.collector.http.recrawl.IRecrawlableResolver
Whether a document is recrawlable or not.
isRedirected(HttpRequest, HttpResponse, HttpContext) - Method in class com.norconex.collector.http.redirect.RedirectStrategyWrapper
 
isResolved(String) - Method in class com.norconex.collector.http.sitemap.impl.SitemapStore
 
isScaleStretch() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
isScreenshotEnabled() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Gets whether to enable taking screenshot of crawled web pages.
isScreenshotScaleStretch() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Gets whether the screenshot should be stretched to fill all the scale dimensions.
isSkipMetaFetcherOnBadStatus() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Gets whether to skip metadata fetching activities instead of rejecting a document on bad status.
isStaleConnectionCheckDisabled() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Deprecated.
Since 2.1.0. As of 2.2.0, use GenericHttpClientFactory.getMaxConnectionInactiveTime() instead.
isStayOnDomain() - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
Gets whether the crawler should always stay on the same domain name as the domain for each URL specified as a start URL.
isStayOnPort() - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
Gets whether the crawler should always stay on the same port as the port for each URL specified as a start URL.
isStayOnProtocol() - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
Gets whether the crawler should always stay on the same protocol as the protocol for each URL specified as a start URL.
IStartURLsProvider - Interface in com.norconex.collector.http.crawler
Provides starting URLs for crawling.
isTrustAllSSLCertificates() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Whether to trust all SSL certificates (affects only "https" connections).
IURLNormalizer - Interface in com.norconex.collector.http.url
Responsible for normalizing URLs.

J

JDBCCrawlDataSerializer - Class in com.norconex.collector.http.data.store.impl.jdbc
 
JDBCCrawlDataSerializer() - Constructor for class com.norconex.collector.http.data.store.impl.jdbc.JDBCCrawlDataSerializer
 
JDBCCrawlDataStoreFactory - Class in com.norconex.collector.http.data.store.impl.jdbc
JDBC implementation of ICrawlDataStore using H2.
JDBCCrawlDataStoreFactory() - Constructor for class com.norconex.collector.http.data.store.impl.jdbc.JDBCCrawlDataStoreFactory
 

L

LastModifiedMetadataChecksummer - Class in com.norconex.collector.http.checksum.impl
Default implementation of IMetadataChecksummer for the Norconex HTTP Collector which simply returns the exact value of the "Last-Modified" HTTP header field, or null if not present.
LastModifiedMetadataChecksummer() - Constructor for class com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer
 
Link - Class in com.norconex.collector.http.url
Represents a link extracted from a document.
Link(String) - Constructor for class com.norconex.collector.http.url.Link
 
loadChecksummerFromXML(XMLConfiguration) - Method in class com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer
 
loadCollectorConfigFromXML(XMLConfiguration) - Method in class com.norconex.collector.http.HttpCollectorConfig
 
loadCrawlerConfigFromXML(XMLConfiguration) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
loadDelaysFromXML(XMLConfiguration) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
Loads explicit configuration of delays from XML.
loadDelaysFromXML(XMLConfiguration) - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver
 
loadDelaysFromXML(XMLConfiguration) - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver
 
loadFromXML(Reader) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
 
loadFromXML(Reader) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
 
loadFromXML(Reader) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
 
loadFromXML(Reader) - Method in class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
 
loadFromXML(Reader) - Method in class com.norconex.collector.http.fetch.impl.GenericMetadataFetcher
 
loadFromXML(Reader) - Method in class com.norconex.collector.http.fetch.impl.HttpClientProxyCollectorListener
 
loadFromXML(Reader) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
loadFromXML(Reader) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
 
loadFromXML(Reader) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
loadFromXML(Reader) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
 
loadFromXML(Reader) - Method in class com.norconex.collector.http.redirect.impl.GenericRedirectURLProvider
 
loadFromXML(Reader) - Method in class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
 
loadFromXML(Reader) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
 
loadFromXML(Reader) - Method in class com.norconex.collector.http.url.impl.GenericCanonicalLinkDetector
 
loadFromXML(Reader) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
 
loadFromXML(Reader) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
 
loadFromXML(Reader) - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
 
loadFromXML(Reader) - Method in class com.norconex.collector.http.url.impl.TikaLinkExtractor
 
loadFromXML(Reader) - Method in class com.norconex.collector.http.url.impl.XMLFeedLinkExtractor
 

M

main(String[]) - Static method in class com.norconex.collector.http.HttpCollector
Invokes the HTTP Collector from the command line.
markReferenceVariationsAsProcessed(BaseCrawlData, ICrawlDataStore) - Method in class com.norconex.collector.http.crawler.HttpCrawler
 
markResolved(String) - Method in class com.norconex.collector.http.sitemap.impl.SitemapStore
 
matches(String) - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver.DelayReferencePattern
 
MAX_BUFFER_SIZE - Static variable in class com.norconex.collector.http.url.impl.GenericLinkExtractor
 
MAX_BUFFER_SIZE - Static variable in class com.norconex.collector.http.url.impl.RegexLinkExtractor
 
MinFrequency() - Constructor for class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
 
MinFrequency(String, String, String) - Constructor for class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
 
MongoCrawlDataSerializer - Class in com.norconex.collector.http.data.store.impl.mongo
 
MongoCrawlDataSerializer() - Constructor for class com.norconex.collector.http.data.store.impl.mongo.MongoCrawlDataSerializer
 
MongoCrawlDataStoreFactory - Class in com.norconex.collector.http.data.store.impl.mongo
Mongo implementation of ICrawlDataStoreFactory.
MongoCrawlDataStoreFactory() - Constructor for class com.norconex.collector.http.data.store.impl.mongo.MongoCrawlDataStoreFactory
 

N

normalizeURL(String) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
 
normalizeURL(String) - Method in interface com.norconex.collector.http.url.IURLNormalizer
Normalize the given URL.

O

onCollectorFinish(ICollector) - Method in class com.norconex.collector.http.fetch.impl.HttpClientProxyCollectorListener
 
onCollectorStart(ICollector) - Method in class com.norconex.collector.http.fetch.impl.HttpClientProxyCollectorListener
 
OVERLAP_SIZE - Static variable in class com.norconex.collector.http.url.impl.GenericLinkExtractor
 
OVERLAP_SIZE - Static variable in class com.norconex.collector.http.url.impl.RegexLinkExtractor
 

P

parseRobotsTxt(InputStream, String, String) - Method in class com.norconex.collector.http.robot.impl.StandardRobotsTxtProvider
 
PhantomJSDocumentFetcher - Class in com.norconex.collector.http.fetch.impl
An alternative to the GenericDocumentFetcher which relies on an external PhantomJS installation to fetch web pages.
PhantomJSDocumentFetcher() - Constructor for class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
PhantomJSDocumentFetcher(int[]) - Constructor for class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
PhantomJSDocumentFetcher.Quality - Enum in com.norconex.collector.http.fetch.impl
 
PhantomJSDocumentFetcher.Storage - Enum in com.norconex.collector.http.fetch.impl
 
PhantomJSDocumentFetcher.StorageDiskStructure - Enum in com.norconex.collector.http.fetch.impl
 
prepareExecution(JobStatusUpdater, JobSuite, ICrawlDataStore, boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawler
 
PreviousCrawlData - Class in com.norconex.collector.http.recrawl
Previously crawled data.
PreviousCrawlData() - Constructor for class com.norconex.collector.http.recrawl.PreviousCrawlData
 
processDocument(HttpClient, HttpDocument) - Method in interface com.norconex.collector.http.processor.IHttpDocumentProcessor
Processes a document.
processDocument(HttpClient, HttpDocument) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
provideRedirectURL(HttpRequest, HttpResponse, HttpContext) - Method in class com.norconex.collector.http.redirect.impl.GenericRedirectURLProvider
 
provideRedirectURL(HttpRequest, HttpResponse, HttpContext) - Method in interface com.norconex.collector.http.redirect.IRedirectURLProvider
Provides the redirect URL that the crawler must follow.
provideStartURLs() - Method in interface com.norconex.collector.http.crawler.IStartURLsProvider
Provides an iterator over start URLs.

R

REDIRECT - Static variable in class com.norconex.collector.http.data.HttpCrawlState
 
RedirectStrategyWrapper - Class in com.norconex.collector.http.redirect
This class is used by each crawler instance to wrap the original redirect strategy set on the HttpClient to make sure redirect target URLs are handled as required.
RedirectStrategyWrapper(RedirectStrategy, IRedirectURLProvider) - Constructor for class com.norconex.collector.http.redirect.RedirectStrategyWrapper
 
ReferenceDelayResolver - Class in com.norconex.collector.http.delay.impl
Introduces different delays between document downloads based on matching document reference (URL) patterns.
ReferenceDelayResolver() - Constructor for class com.norconex.collector.http.delay.impl.ReferenceDelayResolver
 
ReferenceDelayResolver.DelayReferencePattern - Class in com.norconex.collector.http.delay.impl
 
RegexLinkExtractor - Class in com.norconex.collector.http.url.impl
Link extractor using regular expressions to extract links found in text documents.
RegexLinkExtractor() - Constructor for class com.norconex.collector.http.url.impl.RegexLinkExtractor
 
RegexPair(String, String, boolean) - Constructor for class com.norconex.collector.http.url.impl.GenericLinkExtractor.RegexPair
 
REJECTED_CANONICAL - Static variable in class com.norconex.collector.http.crawler.HttpCrawlerEvent
Deprecated.
REJECTED_NONCANONICAL - Static variable in class com.norconex.collector.http.crawler.HttpCrawlerEvent
 
REJECTED_REDIRECTED - Static variable in class com.norconex.collector.http.crawler.HttpCrawlerEvent
 
REJECTED_ROBOTS_META_NOINDEX - Static variable in class com.norconex.collector.http.crawler.HttpCrawlerEvent
 
REJECTED_ROBOTS_TXT - Static variable in class com.norconex.collector.http.crawler.HttpCrawlerEvent
 
REJECTED_TOO_DEEP - Static variable in class com.norconex.collector.http.crawler.HttpCrawlerEvent
 
removeAuthFormParameter(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Remove the authentication form parameter matching the given name.
removeLinkTag(String, String) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
 
removeRequestHeader(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Remove the request header matching the given name.
Replace(String) - Constructor for class com.norconex.collector.http.url.impl.GenericURLNormalizer.Replace
 
Replace(String, String) - Constructor for class com.norconex.collector.http.url.impl.GenericURLNormalizer.Replace
 
resolveExplicitDelay(String) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
Resolves explicitly specified delay, in milliseconds.
resolveExplicitDelay(String) - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver
 
resolveExplicitDelay(String) - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver
 
resolveSitemaps(HttpClient, String, String[], SitemapURLAdder, boolean) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
 
resolveSitemaps(HttpClient, String, String[], SitemapURLAdder, boolean) - Method in interface com.norconex.collector.http.sitemap.ISitemapResolver
Resolves the sitemap instructions for a URL "root" (e.g.
RobotsMeta - Class in com.norconex.collector.http.robot
 
RobotsMeta(boolean, boolean) - Constructor for class com.norconex.collector.http.robot.RobotsMeta
 
RobotsTxt - Class in com.norconex.collector.http.robot
 
RobotsTxt(IRobotsTxtFilter[]) - Constructor for class com.norconex.collector.http.robot.RobotsTxt
 
RobotsTxt(IRobotsTxtFilter[], float) - Constructor for class com.norconex.collector.http.robot.RobotsTxt
 
RobotsTxt(IRobotsTxtFilter[], String[]) - Constructor for class com.norconex.collector.http.robot.RobotsTxt
 
RobotsTxt(IRobotsTxtFilter[], String[], float) - Constructor for class com.norconex.collector.http.robot.RobotsTxt
 

S

saveChecksummerToXML(EnhancedXMLStreamWriter) - Method in class com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer
 
saveCollectorConfigToXML(Writer) - Method in class com.norconex.collector.http.HttpCollectorConfig
 
saveCrawlerConfigToXML(Writer) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
saveDelaysToXML(EnhancedXMLStreamWriter) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
Saves explicit configuration of delays to XML.
saveDelaysToXML(EnhancedXMLStreamWriter) - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver
 
saveDelaysToXML(EnhancedXMLStreamWriter) - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver
 
saveToXML(Writer) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
 
saveToXML(Writer) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
 
saveToXML(Writer) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
 
saveToXML(Writer) - Method in class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
 
saveToXML(Writer) - Method in class com.norconex.collector.http.fetch.impl.GenericMetadataFetcher
 
saveToXML(Writer) - Method in class com.norconex.collector.http.fetch.impl.HttpClientProxyCollectorListener
 
saveToXML(Writer) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
saveToXML(Writer) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
 
saveToXML(Writer) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
saveToXML(Writer) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
 
saveToXML(Writer) - Method in class com.norconex.collector.http.redirect.impl.GenericRedirectURLProvider
 
saveToXML(Writer) - Method in class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
 
saveToXML(Writer) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
 
saveToXML(Writer) - Method in class com.norconex.collector.http.url.impl.GenericCanonicalLinkDetector
 
saveToXML(Writer) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
 
saveToXML(Writer) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
 
saveToXML(Writer) - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
 
saveToXML(Writer) - Method in class com.norconex.collector.http.url.impl.TikaLinkExtractor
 
saveToXML(Writer) - Method in class com.norconex.collector.http.url.impl.XMLFeedLinkExtractor
 
ScaledImage - Class in com.norconex.collector.http.processor.impl
 
ScaledImage(String, Dimension, BufferedImage) - Constructor for class com.norconex.collector.http.processor.impl.ScaledImage
 
SCOPE_CRAWLER - Static variable in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
 
SCOPE_SITE - Static variable in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
 
SCOPE_THREAD - Static variable in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
 
SegmentCountURLFilter - Class in com.norconex.collector.http.filter.impl
Filters URLs based on the number of URL segments.
SegmentCountURLFilter() - Constructor for class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
Constructor.
SegmentCountURLFilter(int) - Constructor for class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
Constructor.
SegmentCountURLFilter(int, OnMatch) - Constructor for class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
Constructor.
SegmentCountURLFilter(int, OnMatch, boolean) - Constructor for class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
Constructor.
setApplyTo(String) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
 
setApplyToContentTypePattern(String) - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
 
setApplyToContentTypePattern(String) - Method in class com.norconex.collector.http.url.impl.XMLFeedLinkExtractor
 
setApplyToReferencePattern(String) - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
 
setApplyToReferencePattern(String) - Method in class com.norconex.collector.http.url.impl.XMLFeedLinkExtractor
 
setAuthDomain(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the NTLM authentication domain.
setAuthFormCharset(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the authentication form character set for the form field values.
setAuthFormParam(String, String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets an authentication form parameter (equivalent to "input" or other fields in HTML forms).
setAuthHostname(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the host name for the current authentication scope.
setAuthMethod(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the authentication method.
setAuthPassword(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the authentication password.
setAuthPasswordField(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the name of the HTML field where the password is set.
setAuthPasswordKey(EncryptionKey) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the authentication password encryption key.
setAuthPort(int) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the port for the current authentication scope.
setAuthPreemptive(boolean) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets whether to perform preemptive authentication (valid for "basic" authentication method).
setAuthRealm(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the realm name for the current authentication scope.
setAuthURL(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the URL for "form" authentication.
setAuthUsername(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the username.
setAuthUsernameField(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the name of the HTML field where the username is set.
setAuthWorkstation(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the NTLM authentication workstation name.
setCanonicalLinkDetector(ICanonicalLinkDetector) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets the canonical link detector.
setCaseSensitive(boolean) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
 
setCharset(String) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
Sets the character set of pages on which link extraction is performed.
setCharset(String) - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
Sets the character set of pages on which link extraction is performed.
setCommentsEnabled(boolean) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
Sets whether links should be extracted from HTML/XML comments.
setConnectionCharset(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the connection character set.
setConnectionRequestTimeout(int) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the timeout when requesting a connection, in milliseconds.
setConnectionTimeout(int) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the connection timeout until a connection is established, in milliseconds.
setContentType(ContentType) - Method in class com.norconex.collector.http.recrawl.PreviousCrawlData
 
setContentTypePattern(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
setContentTypes(ContentType...) - Method in class com.norconex.collector.http.url.impl.GenericCanonicalLinkDetector
 
setContentTypes(ContentType...) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
 
setContentTypes(ContentType...) - Method in class com.norconex.collector.http.url.impl.TikaLinkExtractor
 
setCookiesDisabled(boolean) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets whether cookie support is disabled.
setCookieSpec(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
 
setCount(int) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
 
setCrawlDate(Date) - Method in class com.norconex.collector.http.recrawl.PreviousCrawlData
 
setDefaultDelay(long) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
Sets the default delay in milliseconds.
setDelayReferencePatterns(List<ReferenceDelayResolver.DelayReferencePattern>) - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver
 
setDelayResolver(IDelayResolver) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
setDepth(int) - Method in class com.norconex.collector.http.data.HttpCrawlData
Sets the URL depth.
setDetectCharset(boolean) - Method in class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
Sets whether character encoding is detected instead of relying on the HTTP response header.
setDetectCharset(boolean) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
setDetectContentType(boolean) - Method in class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
Sets whether content type is detected instead of relying on the HTTP response header.
setDetectContentType(boolean) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
setDisabled(boolean) - Method in class com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer
Sets whether this checksummer is disabled or not.
setDisabled(boolean) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
Sets whether this URL Normalizer is disabled or not.
setDocumentFetcher(IHttpDocumentFetcher) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
setDomSelector(String) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setDuplicate(boolean) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
 
setEscalateErrors(boolean) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
Sets whether errors should be thrown instead of logged.
setEscalateErrors(boolean) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
Sets whether errors should be thrown instead of logged.
setExePath(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
setExpectContinueEnabled(boolean) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets whether 'Expect: 100-continue' handshake is enabled.
setExtractBetweens(GenericLinkExtractor.RegexPair...) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
Sets the patterns delimiting the portions of a document to be considered for link extraction.
setExtractSelectors(String...) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
Sets the selectors matching the portions of a document to be considered for link extraction.
setFallbackCharset(String) - Method in class com.norconex.collector.http.redirect.impl.GenericRedirectURLProvider
 
setFileNamePrefix(String) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
Sets the generated report file name prefix.
setFromDate(long) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
Sets the minimum EPOCH date (in milliseconds) a sitemap entry should have to be considered.
setFromDate(long) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
Sets the minimum EPOCH date (in milliseconds) a sitemap entry should have to be considered.
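The two setFromDate(long) entries above take an EPOCH date in milliseconds. As a minimal sketch, such a value could be derived from a calendar date with the standard java.time API. The class and helper names below are hypothetical and not part of the collector API, and the choice of midnight UTC is an assumption.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

public class FromDateExample {

    /**
     * Converts a calendar date, taken at midnight UTC, to milliseconds
     * since the EPOCH. Hypothetical helper for illustration only.
     */
    static long toEpochMillis(int year, int month, int day) {
        return LocalDate.of(year, month, day)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();
    }

    public static void main(String[] args) {
        // A value like this could be passed to setFromDate(long).
        long fromDate = toEpochMillis(2017, 1, 1);
        System.out.println(fromDate); // prints 1483228800000
    }
}
```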
setHeadersPrefix(String) - Method in class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
 
setHeadersPrefix(String) - Method in class com.norconex.collector.http.fetch.impl.GenericMetadataFetcher
 
setHeadersPrefix(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
setHeadersPrefix(String) - Method in class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
 
setHttpClientFactory(IHttpClientFactory) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
setHttpHeadSuccessful(boolean) - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
Sets whether HTTP headers were already fetched successfully.
setIgnoreCanonicalLinks(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets whether canonical links found in HTTP headers and in the <head> section of HTML files should be ignored or processed.
setIgnoreNofollow(boolean) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
 
setIgnoreNofollow(boolean) - Method in class com.norconex.collector.http.url.impl.TikaLinkExtractor
 
setIgnoreRobotsCrawlDelay(boolean) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
Sets whether to ignore crawl delays specified in a site robots.txt file.
setIgnoreRobotsMeta(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
setIgnoreRobotsTxt(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
setIgnoreSitemap(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets whether to ignore sitemap detection and resolving for URLs processed.
setImage(ScaledImage) - Method in class com.norconex.collector.http.processor.impl.ImageCache
 
setImageCacheDir(String) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setImageCacheSize(int) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setImageFormat(String) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setIncludeSubdomains(boolean) - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
Sets whether sub-domains are considered to be the same as a URL domain.
setKeepDownloads(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
setKeepMaxDepthLinks(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets whether to keep (and extract) links on pages having reached the configured maximum depth.
setKeepOutOfScopeLinks(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets whether links not in scope should be stored as metadata under HttpMetadata.COLLECTOR_REFERENCED_URLS_OUT_OF_SCOPE.
setKeepReferrerData(boolean) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
Deprecated.
Since 2.6.0, referrer data is always kept
setKeepReferrerData(boolean) - Method in class com.norconex.collector.http.url.impl.TikaLinkExtractor
Deprecated.
Since 2.6.0, referrer data is always kept
setLargest(boolean) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setLenient(boolean) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
 
setLenient(boolean) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
 
setLinkExtractors(ILinkExtractor...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
setLocalAddress(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the local address, which may be useful when working with multiple network interfaces.
setMaxConnectionIdleTime(int) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the period of time in milliseconds after which to evict idle connections from the connection pool.
setMaxConnectionInactiveTime(int) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the period of time in milliseconds a connection must be inactive to be checked in case it became stalled.
setMaxConnections(int) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the maximum number of connections that can be created.
setMaxConnectionsPerRoute(int) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the maximum number of connections that can be used per route.
setMaxDepth(int) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
setMaxRedirects(int) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the maximum number of redirects to be followed.
setMaxURLLength(int) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
Sets the maximum supported URL length.
setMaxURLLength(int) - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
Sets the maximum supported URL length.
setMetadataChecksummer(IMetadataChecksummer) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
setMetadataFetcher(IHttpMetadataFetcher) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
setMinDimensions(int, int) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setMinDimensions(Dimension) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setMinFrequencies(GenericRecrawlableResolver.MinFrequency...) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
 
setNoExtractBetweens(GenericLinkExtractor.RegexPair...) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
Sets the patterns delimiting the portions of a document to be excluded from link extraction.
setNoExtractSelectors(String...) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
Sets the selectors matching the portions of a document to be excluded from link extraction.
setNofollowPatterns(List<String>) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
Sets the patterns of references for which link extraction is disabled.
setNormalizations(GenericURLNormalizer.Normalization...) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
 
setNotFoundStatusCodes(int...) - Method in class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
Sets HTTP status codes to be considered as "Not found" state.
setNotFoundStatusCodes(int...) - Method in class com.norconex.collector.http.fetch.impl.GenericMetadataFetcher
Sets HTTP status codes to be considered as "Not found" state.
setNotFoundStatusCodes(int...) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
setOptions(String...) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
setOriginalReference(String) - Method in class com.norconex.collector.http.data.HttpCrawlData
 
setOutputDir(String) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
Sets the local directory where this listener report will be written.
setPageContentTypePattern(String) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setPattern(String) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
 
setPort(int) - Method in class com.norconex.collector.http.fetch.impl.HttpClientProxyCollectorListener
 
setPostImportProcessors(IHttpDocumentProcessor...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
setPreImportProcessors(IHttpDocumentProcessor...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
setProxyHost(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the proxy host.
setProxyPassword(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the proxy password.
setProxyPasswordKey(EncryptionKey) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the proxy password encryption key.
setProxyPort(int) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the proxy port.
setProxyRealm(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the proxy realm.
setProxyScheme(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the proxy scheme.
setProxyUsername(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the proxy username.
setRecrawlableResolver(IRecrawlableResolver) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets the recrawlable resolver.
setRedirectTrail(String...) - Method in class com.norconex.collector.http.data.HttpCrawlData
Sets the trail of URLs that were redirected up to this one.
setRedirectURL(String) - Static method in class com.norconex.collector.http.redirect.RedirectStrategyWrapper
Sets the redirect URL.
setRedirectURLProvider(IRedirectURLProvider) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets the redirect URL provider.
setReference(String) - Method in class com.norconex.collector.http.data.HttpCrawlData
 
setReference(String) - Method in class com.norconex.collector.http.recrawl.PreviousCrawlData
 
setReferencedUrls(String...) - Method in class com.norconex.collector.http.data.HttpCrawlData
Sets URLs referenced by this one.
setReferencePattern(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
setReferrer(String) - Method in class com.norconex.collector.http.url.Link
 
setReferrerLinkTag(String) - Method in class com.norconex.collector.http.data.HttpCrawlData
 
setReferrerLinkText(String) - Method in class com.norconex.collector.http.data.HttpCrawlData
 
setReferrerLinkTitle(String) - Method in class com.norconex.collector.http.data.HttpCrawlData
 
setReferrerReference(String) - Method in class com.norconex.collector.http.data.HttpCrawlData
 
setRenderWaitTime(int) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
setReplaces(GenericURLNormalizer.Replace...) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
 
setRequestHeader(String, String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets a default HTTP request header every HTTP connection should have.
setResourceTimeout(int) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Sets the timeout, in milliseconds, after which any requested resource will stop trying and proceed with other parts of the page.
setRobotsMeta(RobotsMeta) - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
 
setRobotsMetaProvider(IRobotsMetaProvider) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
setRobotsTxtProvider(IRobotsTxtProvider) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
setScaleDimensions(int, int) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setScaleDimensions(Dimension) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setScaleQuality(FeaturedImageProcessor.Quality) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setScaleStretch(boolean) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setSchedules(List<GenericDelayResolver.DelaySchedule>) - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver
 
setSchemes(String...) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
Sets the schemes to be extracted.
setScope(String) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
Sets the delay scope.
setScreenshotDimensions(int, int) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
setScreenshotDimensions(Dimension) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
setScreenshotDir(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
setScreenshotEnabled(boolean) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Sets whether to enable taking screenshots of crawled web pages.
setScreenshotImageFormat(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Sets the screenshot image format (jpg, png, gif, bmp, etc.).
setScreenshotScaleDimensions(Dimension) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Sets the pixel dimensions we want the stored screenshot to have.
setScreenshotScaleDimensions(int, int) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Sets the pixel dimensions we want the stored screenshot to have.
setScreenshotScaleQuality(PhantomJSDocumentFetcher.Quality) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Sets the screenshot scaling quality to use when storage is "disk" or "inline".
setScreenshotScaleStretch(boolean) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Sets whether the screenshot should be stretched to fill all the scale dimensions.
setScreenshotStorage(PhantomJSDocumentFetcher.Storage...) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Sets the screenshot storage mechanisms.
setScreenshotStorageDiskDir(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Sets the directory where screenshots are saved when storage is "disk".
setScreenshotStorageDiskField(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Sets the target document metadata field in which to store the path to the screenshot image file when storage is "disk".
setScreenshotStorageDiskStructure(PhantomJSDocumentFetcher.StorageDiskStructure) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Sets the screenshot directory structure to create when storage is "disk".
setScreenshotStorageInlineField(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Sets the target document metadata field where to store the inline (Base64) screenshot image when storage is "inline".
setScreenshotZoomFactor(float) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
setScriptPath(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
setSeparator(String) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
 
setSitemapChangeFreq(String) - Method in class com.norconex.collector.http.data.HttpCrawlData
Sets the sitemap change frequency.
setSitemapChangeFreq(String) - Method in class com.norconex.collector.http.recrawl.PreviousCrawlData
 
setSitemapLastMod(Long) - Method in class com.norconex.collector.http.data.HttpCrawlData
Sets the sitemap last modified date in milliseconds (EPOCH date).
setSitemapLastMod(Long) - Method in class com.norconex.collector.http.recrawl.PreviousCrawlData
 
setSitemapLocations(String...) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
setSitemapLocations(String...) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
setSitemapPaths(String...) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
Sets the URL paths, relative to the URL root, from which to try to locate and resolve sitemaps.
setSitemapPaths(String...) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
Sets the URL paths, relative to the URL root, from which to try to locate and resolve sitemaps.
setSitemapPriority(Float) - Method in class com.norconex.collector.http.data.HttpCrawlData
Sets the sitemap priority.
setSitemapPriority(Float) - Method in class com.norconex.collector.http.recrawl.PreviousCrawlData
 
setSitemapResolverFactory(ISitemapResolverFactory) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
setSitemapSupport(GenericRecrawlableResolver.SitemapSupport) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
Sets the sitemap support strategy.
setSkipMetaFetcherOnBadStatus(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets whether to skip metadata fetching activities instead of rejecting a document on bad status.
setSocketTimeout(int) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the maximum period of inactivity between two consecutive data packets, in milliseconds.
setSSLProtocols(String...) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets the supported SSL/TLS protocols, such as SSLv3, TLSv1, TLSv1.1, and TLSv1.2.
setStaleConnectionCheckDisabled(boolean) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Deprecated.
Since 2.1.0. As of 2.2.0, use GenericHttpClientFactory.setMaxConnectionInactiveTime(int) instead.
setStartSitemapURLs(String...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets the sitemap URLs used as starting points for crawling.
setStartURLs(String...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
setStartURLsFiles(String...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets the file paths of seed files containing URLs to be used as "start URLs".
setStartURLsProviders(IStartURLsProvider...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets the providers of URLs used as starting points for crawling.
setStatusCodes(String) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
Sets a comma-separated list of status codes to listen to.
setStayOnDomain(boolean) - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
Sets whether the crawler should always stay on the same domain name as the domain for each URL specified as a start URL.
setStayOnPort(boolean) - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
Sets whether the crawler should always stay on the same port as the port for each URL specified as a start URL.
setStayOnProtocol(boolean) - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
Sets whether the crawler should always stay on the same protocol as the protocol for each URL specified as a start URL.
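The three "stay on" flags above each compare one component of a candidate URL against the matching component of a start URL. The sketch below illustrates that idea with java.net.URI only. It is not the URLCrawlScopeStrategy implementation: the class, the method names, and the matching rules (case-insensitive scheme and host, raw port comparison with no default-port normalization, absolute URLs only) are all assumptions made for illustration.

```java
import java.net.URI;

public class ScopeSketch {

    // Compares URL schemes, ignoring case (assumes absolute URLs).
    static boolean sameProtocol(URI start, URI candidate) {
        return start.getScheme().equalsIgnoreCase(candidate.getScheme());
    }

    // Compares host names, ignoring case (assumes both URLs have a host).
    static boolean sameDomain(URI start, URI candidate) {
        return start.getHost().equalsIgnoreCase(candidate.getHost());
    }

    // Compares explicit ports; URI.getPort() returns -1 when none is given,
    // so an implicit port and an explicit default port do not match here.
    static boolean samePort(URI start, URI candidate) {
        return start.getPort() == candidate.getPort();
    }

    /** Returns true if the candidate passes every enabled check. */
    static boolean inScope(String startUrl, String candidateUrl,
            boolean stayOnProtocol, boolean stayOnDomain, boolean stayOnPort) {
        URI start = URI.create(startUrl);
        URI candidate = URI.create(candidateUrl);
        if (stayOnProtocol && !sameProtocol(start, candidate)) {
            return false;
        }
        if (stayOnDomain && !sameDomain(start, candidate)) {
            return false;
        }
        if (stayOnPort && !samePort(start, candidate)) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(inScope("http://example.com/a",
                "https://example.com/b", true, true, true)); // prints false
    }
}
```

With all three flags disabled, every candidate URL passes, which corresponds to a crawler following every link it discovers.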
setStorage(FeaturedImageProcessor.Storage...) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setStorageDiskDir(String) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setStorageDiskField(String) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setStorageDiskStructure(FeaturedImageProcessor.StorageDiskStructure) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setStorageInlineField(String) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setStorageUrlField(String) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setTag(String) - Method in class com.norconex.collector.http.url.Link
 
setTempDir(File) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
Sets the directory where temporary sitemap files are written.
setTempDir(File) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
Sets the temporary directory where sitemap files are temporarily stored before they are parsed.
setText(String) - Method in class com.norconex.collector.http.url.Link
 
setTitle(String) - Method in class com.norconex.collector.http.url.Link
 
setTrustAllSSLCertificates(boolean) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
Sets whether to trust all SSL certificates.
setUrlCrawlScopeStrategy(URLCrawlScopeStrategy) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets the strategy to use to determine if a URL is in scope.
setUrlNormalizer(IURLNormalizer) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
setUserAgent(String) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
setValidExitCodes(int...) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
setValidStatusCodes(int...) - Method in class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
 
setValidStatusCodes(int...) - Method in class com.norconex.collector.http.fetch.impl.GenericMetadataFetcher
 
setValidStatusCodes(int...) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
setValue(String) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
 
SiteDelay - Class in com.norconex.collector.http.delay.impl
 
SiteDelay() - Constructor for class com.norconex.collector.http.delay.impl.SiteDelay
 
SitemapChangeFrequency - Enum in com.norconex.collector.http.sitemap
Sitemap change frequency unit, as defined on http://www.sitemaps.org/protocol.html
SitemapStore - Class in com.norconex.collector.http.sitemap.impl
Sitemap store implementation used by StandardSitemapResolver.
SitemapStore(HttpCrawlerConfig, boolean) - Constructor for class com.norconex.collector.http.sitemap.impl.SitemapStore
 
SitemapURLAdder - Class in com.norconex.collector.http.sitemap
Represents a queue of sitemap URLs.
SitemapURLAdder() - Constructor for class com.norconex.collector.http.sitemap.SitemapURLAdder
 
StandardRobotsMetaProvider - Class in com.norconex.collector.http.robot.impl
Implementation of IRobotsMetaProvider as per X-Robots-Tag and ROBOTS standards.
StandardRobotsMetaProvider() - Constructor for class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
 
StandardRobotsTxtProvider - Class in com.norconex.collector.http.robot.impl
Implementation of IRobotsTxtProvider as per the robots.txt standard described at http://www.robotstxt.org/robotstxt.html.
StandardRobotsTxtProvider() - Constructor for class com.norconex.collector.http.robot.impl.StandardRobotsTxtProvider
 
StandardSitemapResolver - Class in com.norconex.collector.http.sitemap.impl
Implementation of ISitemapResolver as per the sitemap.xml standard defined at http://www.sitemaps.org/protocol.html.
StandardSitemapResolver(File, SitemapStore) - Constructor for class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
 
StandardSitemapResolverFactory - Class in com.norconex.collector.http.sitemap.impl
Factory used to create StandardSitemapResolver instances.
StandardSitemapResolverFactory() - Constructor for class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
 
stop(IJobStatus, JobSuite) - Method in class com.norconex.collector.http.crawler.HttpCrawler
 
stop() - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
 
stop() - Method in interface com.norconex.collector.http.sitemap.ISitemapResolver
Stops any ongoing sitemap resolution.

T

ThreadDelay - Class in com.norconex.collector.http.delay.impl
 
ThreadDelay() - Constructor for class com.norconex.collector.http.delay.impl.ThreadDelay
 
TikaLinkExtractor - Class in com.norconex.collector.http.url.impl
Implementation of ILinkExtractor using Apache Tika to perform URL extractions from HTML documents.
TikaLinkExtractor() - Constructor for class com.norconex.collector.http.url.impl.TikaLinkExtractor
 
TINY_SLEEP_MS - Static variable in class com.norconex.collector.http.delay.impl.AbstractDelay
 
toCrawlData(String, ResultSet) - Method in class com.norconex.collector.http.data.store.impl.jdbc.JDBCCrawlDataSerializer
 
toDocument(IMongoSerializer.Stage, ICrawlData) - Method in class com.norconex.collector.http.data.store.impl.mongo.MongoCrawlDataSerializer
 
toHTMLInlineString(String) - Method in class com.norconex.collector.http.processor.impl.ScaledImage
 
TOO_DEEP - Static variable in class com.norconex.collector.http.data.HttpCrawlState
 
toString() - Method in class com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer
 
toString() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
 
toString() - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
 
toString() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
toString() - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
 
toString() - Method in class com.norconex.collector.http.data.HttpCrawlData
 
toString() - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
 
toString() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
 
toString() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver
 
toString() - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver.DelayReferencePattern
 
toString() - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver
 
toString() - Method in class com.norconex.collector.http.fetch.HttpFetchResponse
 
toString() - Method in class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
 
toString() - Method in class com.norconex.collector.http.fetch.impl.GenericMetadataFetcher
 
toString() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
 
toString() - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
 
toString() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
toString() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
 
toString() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
 
toString() - Method in class com.norconex.collector.http.recrawl.PreviousCrawlData
 
toString() - Method in class com.norconex.collector.http.redirect.impl.GenericRedirectURLProvider
 
toString() - Method in class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
 
toString() - Method in class com.norconex.collector.http.robot.impl.StandardRobotsTxtProvider
 
toString() - Method in class com.norconex.collector.http.robot.RobotsMeta
 
toString() - Method in class com.norconex.collector.http.robot.RobotsTxt
 
toString() - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
 
toString() - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
 
toString() - Method in class com.norconex.collector.http.url.impl.GenericCanonicalLinkDetector
 
toString() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor.RegexPair
 
toString() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
 
toString() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer.Replace
 
toString() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
 
toString() - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
 
toString() - Method in class com.norconex.collector.http.url.impl.TikaLinkExtractor
 
toString() - Method in class com.norconex.collector.http.url.impl.XMLFeedLinkExtractor
 
toString() - Method in class com.norconex.collector.http.url.Link
 

U

UNSPECIFIED_CRAWL_DELAY - Static variable in class com.norconex.collector.http.robot.RobotsTxt
 
URLCrawlScopeStrategy - Class in com.norconex.collector.http.crawler
By default, a crawler will try to follow all links it discovers.
URLCrawlScopeStrategy() - Constructor for class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
 
URLS_EXTRACTED - Static variable in class com.norconex.collector.http.crawler.HttpCrawlerEvent
 
URLStatusCrawlerEventListener - Class in com.norconex.collector.http.crawler.event.impl
Stores to file all URLs that were "fetched", along with their HTTP response codes, usually for reporting purposes (e.g.
URLStatusCrawlerEventListener() - Constructor for class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
 

V

valueOf(String) - Static method in enum com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule.DOW
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.Quality
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.Storage
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.StorageDiskStructure
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.norconex.collector.http.processor.impl.FeaturedImageProcessor.Quality
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.norconex.collector.http.processor.impl.FeaturedImageProcessor.Storage
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.norconex.collector.http.processor.impl.FeaturedImageProcessor.StorageDiskStructure
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.SitemapSupport
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.norconex.collector.http.sitemap.SitemapChangeFrequency
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
Returns the enum constant of this type with the specified name.
values() - Static method in enum com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule.DOW
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.Quality
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.Storage
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.StorageDiskStructure
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.norconex.collector.http.processor.impl.FeaturedImageProcessor.Quality
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.norconex.collector.http.processor.impl.FeaturedImageProcessor.Storage
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.norconex.collector.http.processor.impl.FeaturedImageProcessor.StorageDiskStructure
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.SitemapSupport
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.norconex.collector.http.sitemap.SitemapChangeFrequency
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
Returns an array containing the constants of this enum type, in the order they are declared.
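
The valueOf(String) and values() entries above are the standard methods the Java compiler generates for every enum type. As a minimal sketch of how they behave, the example below uses a hypothetical Frequency enum and hypothetical helper methods (parse, parseOrDefault); none of these names come from the indexed classes, and the constants of the real enums listed above are not shown here.

```java
import java.util.Arrays;
import java.util.Locale;

public class EnumLookupSketch {

    // Hypothetical enum, for illustration only; not one of the indexed types.
    enum Frequency { ALWAYS, DAILY, NEVER }

    // valueOf(String) requires an exact, case-sensitive match on the constant
    // name and throws IllegalArgumentException when no constant matches.
    static Frequency parse(String name) {
        return Frequency.valueOf(name);
    }

    // Lenient variant (an assumption, not part of the indexed API): trims and
    // upper-cases the input, and returns the fallback for unknown names.
    static Frequency parseOrDefault(String name, Frequency fallback) {
        if (name == null) {
            return fallback;
        }
        try {
            return Frequency.valueOf(name.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("DAILY"));                            // DAILY
        System.out.println(Arrays.toString(Frequency.values()));       // [ALWAYS, DAILY, NEVER]
        System.out.println(parseOrDefault("weekly", Frequency.NEVER)); // NEVER
    }
}
```

values() returns a fresh array on each call, in declaration order, so callers may modify the returned array without affecting the enum itself.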

W

wrapDocument(ICrawlData, ImporterDocument) - Method in class com.norconex.collector.http.crawler.HttpCrawler
 

X

XMLFeedLinkExtractor - Class in com.norconex.collector.http.url.impl
Link extractor for extracting links out of RSS and Atom XML feeds.
XMLFeedLinkExtractor() - Constructor for class com.norconex.collector.http.url.impl.XMLFeedLinkExtractor
 

Copyright © 2009–2021 Norconex Inc. All rights reserved.