A B C D E F G H I L M N O P R S T U V W X 

A

AbstractDelay - Class in com.norconex.collector.http.delay.impl
Convenience class to encapsulate various delay strategies.
AbstractDelay() - Constructor for class com.norconex.collector.http.delay.impl.AbstractDelay
 
AbstractDelayResolver - Class in com.norconex.collector.http.delay.impl
Base implementation for creating voluntary delays between URL downloads.
AbstractDelayResolver() - Constructor for class com.norconex.collector.http.delay.impl.AbstractDelayResolver
 
AbstractHttpFetcher - Class in com.norconex.collector.http.fetch
Base class implementing the AbstractHttpFetcher.accept(Doc, HttpMethod) method, using reference filters to determine whether this fetcher accepts a URL for fetching and delegating the HTTP method check to its own abstract AbstractHttpFetcher.accept(HttpMethod) method.
AbstractHttpFetcher() - Constructor for class com.norconex.collector.http.fetch.AbstractHttpFetcher
 
AbstractLinkExtractor - Class in com.norconex.collector.http.link
Base class for link extraction providing common configuration settings.
AbstractLinkExtractor() - Constructor for class com.norconex.collector.http.link.AbstractLinkExtractor
 
AbstractTextLinkExtractor - Class in com.norconex.collector.http.link
Base class for link extraction from text documents, providing common configuration settings such as being able to apply extraction to specific documents only, and being able to specify one or more metadata fields from which to grab the text for extracting links.
AbstractTextLinkExtractor() - Constructor for class com.norconex.collector.http.link.AbstractTextLinkExtractor
 
accept(Event) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
 
accept(Doc, HttpMethod) - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
 
accept(HttpMethod) - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
Whether the supplied HttpMethod is supported by this fetcher.
accept(Event) - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
 
accept(Doc, HttpMethod) - Method in interface com.norconex.collector.http.fetch.IHttpFetcher
 
accept(HttpMethod) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
 
accept(Doc, HttpMethod) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
accept(HttpMethod) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
accept(HttpMethod) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
 
acceptDocument(Doc) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
 
acceptMetadata(String, Properties) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
 
acceptReference(String) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
 
addExtractBetween(String, String, boolean) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
Adds patterns delimiting a portion of a document to be considered for link extraction.
addExtractSelectors(List<String>) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
 
addExtractSelectors(String...) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
 
addExtractSelectors(String...) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
Adds selectors matching the portions of a document to be considered for link extraction.
addExtractSelectors(List<String>) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
Adds selectors matching the portions of a document to be considered for link extraction.
addLinkSelector(String) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
Adds a new link selector extracting the "text" from matches.
addLinkSelector(String, String) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
 
addLinkTag(String, String) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
 
addNoExtractBetween(String, String, boolean) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
Adds patterns delimiting a portion of a document to be excluded from link extraction.
addNoExtractSelectors(List<String>) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
 
addNoExtractSelectors(String...) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
 
addNoExtractSelectors(String...) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
Adds selectors matching the portions of a document to be excluded from link extraction.
addNoExtractSelectors(List<String>) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
Adds selectors matching the portions of a document to be excluded from link extraction.
addPattern(String) - Method in class com.norconex.collector.http.link.impl.RegexLinkExtractor
 
addPattern(String, String) - Method in class com.norconex.collector.http.link.impl.RegexLinkExtractor
Adds a URL pattern, with an optional replacement.
addRedirectURL(String) - Method in class com.norconex.collector.http.doc.HttpDocInfo
Adds a redirect URL to the trail of URLs that were redirected so far.
addResponse(IHttpFetchResponse, IHttpFetcher) - Method in class com.norconex.collector.http.fetch.HttpFetchClientResponse
 
addRestriction(PropertyMatcher...) - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
Adds one or more restrictions to apply to this extractor.
addRestrictions(List<PropertyMatcher>) - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
Adds restrictions to apply to this extractor.
afterCrawlerExecution() - Method in class com.norconex.collector.http.crawler.HttpCrawler
 
ApacheHttpUtil - Class in com.norconex.collector.http.fetch.util
Utility methods for fetcher implementations using Apache HttpClient.
ApacheRedirectCaptureStrategy - Class in com.norconex.collector.http.fetch.util
This class is used by each crawler instance to capture the closest redirect target whether it is part of a redirect chain or not.
ApacheRedirectCaptureStrategy(IRedirectURLProvider) - Constructor for class com.norconex.collector.http.fetch.util.ApacheRedirectCaptureStrategy
 
applyContentTypeAndCharset(String, CrawlDocInfo) - Static method in class com.norconex.collector.http.fetch.util.ApacheHttpUtil
Applies the Content-Type HTTP response header on the supplied document info.
applyResponseContent(HttpResponse, CrawlDoc) - Static method in class com.norconex.collector.http.fetch.util.ApacheHttpUtil
Applies the HTTP response content to a document if such content exists.
applyResponseHeaders(HttpResponse, String, CrawlDoc) - Static method in class com.norconex.collector.http.fetch.util.ApacheHttpUtil
Applies the HTTP response headers to a document.
AUTH_METHOD_BASIC - Static variable in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
BASIC authentication method.
AUTH_METHOD_DIGEST - Static variable in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
DIGEST authentication method.
AUTH_METHOD_FORM - Static variable in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
Form-based authentication method.
AUTH_METHOD_KERBEROS - Static variable in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
Experimental: Kerberos authentication method.
AUTH_METHOD_NTLM - Static variable in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
NTLM authentication method.
AUTH_METHOD_SPNEGO - Static variable in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
Experimental: SPNEGO authentication method.
authenticateUsingForm(HttpClient) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
 
authenticateUsingForm(HttpClient, HttpAuthConfig) - Static method in class com.norconex.collector.http.fetch.util.ApacheHttpUtil
 

B

beforeCrawlerExecution(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawler
 
beforeFinalizeDocumentProcessing(CrawlDoc) - Method in class com.norconex.collector.http.crawler.HttpCrawler
 
Browser - Enum in com.norconex.collector.http.fetch.impl.webdriver
 
buildCustomHttpClient(HttpClientBuilder) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
For implementors to subclass.

C

checkClientTrusted(X509Certificate[], String) - Method in class com.norconex.collector.http.fetch.util.TrustAllX509TrustManager
 
checkClientTrusted(X509Certificate[], String, Socket) - Method in class com.norconex.collector.http.fetch.util.TrustAllX509TrustManager
 
checkClientTrusted(X509Certificate[], String, SSLEngine) - Method in class com.norconex.collector.http.fetch.util.TrustAllX509TrustManager
 
checkServerTrusted(X509Certificate[], String) - Method in class com.norconex.collector.http.fetch.util.TrustAllX509TrustManager
 
checkServerTrusted(X509Certificate[], String, Socket) - Method in class com.norconex.collector.http.fetch.util.TrustAllX509TrustManager
 
checkServerTrusted(X509Certificate[], String, SSLEngine) - Method in class com.norconex.collector.http.fetch.util.TrustAllX509TrustManager
 
clearLinkSelectors() - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
 
clearLinkTags() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
 
clearPatterns() - Method in class com.norconex.collector.http.link.impl.RegexLinkExtractor
 
clearRestrictions() - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
Clears all restrictions.
COLLECTOR_FEATURED_IMAGE_INLINE - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
COLLECTOR_FEATURED_IMAGE_PATH - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
COLLECTOR_FEATURED_IMAGE_URL - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
COLLECTOR_PHANTOMJS_SCREENSHOT_INLINE - Static variable in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
COLLECTOR_PHANTOMJS_SCREENSHOT_PATH - Static variable in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
com.norconex.collector.http - package com.norconex.collector.http
 
com.norconex.collector.http.canon - package com.norconex.collector.http.canon
 
com.norconex.collector.http.canon.impl - package com.norconex.collector.http.canon.impl
 
com.norconex.collector.http.checksum.impl - package com.norconex.collector.http.checksum.impl
 
com.norconex.collector.http.crawler - package com.norconex.collector.http.crawler
 
com.norconex.collector.http.crawler.event.impl - package com.norconex.collector.http.crawler.event.impl
 
com.norconex.collector.http.delay - package com.norconex.collector.http.delay
 
com.norconex.collector.http.delay.impl - package com.norconex.collector.http.delay.impl
 
com.norconex.collector.http.doc - package com.norconex.collector.http.doc
 
com.norconex.collector.http.fetch - package com.norconex.collector.http.fetch
 
com.norconex.collector.http.fetch.impl - package com.norconex.collector.http.fetch.impl
 
com.norconex.collector.http.fetch.impl.webdriver - package com.norconex.collector.http.fetch.impl.webdriver
 
com.norconex.collector.http.fetch.util - package com.norconex.collector.http.fetch.util
 
com.norconex.collector.http.filter.impl - package com.norconex.collector.http.filter.impl
 
com.norconex.collector.http.link - package com.norconex.collector.http.link
 
com.norconex.collector.http.link.impl - package com.norconex.collector.http.link.impl
 
com.norconex.collector.http.pipeline.committer - package com.norconex.collector.http.pipeline.committer
 
com.norconex.collector.http.pipeline.importer - package com.norconex.collector.http.pipeline.importer
 
com.norconex.collector.http.pipeline.queue - package com.norconex.collector.http.pipeline.queue
 
com.norconex.collector.http.processor - package com.norconex.collector.http.processor
 
com.norconex.collector.http.processor.impl - package com.norconex.collector.http.processor.impl
 
com.norconex.collector.http.recrawl - package com.norconex.collector.http.recrawl
 
com.norconex.collector.http.recrawl.impl - package com.norconex.collector.http.recrawl.impl
 
com.norconex.collector.http.robot - package com.norconex.collector.http.robot
 
com.norconex.collector.http.robot.impl - package com.norconex.collector.http.robot.impl
 
com.norconex.collector.http.sitemap - package com.norconex.collector.http.sitemap
 
com.norconex.collector.http.sitemap.impl - package com.norconex.collector.http.sitemap.impl
 
com.norconex.collector.http.url - package com.norconex.collector.http.url
 
com.norconex.collector.http.url.impl - package com.norconex.collector.http.url.impl
 
compareTo(Link) - Method in class com.norconex.collector.http.link.Link
 
contains(ScaledImage) - Method in class com.norconex.collector.http.processor.impl.ScaledImage
 
contains(Dimension) - Method in class com.norconex.collector.http.processor.impl.ScaledImage
 
contains(int, int) - Method in class com.norconex.collector.http.processor.impl.ScaledImage
 
CrawlerDelay - Class in com.norconex.collector.http.delay.impl
It is assumed there will be one instance of this class per crawler defined.
CrawlerDelay() - Constructor for class com.norconex.collector.http.delay.impl.CrawlerDelay
 
create() - Method in class com.norconex.collector.http.fetch.HttpFetchResponseBuilder
 
createChildDocInfo(String, CrawlDocInfo) - Method in class com.norconex.collector.http.crawler.HttpCrawler
 
createConnectionConfig() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
 
createCrawler(CrawlerConfig) - Method in class com.norconex.collector.http.HttpCollector
 
createCredentialsProvider() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
 
CREATED_ROBOTS_META - Static variable in class com.norconex.collector.http.crawler.HttpCrawlerEvent
 
createDefaultCookieStore() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
Creates the default cookie store to be added to each request context.
createDefaultRequestHeaders() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
Creates a list of HTTP headers based on configuration.
createHttpClient() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
 
createProxy() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
 
createRequestConfig() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
 
createSchemePortResolver() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
 
createSSLContext() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
 
createSSLSocketFactory(SSLContext) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
 
createUriRequest(String, String) - Static method in class com.norconex.collector.http.fetch.util.ApacheHttpUtil
Creates an HTTP request.
createUriRequest(String, HttpMethod) - Static method in class com.norconex.collector.http.fetch.util.ApacheHttpUtil
Creates an HTTP request.

D

DEFAULT_CONTENT_TYPE_PATTERN - Static variable in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
DEFAULT_CONTENT_TYPE_PATTERN - Static variable in class com.norconex.collector.http.link.impl.RegexLinkExtractor
 
DEFAULT_DELAY - Static variable in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
Default delay is 3 seconds.
DEFAULT_FALLBACK_CHARSET - Static variable in class com.norconex.collector.http.fetch.util.GenericRedirectURLProvider
 
DEFAULT_FILENAME_PREFIX - Static variable in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
 
DEFAULT_IMAGE_CACHE_DIR - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
DEFAULT_IMAGE_CACHE_SIZE - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
DEFAULT_IMAGE_FORMAT - Static variable in class com.norconex.collector.http.fetch.util.DocImageHandler
 
DEFAULT_IMAGE_FORMAT - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
DEFAULT_MAX_BUFFER_SIZE - Static variable in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
 
DEFAULT_MAX_CONNECTIONS - Static variable in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
 
DEFAULT_MAX_CONNECTIONS_PER_ROUTE - Static variable in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
 
DEFAULT_MAX_IDLE_TIME - Static variable in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
 
DEFAULT_MAX_REDIRECT - Static variable in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
 
DEFAULT_MAX_URL_LENGTH - Static variable in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
Default maximum length a URL can have.
DEFAULT_MAX_URL_LENGTH - Static variable in class com.norconex.collector.http.link.impl.RegexLinkExtractor
Default maximum length a URL can have.
DEFAULT_MIN_SIZE - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
DEFAULT_NOT_FOUND_STATUS_CODES - Static variable in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
 
DEFAULT_PAGE_CONTENT_TYPE_PATTERN - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
DEFAULT_RENDER_WAIT_TIME - Static variable in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
DEFAULT_SCALE_SIZE - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
DEFAULT_SCREENSHOT_DIR - Static variable in class com.norconex.collector.http.fetch.impl.webdriver.ScreenshotHandler
 
DEFAULT_SCREENSHOT_DIR_FIELD - Static variable in class com.norconex.collector.http.fetch.impl.webdriver.ScreenshotHandler
 
DEFAULT_SCREENSHOT_IMAGE_FORMAT - Static variable in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
DEFAULT_SCREENSHOT_META_FIELD - Static variable in class com.norconex.collector.http.fetch.impl.webdriver.ScreenshotHandler
 
DEFAULT_SCREENSHOT_SCALE_SIZE - Static variable in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
DEFAULT_SCREENSHOT_STORAGE - Static variable in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
DEFAULT_SCREENSHOT_STORAGE_DISK_DIR - Static variable in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
DEFAULT_SCREENSHOT_ZOOM_FACTOR - Static variable in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
DEFAULT_SCRIPT_PATH - Static variable in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
DEFAULT_SEGMENT_COUNT - Static variable in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
Default segment count.
DEFAULT_SEGMENT_SEPARATOR_PATTERN - Static variable in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
Default segment separator pattern.
DEFAULT_SITEMAP_PATHS - Static variable in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
 
DEFAULT_STORAGE - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
DEFAULT_STORAGE_DISK_DIR - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
DEFAULT_STORAGE_DISK_STRUCTURE - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
DEFAULT_TIMEOUT - Static variable in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
 
DEFAULT_TYPES - Static variable in class com.norconex.collector.http.fetch.util.DocImageHandler
 
DEFAULT_VALID_STATUS_CODES - Static variable in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
 
delay(RobotsTxt, String) - Method in interface com.norconex.collector.http.delay.IDelayResolver
Delay crawling activities (if applicable).
delay(long, String) - Method in class com.norconex.collector.http.delay.impl.AbstractDelay
 
delay(long, long) - Method in class com.norconex.collector.http.delay.impl.AbstractDelay
 
delay(RobotsTxt, String) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
 
delay(long, String) - Method in class com.norconex.collector.http.delay.impl.CrawlerDelay
 
delay(long, String) - Method in class com.norconex.collector.http.delay.impl.SiteDelay
 
delay(long, String) - Method in class com.norconex.collector.http.delay.impl.ThreadDelay
 
DelayReferencePattern(String, long) - Constructor for class com.norconex.collector.http.delay.impl.ReferenceDelayResolver.DelayReferencePattern
 
DelaySchedule(String, String, String, long) - Constructor for class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
 
DEPTH - Static variable in class com.norconex.collector.http.doc.HttpDocMetadata
 
detectFromContent(String, InputStream, ContentType) - Method in interface com.norconex.collector.http.canon.ICanonicalLinkDetector
Detects from a document content the presence of a canonical URL.
detectFromContent(String, InputStream, ContentType) - Method in class com.norconex.collector.http.canon.impl.GenericCanonicalLinkDetector
 
detectFromMetadata(String, Properties) - Method in interface com.norconex.collector.http.canon.ICanonicalLinkDetector
Detects the presence of a canonical URL from the metadata gathered so far which, when this method is invoked, is normally the HTTP header values.
detectFromMetadata(String, Properties) - Method in class com.norconex.collector.http.canon.impl.GenericCanonicalLinkDetector
 
DocImageHandler - Class in com.norconex.collector.http.fetch.util
Handles images associated with a document (which is different from a document itself being an image).
DocImageHandler(Path, String, String) - Constructor for class com.norconex.collector.http.fetch.util.DocImageHandler
 
DocImageHandler() - Constructor for class com.norconex.collector.http.fetch.util.DocImageHandler
 
DocImageHandler.DirStructure - Enum in com.norconex.collector.http.fetch.util
 
DocImageHandler.Target - Enum in com.norconex.collector.http.fetch.util
 
doCreateMetaChecksum(Properties) - Method in class com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer
 
DOMLinkExtractor - Class in com.norconex.collector.http.link.impl
Extracts links from a Document Object Model (DOM) representation of an HTML, XHTML, or XML document content based on values of matching elements and attributes.
DOMLinkExtractor() - Constructor for class com.norconex.collector.http.link.impl.DOMLinkExtractor
 

E

equals(Object) - Method in class com.norconex.collector.http.canon.impl.GenericCanonicalLinkDetector
 
equals(Object) - Method in class com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer
 
equals(Object) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
 
equals(Object) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
equals(Object) - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
 
equals(Object) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
 
equals(Object) - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
 
equals(Object) - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver
 
equals(Object) - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver.DelayReferencePattern
 
equals(Object) - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver
 
equals(Object) - Method in class com.norconex.collector.http.doc.HttpDocInfo
 
equals(Object) - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
 
equals(Object) - Method in class com.norconex.collector.http.fetch.HttpFetchClientResponse
 
equals(Object) - Method in class com.norconex.collector.http.fetch.HttpFetchResponseBuilder
 
equals(Object) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
 
equals(Object) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
 
equals(Object) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
 
equals(Object) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
equals(Object) - Method in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
 
equals(Object) - Method in class com.norconex.collector.http.fetch.impl.webdriver.ScreenshotHandler
 
equals(Object) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
 
equals(Object) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
equals(Object) - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
 
equals(Object) - Method in class com.norconex.collector.http.fetch.util.GenericRedirectURLProvider
 
equals(Object) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
 
equals(Object) - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
 
equals(Object) - Method in class com.norconex.collector.http.link.AbstractTextLinkExtractor
 
equals(Object) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
 
equals(Object) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
 
equals(Object) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor.RegexPair
 
equals(Object) - Method in class com.norconex.collector.http.link.impl.RegexLinkExtractor
 
equals(Object) - Method in class com.norconex.collector.http.link.impl.TikaLinkExtractor
 
equals(Object) - Method in class com.norconex.collector.http.link.impl.XMLFeedLinkExtractor
 
equals(Object) - Method in class com.norconex.collector.http.link.Link
 
equals(Object) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
equals(Object) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
 
equals(Object) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
 
equals(Object) - Method in class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
 
equals(Object) - Method in class com.norconex.collector.http.robot.impl.StandardRobotsTxtProvider
 
equals(Object) - Method in class com.norconex.collector.http.robot.RobotsMeta
 
equals(Object) - Method in class com.norconex.collector.http.robot.RobotsTxt
 
equals(Object) - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
 
equals(Object) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
 
equals(Object) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer.Replace
 
executeCommitterPipeline(Crawler, CrawlDoc) - Method in class com.norconex.collector.http.crawler.HttpCrawler
 
executeImporterPipeline(ImporterPipelineContext) - Method in class com.norconex.collector.http.crawler.HttpCrawler
 
executeQueuePipeline(CrawlDocInfo) - Method in class com.norconex.collector.http.crawler.HttpCrawler
 
extractLinks(CrawlDoc) - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
 
extractLinks(Set<Link>, CrawlDoc) - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
 
extractLinks(Set<Link>, CrawlDoc) - Method in class com.norconex.collector.http.link.AbstractTextLinkExtractor
 
extractLinks(CrawlDoc) - Method in interface com.norconex.collector.http.link.ILinkExtractor
 
extractLinks(Set<Link>, CrawlDoc) - Method in class com.norconex.collector.http.link.impl.TikaLinkExtractor
 
extractTextLinks(Set<Link>, HandlerDoc, Reader) - Method in class com.norconex.collector.http.link.AbstractTextLinkExtractor
 
extractTextLinks(Set<Link>, HandlerDoc, Reader) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
 
extractTextLinks(Set<Link>, HandlerDoc, Reader) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
 
extractTextLinks(Set<Link>, HandlerDoc, Reader) - Method in class com.norconex.collector.http.link.impl.RegexLinkExtractor
 
extractTextLinks(Set<Link>, HandlerDoc, Reader) - Method in class com.norconex.collector.http.link.impl.XMLFeedLinkExtractor
 

F

FeaturedImageProcessor - Class in com.norconex.collector.http.processor.impl
Document processor that extracts the "main" image from HTML pages.
FeaturedImageProcessor() - Constructor for class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
FeaturedImageProcessor.Quality - Enum in com.norconex.collector.http.processor.impl
 
FeaturedImageProcessor.Storage - Enum in com.norconex.collector.http.processor.impl
 
FeaturedImageProcessor.StorageDiskStructure - Enum in com.norconex.collector.http.processor.impl
 
fetch(CrawlDoc, HttpMethod) - Method in class com.norconex.collector.http.fetch.HttpFetchClient
 
fetch(CrawlDoc, HttpMethod) - Method in interface com.norconex.collector.http.fetch.IHttpFetcher
Performs an HTTP request for the supplied document reference and HTTP method.
fetch(CrawlDoc, HttpMethod) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
 
fetch(CrawlDoc, HttpMethod) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
fetch(CrawlDoc, HttpMethod) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
 
fetchDocumentContent(String) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
 
fetcherShutdown(HttpCollector) - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
Invoked once per fetcher when the collector ends.
fetcherShutdown(HttpCollector) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
 
fetcherShutdown(HttpCollector) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
 
fetcherStartup(HttpCollector) - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
Invoked once per fetcher instance, when the collector starts.
fetcherStartup(HttpCollector) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
 
fetcherStartup(HttpCollector) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
 
fetcherThreadBegin(HttpCrawler) - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
Invoked each time a crawler begins a new crawler thread if that thread is the current thread.
fetcherThreadBegin(HttpCrawler) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
 
fetcherThreadEnd(HttpCrawler) - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
Invoked each time a crawler ends an existing crawler thread if that thread is the current thread.
fetcherThreadEnd(HttpCrawler) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
 
fits(ScaledImage) - Method in class com.norconex.collector.http.processor.impl.ScaledImage
 
fits(Dimension) - Method in class com.norconex.collector.http.processor.impl.ScaledImage
 
fits(int, int) - Method in class com.norconex.collector.http.processor.impl.ScaledImage
 

G

GenericCanonicalLinkDetector - Class in com.norconex.collector.http.canon.impl
Generic canonical link detector.
GenericCanonicalLinkDetector() - Constructor for class com.norconex.collector.http.canon.impl.GenericCanonicalLinkDetector
 
GenericDelayResolver - Class in com.norconex.collector.http.delay.impl
Default implementation for creating voluntary delays between URL downloads.
GenericDelayResolver() - Constructor for class com.norconex.collector.http.delay.impl.GenericDelayResolver
 
GenericDelayResolver.DelaySchedule - Class in com.norconex.collector.http.delay.impl
 
GenericDelayResolver.DelaySchedule.DOW - Enum in com.norconex.collector.http.delay.impl
 
GenericHttpFetcher - Class in com.norconex.collector.http.fetch.impl
Default implementation of IHttpFetcher, based on Apache HttpClient.
GenericHttpFetcher() - Constructor for class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
 
GenericHttpFetcher(GenericHttpFetcherConfig) - Constructor for class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
 
GenericHttpFetcherConfig - Class in com.norconex.collector.http.fetch.impl
Generic HTTP Fetcher configuration.
GenericHttpFetcherConfig() - Constructor for class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
 
GenericLinkExtractor - Class in com.norconex.collector.http.link.impl
Deprecated.
Since 3.0.0, use HtmlLinkExtractor or DOMLinkExtractor instead.
GenericLinkExtractor() - Constructor for class com.norconex.collector.http.link.impl.GenericLinkExtractor
Deprecated.
 
GenericRecrawlableResolver - Class in com.norconex.collector.http.recrawl.impl
Relies on both sitemap directives and custom instructions for establishing the minimum frequency between each document recrawl.
GenericRecrawlableResolver() - Constructor for class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
 
GenericRecrawlableResolver.MinFrequency - Class in com.norconex.collector.http.recrawl.impl
 
GenericRecrawlableResolver.SitemapSupport - Enum in com.norconex.collector.http.recrawl.impl
 
GenericRedirectURLProvider - Class in com.norconex.collector.http.fetch.util
Provides redirect URLs by grabbing them from the HTTP Response Location header value.
GenericRedirectURLProvider() - Constructor for class com.norconex.collector.http.fetch.util.GenericRedirectURLProvider
 
GenericSitemapResolver - Class in com.norconex.collector.http.sitemap.impl
Implementation of ISitemapResolver as per sitemap.xml standard defined at http://www.sitemaps.org/protocol.html.
GenericSitemapResolver() - Constructor for class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
 
GenericURLNormalizer - Class in com.norconex.collector.http.url.impl
Generic implementation of IURLNormalizer that should satisfy most URL normalization needs.
GenericURLNormalizer() - Constructor for class com.norconex.collector.http.url.impl.GenericURLNormalizer
 
GenericURLNormalizer.Normalization - Enum in com.norconex.collector.http.url.impl
 
GenericURLNormalizer.Replace - Class in com.norconex.collector.http.url.impl
 
getAcceptedIssuers() - Method in class com.norconex.collector.http.fetch.util.TrustAllX509TrustManager
 
getAllowFilters() - Method in class com.norconex.collector.http.robot.RobotsTxt
Gets "Allow" filters.
getApplyTo() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
 
getArea() - Method in class com.norconex.collector.http.processor.impl.ScaledImage
 
getAuthConfig() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
 
getBrowser() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
getBrowserPath() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
getCachedDocInfo() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
 
getCacheDirectory() - Method in class com.norconex.collector.http.processor.impl.ImageCache
 
getCanonicalLinkDetector() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Gets the canonical link detector.
getCapabilities() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
getChangeFrequency(String) - Static method in enum com.norconex.collector.http.sitemap.SitemapChangeFrequency
Gets the sitemap change frequency matching the supplied string.
getCharset() - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
Gets the assumed source character encoding.
getCharset() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
Gets the character set of pages on which link extraction is performed.
getCharset() - Method in class com.norconex.collector.http.link.impl.RegexLinkExtractor
Gets the character set of pages on which link extraction is performed.
getCollectorConfig() - Method in class com.norconex.collector.http.HttpCollector
 
getConfig() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
 
getConfig() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
 
getConfig() - Method in class com.norconex.collector.http.pipeline.committer.HttpCommitterPipelineContext
 
getConfig() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
 
getConfig() - Method in class com.norconex.collector.http.pipeline.queue.HttpQueuePipelineContext
 
getConnectionCharset() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Gets the connection character set.
getConnectionRequestTimeout() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Gets the timeout when requesting a connection, in milliseconds.
getConnectionTimeout() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Gets the connection timeout until a connection is established, in milliseconds.
getContentTypePattern() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
getContentTypes() - Method in class com.norconex.collector.http.canon.impl.GenericCanonicalLinkDetector
 
getCookieSpec() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
 
getCount() - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
 
getCrawlDelay() - Method in class com.norconex.collector.http.robot.RobotsTxt
 
getCrawlDocInfoType() - Method in class com.norconex.collector.http.crawler.HttpCrawler
 
getCrawler() - Method in class com.norconex.collector.http.pipeline.committer.HttpCommitterPipelineContext
 
getCrawler() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
 
getCrawler() - Method in class com.norconex.collector.http.pipeline.queue.HttpQueuePipelineContext
 
getCrawlerConfig() - Method in class com.norconex.collector.http.crawler.HttpCrawler
 
getCrawlerIds() - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
 
getCrawlState() - Method in class com.norconex.collector.http.fetch.HttpFetchClientResponse
 
getCrawlState() - Method in interface com.norconex.collector.http.fetch.IHttpFetchResponse
 
getCredentials() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
 
getCssSelector() - Method in class com.norconex.collector.http.fetch.impl.webdriver.ScreenshotHandler
 
getDayOfMonthRange() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
 
getDayOfWeekRange() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
 
getDedupDocumentStore() - Method in class com.norconex.collector.http.crawler.HttpCrawler
 
getDedupMetadataStore() - Method in class com.norconex.collector.http.crawler.HttpCrawler
 
getDefaultDelay() - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
Gets the default delay in milliseconds.
getDelay() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
 
getDelay() - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver.DelayReferencePattern
 
getDelayReferencePatterns() - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver
 
getDelayResolver() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
getDepth() - Method in class com.norconex.collector.http.doc.HttpDocInfo
Gets the URL depth.
getDisallowFilters() - Method in class com.norconex.collector.http.robot.RobotsTxt
Gets "Disallow" filters.
getDocInfo() - Method in class com.norconex.collector.http.pipeline.committer.HttpCommitterPipelineContext
 
getDocInfo() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
 
getDocInfo() - Method in class com.norconex.collector.http.pipeline.queue.HttpQueuePipelineContext
 
getDomain() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Gets the NTLM authentication domain.
getDomSelector() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
getDriverPath() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
getEarlyPageScript() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
getEnd() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor.RegexPair
 
getEtag() - Method in class com.norconex.collector.http.doc.HttpDocInfo
Gets the HTTP ETag.
getException() - Method in class com.norconex.collector.http.fetch.HttpFetchClientResponse
 
getException() - Method in interface com.norconex.collector.http.fetch.IHttpFetchResponse
 
getExePath() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
getExtractBetweens() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
Gets the patterns delimiting the portions of a document to be considered for link extraction.
getExtractSelectors() - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
 
getExtractSelectors() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
Gets the selectors matching the portions of a document to be considered for link extraction.
getFallbackCharset() - Method in class com.norconex.collector.http.fetch.util.GenericRedirectURLProvider
 
getFetchHttpGet() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Gets whether to fetch HTTP documents using an HTTP GET request.
getFetchHttpHead() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Gets whether to fetch HTTP response headers using an HTTP HEAD request.
getFieldMatcher() - Method in class com.norconex.collector.http.link.AbstractTextLinkExtractor
Gets field matcher identifying fields holding content used for link extraction.
getFileNamePrefix() - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
Gets the generated report file name prefix.
getFilters() - Method in class com.norconex.collector.http.robot.RobotsTxt
Gets all filters.
getFormCharset() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Gets the authentication form character set.
getFormParam(String) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Gets an authentication form parameter (equivalent to "input" or other fields in HTML forms).
getFormParamNames() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Gets all authentication form parameter names.
getFormParams() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Gets all authentication form parameters (equivalent to "input" or other fields in HTML forms).
getFormPasswordField() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Gets the name of the HTML field where the password is set.
getFormSelector() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Gets the CSS selector that identifies the form in a login page.
getFormUsernameField() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Gets the name of the HTML field where the username is set.
getHeadersPrefix() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
 
getHeadersPrefix() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
getHeadersPrefix() - Method in class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
 
getHost() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Gets the host for the current authentication scope.
getHttpClient() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
 
getHttpFetchClient() - Method in class com.norconex.collector.http.crawler.HttpCrawler
 
getHttpFetchClient() - Method in class com.norconex.collector.http.pipeline.committer.HttpCommitterPipelineContext
 
getHttpFetchClient() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
 
getHttpFetchers() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Gets HTTP fetchers.
getHttpFetchersMaxRetries() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Gets the maximum number of times an HTTP fetcher will re-attempt fetching a resource in case of failures.
getHttpFetchersRetryDelay() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Gets how long to wait before a failing HTTP fetcher re-attempts fetching a resource in case of failures (in milliseconds).
getHttpMethods() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Gets the list of HTTP methods to be accepted by this fetcher.
getHttpSnifferConfig() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
getImage(String) - Method in class com.norconex.collector.http.processor.impl.ImageCache
 
getImage() - Method in class com.norconex.collector.http.processor.impl.ScaledImage
 
getImageCacheDir() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
getImageCacheSize() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
getImageFormat() - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
 
getImageFormat() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
getImplicitlyWait() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
getImporter() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
 
getKeepReferencedLinks() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Gets what type of referenced links to keep, if any.
getLatePageScript() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
getLinkExtractors() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Gets link extractors.
getLocalAddress() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Gets the local address (IP or hostname).
getMatch() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer.Replace
 
getMaxBufferSize() - Method in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
 
getMaxConnectionIdleTime() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Gets the period of time in milliseconds after which to evict idle connections from the connection pool.
getMaxConnectionInactiveTime() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Gets the period of time in milliseconds a connection must be inactive to be checked in case it became stalled.
getMaxConnections() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Gets the maximum number of connections that can be created.
getMaxConnectionsPerRoute() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Gets the maximum number of connections that can be used per route.
getMaxDepth() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
getMaxRedirects() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Gets the maximum number of redirects to be followed.
getMaxURLLength() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
Gets the maximum supported URL length.
getMaxURLLength() - Method in class com.norconex.collector.http.link.impl.RegexLinkExtractor
Gets the maximum supported URL length.
getMetadata() - Method in class com.norconex.collector.http.link.Link
 
getMetadata() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
 
getMethod() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Gets the authentication method.
getMinDimensions() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
getMinFrequencies() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
Gets minimum frequencies.
getNoExtractBetweens() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
Gets the patterns delimiting the portions of a document to be excluded from link extraction.
getNoExtractSelectors() - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
 
getNoExtractSelectors() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
Gets the selectors matching the portions of a document to be excluded from link extraction.
getNormalizations() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
 
getNotFoundStatusCodes() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Gets HTTP status codes to be considered as "Not found" state.
getNotFoundStatusCodes() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Gets HTTP status codes to be considered as "Not found" state.
getOnMatch() - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
 
getOptions() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
getOriginalReference() - Method in class com.norconex.collector.http.doc.HttpDocInfo
 
getOriginalSize() - Method in class com.norconex.collector.http.processor.impl.ScaledImage
 
getOutputDir() - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
Gets the local directory where this listener report will be written.
getPageContentTypePattern() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
getPageLoadTimeout() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
getParser() - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
Gets the parser to use when creating the DOM-tree.
getPath() - Method in interface com.norconex.collector.http.robot.IRobotsTxtFilter
 
getPattern() - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver.DelayReferencePattern
 
getPattern() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
 
getPatternReplacement(String) - Method in class com.norconex.collector.http.link.impl.RegexLinkExtractor
Gets a pattern replacement.
getPatterns() - Method in class com.norconex.collector.http.link.impl.RegexLinkExtractor
 
getPort() - Method in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
 
getPostImportLinks() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Gets a field matcher used to identify post-import metadata fields holding URLs to consider for crawling.
getPostImportProcessors() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Gets post-import processors.
getPreImportProcessors() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Gets pre-import processors.
getProxySettings() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
 
getRealm() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Gets the realm name for the current authentication scope.
getReasonPhrase() - Method in class com.norconex.collector.http.fetch.HttpFetchClientResponse
 
getReasonPhrase() - Method in interface com.norconex.collector.http.fetch.IHttpFetchResponse
 
getRecrawlableResolver() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Gets the recrawlable resolver.
getRedirectTarget() - Method in class com.norconex.collector.http.fetch.HttpFetchClientResponse
 
getRedirectTarget() - Method in interface com.norconex.collector.http.fetch.IHttpFetchResponse
 
getRedirectTarget(HttpContext) - Static method in class com.norconex.collector.http.fetch.util.ApacheRedirectCaptureStrategy
 
getRedirectTrail() - Method in class com.norconex.collector.http.doc.HttpDocInfo
Gets the trail of URLs that were redirected up to this one.
getRedirectURLProvider() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Gets the redirect URL provider.
getReferencedUrls() - Method in class com.norconex.collector.http.doc.HttpDocInfo
Gets URLs referenced by this one.
getReferenceFilters() - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
Gets reference filters.
getReferencePattern() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
getReferrer() - Method in class com.norconex.collector.http.link.Link
 
getReferrerLinkMetadata() - Method in class com.norconex.collector.http.doc.HttpDocInfo
 
getReferrerReference() - Method in class com.norconex.collector.http.doc.HttpDocInfo
 
getRemoteURL() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
getRenderWaitTime() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
getReplacement() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer.Replace
 
getReplaces() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
 
getRequestHeader(String) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Gets the HTTP request header value matching the given name, previously set with GenericHttpFetcherConfig.setRequestHeader(String, String).
getRequestHeaderNames() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Gets all HTTP request header names for headers previously set with GenericHttpFetcherConfig.setRequestHeader(String, String).
getRequestHeaders() - Method in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
 
getResourceTimeout() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Gets the timeout, in milliseconds, after which any requested resource will stop trying and proceed with other parts of the page.
getResponses() - Method in class com.norconex.collector.http.fetch.HttpFetchClientResponse
 
getRestrictions() - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
Gets all restrictions.
getRobotsMeta() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
 
getRobotsMeta(Reader, String, ContentType, Properties) - Method in class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
 
getRobotsMeta(Reader, String, ContentType, Properties) - Method in interface com.norconex.collector.http.robot.IRobotsMetaProvider
Extracts Robots meta information for a page, if any.
getRobotsMetaProvider() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
getRobotsTxt(HttpFetchClient, String) - Method in class com.norconex.collector.http.robot.impl.StandardRobotsTxtProvider
 
getRobotsTxt(HttpFetchClient, String) - Method in interface com.norconex.collector.http.robot.IRobotsTxtProvider
Gets robots.txt rules.
getRobotsTxtProvider() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
getScaleDimensions() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
getScaleQuality() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
getSchedules() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver
 
getSchemes() - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
Gets the schemes to be extracted.
getSchemes() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
Gets the schemes to be extracted.
getScope() - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
Gets the delay scope.
getScreenshotDimensions() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
getScreenshotHandler() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
 
getScreenshotImageFormat() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Gets the screenshot image format (jpg, png, gif, bmp, etc.).
getScreenshotScaleDimensions() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Gets the pixel dimensions we want the stored screenshot to have.
getScreenshotScaleQuality() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Gets the screenshot scaling quality to use when storage is "disk" or "inline".
getScreenshotStorage() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Gets the screenshot storage mechanisms.
getScreenshotStorageDiskDir() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Gets the directory where screenshots are saved when storage is "disk".
getScreenshotStorageDiskField() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Gets the target document metadata field where to store the path to the screenshot image file when storage is "disk".
getScreenshotStorageDiskStructure() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Gets the screenshot directory structure to create when storage is "disk".
getScreenshotStorageInlineField() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Gets the target document metadata field where to store the inline (Base64) screenshot image when storage is "inline".
getScreenshotZoomFactor() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
getScriptPath() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
getScriptTimeout() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
getSeparator() - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
Gets the segment separator pattern.
getSitemapChangeFreq() - Method in class com.norconex.collector.http.doc.HttpDocInfo
Gets the sitemap change frequency.
getSitemapLastMod() - Method in class com.norconex.collector.http.doc.HttpDocInfo
Gets the sitemap last modified date.
getSitemapLocations() - Method in class com.norconex.collector.http.robot.RobotsTxt
 
getSitemapPaths() - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
Gets the URL paths, relative to the URL root, from which to try to locate and resolve sitemaps.
getSitemapPriority() - Method in class com.norconex.collector.http.doc.HttpDocInfo
Gets the sitemap priority.
getSitemapResolver() - Method in class com.norconex.collector.http.crawler.HttpCrawler
 
getSitemapResolver() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
getSitemapResolver() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
 
getSitemapResolver() - Method in class com.norconex.collector.http.pipeline.queue.HttpQueuePipelineContext
 
getSitemapSupport() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
Gets the sitemap support strategy.
getSitemapSupport(String) - Static method in enum com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.SitemapSupport
 
getSocketTimeout() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Gets the maximum period of inactivity between two consecutive data packets, in milliseconds.
getSSLProtocols() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Gets the supported SSL/TLS protocols.
getStart() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor.RegexPair
 
getStartSitemapURLs() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Gets sitemap URLs to be used as starting points for crawling.
getStartURLs() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Gets URLs to initiate crawling from.
getStartURLsFiles() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Gets the file paths of seed files containing URLs to be used as "start URLs".
getStartURLsProviders() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Gets the providers of URLs used as starting points for crawling.
getStatusCode() - Method in class com.norconex.collector.http.fetch.HttpFetchClientResponse
 
getStatusCode() - Method in interface com.norconex.collector.http.fetch.IHttpFetchResponse
 
getStatusCodes() - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
Gets the status codes to listen for.
getStorage() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
Gets the storage mechanisms.
getStorageDiskDir() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
getStorageDiskField() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
getStorageDiskStructure() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
getStorageInlineField() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
getStorageUrlField() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
getStreamFactory() - Method in class com.norconex.collector.http.fetch.HttpFetchClient
 
getTargetDir() - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
 
getTargetDirField() - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
 
getTargetDirStructure() - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
 
getTargetMetaField() - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
 
getTargets() - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
 
getTempDir() - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
Gets the directory where temporary sitemap files are written.
getThreadWait() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
getTimeRange() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
 
getUrl() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Gets the URL for "form" authentication.
getUrl() - Method in class com.norconex.collector.http.link.Link
 
getUrl() - Method in class com.norconex.collector.http.processor.impl.ScaledImage
 
getURLCrawlScopeStrategy() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Gets the strategy to use to determine if a URL is in scope.
getUrlNormalizer() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
getUrlRoot() - Method in class com.norconex.collector.http.doc.HttpDocInfo
Gets the URL root (protocol + domain, e.g.
getUserAgent() - Method in class com.norconex.collector.http.fetch.HttpFetchClientResponse
 
getUserAgent() - Method in interface com.norconex.collector.http.fetch.IHttpFetcher
 
getUserAgent() - Method in interface com.norconex.collector.http.fetch.IHttpFetchResponse
 
getUserAgent() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
 
getUserAgent() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
 
getUserAgent() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
getUserAgent() - Method in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
 
getUserAgent() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
 
getValidExitCodes() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Gets valid PhantomJS exit values (defaults to 0).
getValidStatusCodes() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
 
getValidStatusCodes() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
getValue() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
 
getWaitForElementSelector() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
getWaitForElementTimeout() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
getWaitForElementType() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
getWebDriver() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
 
getWindowSize() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
getWorkstation() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Gets the NTLM authentication workstation name.

H

handleImage(InputStream, Doc) - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
 
hashCode() - Method in class com.norconex.collector.http.canon.impl.GenericCanonicalLinkDetector
 
hashCode() - Method in class com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer
 
hashCode() - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
 
hashCode() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
hashCode() - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
 
hashCode() - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
 
hashCode() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
 
hashCode() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver
 
hashCode() - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver.DelayReferencePattern
 
hashCode() - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver
 
hashCode() - Method in class com.norconex.collector.http.doc.HttpDocInfo
 
hashCode() - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
 
hashCode() - Method in class com.norconex.collector.http.fetch.HttpFetchClientResponse
 
hashCode() - Method in class com.norconex.collector.http.fetch.HttpFetchResponseBuilder
 
hashCode() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
 
hashCode() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
 
hashCode() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
 
hashCode() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
hashCode() - Method in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
 
hashCode() - Method in class com.norconex.collector.http.fetch.impl.webdriver.ScreenshotHandler
 
hashCode() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
 
hashCode() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
hashCode() - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
 
hashCode() - Method in class com.norconex.collector.http.fetch.util.GenericRedirectURLProvider
 
hashCode() - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
 
hashCode() - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
 
hashCode() - Method in class com.norconex.collector.http.link.AbstractTextLinkExtractor
 
hashCode() - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
 
hashCode() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
 
hashCode() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor.RegexPair
 
hashCode() - Method in class com.norconex.collector.http.link.impl.RegexLinkExtractor
 
hashCode() - Method in class com.norconex.collector.http.link.impl.TikaLinkExtractor
 
hashCode() - Method in class com.norconex.collector.http.link.impl.XMLFeedLinkExtractor
 
hashCode() - Method in class com.norconex.collector.http.link.Link
 
hashCode() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
hashCode() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
 
hashCode() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
 
hashCode() - Method in class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
 
hashCode() - Method in class com.norconex.collector.http.robot.impl.StandardRobotsTxtProvider
 
hashCode() - Method in class com.norconex.collector.http.robot.RobotsMeta
 
hashCode() - Method in class com.norconex.collector.http.robot.RobotsTxt
 
hashCode() - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
 
hashCode() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
 
hashCode() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer.Replace
 
HstsResolver - Class in com.norconex.collector.http.fetch.util
Class handling HSTS support for servers supporting it.
HtmlLinkExtractor - Class in com.norconex.collector.http.link.impl
HTML link extractor for URLs found in HTML and possibly other text files.
HtmlLinkExtractor() - Constructor for class com.norconex.collector.http.link.impl.HtmlLinkExtractor
 
HtmlLinkExtractor.RegexPair - Class in com.norconex.collector.http.link.impl
 
HTTP_FETCHER - Static variable in class com.norconex.collector.http.doc.HttpDocMetadata
 
HttpAuthConfig - Class in com.norconex.collector.http.fetch.impl
Generic HTTP Fetcher authentication configuration.
HttpAuthConfig() - Constructor for class com.norconex.collector.http.fetch.impl.HttpAuthConfig
 
HttpCollector - Class in com.norconex.collector.http
Main application class.
HttpCollector() - Constructor for class com.norconex.collector.http.HttpCollector
Creates a non-configured HTTP collector.
HttpCollector(HttpCollectorConfig) - Constructor for class com.norconex.collector.http.HttpCollector
Creates and configures an HTTP Collector with the provided configuration.
HttpCollectorConfig - Class in com.norconex.collector.http
HTTP Collector configuration.
HttpCollectorConfig() - Constructor for class com.norconex.collector.http.HttpCollectorConfig
 
HttpCommitterPipeline - Class in com.norconex.collector.http.pipeline.committer
 
HttpCommitterPipeline() - Constructor for class com.norconex.collector.http.pipeline.committer.HttpCommitterPipeline
 
HttpCommitterPipelineContext - Class in com.norconex.collector.http.pipeline.committer
 
HttpCommitterPipelineContext(HttpCrawler, CrawlDoc) - Constructor for class com.norconex.collector.http.pipeline.committer.HttpCommitterPipelineContext
 
HttpCrawler - Class in com.norconex.collector.http.crawler
The HTTP Crawler.
HttpCrawler(HttpCrawlerConfig, HttpCollector) - Constructor for class com.norconex.collector.http.crawler.HttpCrawler
Constructor.
HttpCrawlerConfig - Class in com.norconex.collector.http.crawler
HTTP Crawler configuration.
HttpCrawlerConfig() - Constructor for class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
HttpCrawlerConfig.HttpMethodSupport - Enum in com.norconex.collector.http.crawler
 
HttpCrawlerConfig.ReferencedLinkType - Enum in com.norconex.collector.http.crawler
 
HttpCrawlerEvent - Class in com.norconex.collector.http.crawler
HTTP Crawler event names.
HttpCrawlState - Class in com.norconex.collector.http.doc
Represents a URL crawling status.
HttpCrawlState(String) - Constructor for class com.norconex.collector.http.doc.HttpCrawlState
 
HttpDocInfo - Class in com.norconex.collector.http.doc
A URL being crawled holding relevant crawl information.
HttpDocInfo() - Constructor for class com.norconex.collector.http.doc.HttpDocInfo
 
HttpDocInfo(String) - Constructor for class com.norconex.collector.http.doc.HttpDocInfo
 
HttpDocInfo(String, int) - Constructor for class com.norconex.collector.http.doc.HttpDocInfo
Constructor.
HttpDocInfo(DocInfo) - Constructor for class com.norconex.collector.http.doc.HttpDocInfo
Copy constructor.
HttpDocMetadata - Class in com.norconex.collector.http.doc
Metadata constants for common metadata field names typically set by the HTTP Collector crawler.
HttpFetchClient - Class in com.norconex.collector.http.fetch
Fetches HTTP resources, trying all configured HTTP fetchers, defaulting to GenericHttpFetcher with default configuration if none are defined.
HttpFetchClient(CachedStreamFactory, List<IHttpFetcher>, int, long) - Constructor for class com.norconex.collector.http.fetch.HttpFetchClient
 
HttpFetchClientResponse - Class in com.norconex.collector.http.fetch
Holds HTTP response information obtained from fetching a document using HttpFetchClient.
HttpFetchClientResponse() - Constructor for class com.norconex.collector.http.fetch.HttpFetchClientResponse
 
HttpFetchException - Exception in com.norconex.collector.http.fetch
Checked exception thrown upon encountering an error while performing an HTTP fetch.
HttpFetchException() - Constructor for exception com.norconex.collector.http.fetch.HttpFetchException
 
HttpFetchException(String, Throwable) - Constructor for exception com.norconex.collector.http.fetch.HttpFetchException
 
HttpFetchException(String) - Constructor for exception com.norconex.collector.http.fetch.HttpFetchException
 
HttpFetchException(Throwable) - Constructor for exception com.norconex.collector.http.fetch.HttpFetchException
 
HttpFetchResponseBuilder - Class in com.norconex.collector.http.fetch
Builder facilitating creation of an HTTP fetch response.
HttpFetchResponseBuilder() - Constructor for class com.norconex.collector.http.fetch.HttpFetchResponseBuilder
 
HttpFetchResponseBuilder(IHttpFetchResponse) - Constructor for class com.norconex.collector.http.fetch.HttpFetchResponseBuilder
 
HttpImporterPipeline - Class in com.norconex.collector.http.pipeline.importer
All execution steps for processing a document, from the moment it is obtained from the queue up to importing it.
HttpImporterPipeline(boolean, boolean) - Constructor for class com.norconex.collector.http.pipeline.importer.HttpImporterPipeline
 
HttpImporterPipelineContext - Class in com.norconex.collector.http.pipeline.importer
 
HttpImporterPipelineContext(ImporterPipelineContext) - Constructor for class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
Constructor creating a copy of supplied context.
HttpImporterPipelineContext(HttpCrawler, CrawlDoc) - Constructor for class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
 
HttpMethod - Enum in com.norconex.collector.http.fetch
 
HttpQueuePipeline - Class in com.norconex.collector.http.pipeline.queue
Performs URL handling logic before the actual processing of the document it represents takes place.
HttpQueuePipeline() - Constructor for class com.norconex.collector.http.pipeline.queue.HttpQueuePipeline
 
HttpQueuePipelineContext - Class in com.norconex.collector.http.pipeline.queue
 
HttpQueuePipelineContext(HttpCrawler, HttpDocInfo) - Constructor for class com.norconex.collector.http.pipeline.queue.HttpQueuePipelineContext
 
HttpSnifferConfig - Class in com.norconex.collector.http.fetch.impl.webdriver
Configuration for HttpSniffer.
HttpSnifferConfig() - Constructor for class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
 

I

ICanonicalLinkDetector - Interface in com.norconex.collector.http.canon
Detects and returns any canonical URL found in documents, whether from the HTTP headers (metadata) or from the page content (usually HTML).
IDelayResolver - Interface in com.norconex.collector.http.delay
Resolves and creates intentional "delays" to increase document download time intervals.
IHttpDocumentProcessor - Interface in com.norconex.collector.http.processor
Custom processing (optional) performed on a document.
IHttpFetcher - Interface in com.norconex.collector.http.fetch
Fetches HTTP resources.
IHttpFetchResponse - Interface in com.norconex.collector.http.fetch
 
ILinkExtractor - Interface in com.norconex.collector.http.link
Responsible for finding links in documents.
ImageCache - Class in com.norconex.collector.http.processor.impl
Caches images.
ImageCache(int, Path) - Constructor for class com.norconex.collector.http.processor.impl.ImageCache
 
initCrawlDoc(CrawlDoc) - Method in class com.norconex.collector.http.crawler.HttpCrawler
 
IRecrawlableResolver - Interface in com.norconex.collector.http.recrawl
Indicates whether a document that was successfully crawled on a previous crawling session should be recrawled or not.
IRedirectURLProvider - Interface in com.norconex.collector.http.fetch.util
Responsible for providing a target absolute URL each time an HTTP redirect is encountered when invoking a URL.
IRobotsMetaProvider - Interface in com.norconex.collector.http.robot
Responsible for extracting robot information from a page.
IRobotsTxtFilter - Interface in com.norconex.collector.http.robot
Holds a robots.txt rule.
IRobotsTxtProvider - Interface in com.norconex.collector.http.robot
Given a URL, extracts any "robots.txt" rules.
is(HttpCrawlerConfig.HttpMethodSupport) - Method in enum com.norconex.collector.http.crawler.HttpCrawlerConfig.HttpMethodSupport
 
is(HttpMethod) - Method in enum com.norconex.collector.http.fetch.HttpMethod
 
isAny(HttpMethod...) - Method in enum com.norconex.collector.http.fetch.HttpMethod
 
isCaseSensitive() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor.RegexPair
 
isCaseSensitive() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
 
isCombined() - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
 
isCommentsEnabled() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
Gets whether links should be extracted from HTML/XML comments.
isCurrentTimeInSchedule() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
 
isDetectCharset() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
isDetectContentType() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
isDisabled() - Method in class com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer
Deprecated.
Since 2.0.0, not having a checksummer defined or setting one explicitly to null effectively disables it.
isDisabled() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
Whether this URL Normalizer is disabled or not.
isDisableETag() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Gets whether adding "ETag" If-None-Match HTTP request header is disabled.
isDisableHSTS() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Gets whether the forcing of non secure URLs to secure ones is disabled, according to the URL domain Strict-Transport-Security policy (obtained from HTTP response header).
isDisableIfModifiedSince() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Gets whether adding the If-Modified-Since HTTP request header is disabled.
isDisableSNI() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Gets whether Server Name Indication (SNI) is disabled.
isDuplicate() - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
 
isEnabled(HttpCrawlerConfig.HttpMethodSupport) - Static method in enum com.norconex.collector.http.crawler.HttpCrawlerConfig.HttpMethodSupport
 
isExpectContinueEnabled() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Whether 'Expect: 100-continue' handshake is enabled.
isFetchHttpHead() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
isForceCharsetDetection() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Gets whether character encoding is detected instead of relying on HTTP response header.
isForceContentTypeDetection() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Gets whether content type is detected instead of relying on HTTP response header.
isIgnoreCanonicalLinks() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Whether canonical links found in HTTP headers and in HTML files <head> section should be ignored or processed.
isIgnoreLinkData() - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
Gets whether to ignore extra data associated with a link.
isIgnoreLinkData() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
Gets whether to ignore extra data associated with a link.
isIgnoreLinkData() - Method in class com.norconex.collector.http.link.impl.TikaLinkExtractor
Gets whether to ignore extra data associated with a link.
isIgnoreNofollow() - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
 
isIgnoreNofollow() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
 
isIgnoreNofollow() - Method in class com.norconex.collector.http.link.impl.TikaLinkExtractor
 
isIgnoreRobotsCrawlDelay() - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
Gets whether to ignore crawl delays specified in a site robots.txt file.
isIgnoreRobotsMeta() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
isIgnoreRobotsTxt() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
isIgnoreSitemap() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Whether to ignore sitemap detection and resolving for URLs processed.
isIncludeSubdomains() - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
Gets whether sub-domains are considered to be the same as a URL domain.
isInScope(String, String) - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
 
ISitemapResolver - Interface in com.norconex.collector.http.sitemap
Given a URL root, resolves the corresponding sitemap(s), if any, and only if they have not yet been resolved for a crawling session.
isKeepDownloads() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
isKeepOutOfScopeLinks() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Deprecated.
isLargest() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
isLenient() - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
 
isNofollow() - Method in class com.norconex.collector.http.robot.RobotsMeta
 
isNoindex() - Method in class com.norconex.collector.http.robot.RobotsMeta
 
isPostImportLinksKeep() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Gets whether to keep the importer-generated field holding URLs to consider for crawling.
isPreemptive() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Gets whether to perform preemptive authentication (valid for "basic" authentication method).
isQueueInitialized() - Method in class com.norconex.collector.http.crawler.HttpCrawler
 
isRecrawlable(HttpDocInfo) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
 
isRecrawlable(HttpDocInfo) - Method in interface com.norconex.collector.http.recrawl.IRecrawlableResolver
Whether a document is recrawlable or not.
isRedirected(HttpRequest, HttpResponse, HttpContext) - Method in class com.norconex.collector.http.fetch.util.ApacheRedirectCaptureStrategy
 
isScaleStretch() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
isScreenshotEnabled() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Gets whether to enable taking screenshot of crawled web pages.
isScreenshotScaleStretch() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Gets whether the screenshot should be stretched to fill all the scale dimensions.
isStartURLsAsync() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Gets whether the start URLs should be loaded asynchronously.
isStayOnDomain() - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
Whether the crawler should always stay on the same domain name as the domain for each URL specified as a start URL.
isStayOnPort() - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
Gets whether the crawler should always stay on the same port as the port for each URL specified as a start URL.
isStayOnProtocol() - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
Whether the crawler should always stay on the same protocol as the protocol for each URL specified as a start URL.
IStartURLsProvider - Interface in com.norconex.collector.http.crawler
Provides starting URLs for crawling.
isTimestamped() - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
Gets whether to add a timestamp to the file name, to ensure a new one is created with each run.
isTrustAllSSLCertificates() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Whether to trust all SSL certificates (affects only "https" connections).
IURLNormalizer - Interface in com.norconex.collector.http.url
Responsible for normalizing URLs.
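
The isStayOnProtocol(), isStayOnDomain() and isStayOnPort() entries above describe keeping a crawl on the same protocol, domain and port as each start URL. The following is a minimal, self-contained sketch of that idea only. The class ScopeSketch and its inScope method are hypothetical names invented for illustration; they are not part of the library and do not reproduce the actual logic of URLCrawlScopeStrategy.isInScope(String, String).

```java
import java.net.URI;

// Hypothetical illustration only; not the library's URLCrawlScopeStrategy.
final class ScopeSketch {

    // Null-safe, case-insensitive comparison for scheme and host values.
    private static boolean same(String a, String b) {
        return a == null ? b == null : a.equalsIgnoreCase(b);
    }

    // Returns true when candidateUrl satisfies every enabled restriction
    // relative to startUrl. Simplification: an implicit port (-1) and an
    // explicit default port (e.g. :80) are treated as different ports.
    public static boolean inScope(String startUrl, String candidateUrl,
            boolean stayOnProtocol, boolean stayOnDomain, boolean stayOnPort) {
        URI start = URI.create(startUrl);
        URI candidate = URI.create(candidateUrl);
        if (stayOnProtocol && !same(start.getScheme(), candidate.getScheme())) {
            return false;
        }
        if (stayOnDomain && !same(start.getHost(), candidate.getHost())) {
            return false;
        }
        if (stayOnPort && start.getPort() != candidate.getPort()) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String start = "https://example.com/index.html";
        System.out.println(inScope(start, "https://example.com/about", true, true, true));
        System.out.println(inScope(start, "http://example.com/about", true, true, true));
    }
}
```

With all three flags disabled, every candidate URL is accepted, which corresponds to an unrestricted crawl scope in this sketch.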

L

LastModifiedMetadataChecksummer - Class in com.norconex.collector.http.checksum.impl
Default implementation of IMetadataChecksummer for the Norconex HTTP Collector which simply returns the exact value of the "Last-Modified" HTTP header field, or null if not present.
LastModifiedMetadataChecksummer() - Constructor for class com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer
 
Link - Class in com.norconex.collector.http.link
Represents a link extracted from a document.
Link(String) - Constructor for class com.norconex.collector.http.link.Link
 
loadChecksummerFromXML(XML) - Method in class com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer
 
loadCollectorConfigFromXML(XML) - Method in class com.norconex.collector.http.HttpCollectorConfig
 
loadCrawlerConfigFromXML(XML) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
loadDelaysFromXML(XML) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
Loads explicit configuration of delays from XML.
loadDelaysFromXML(XML) - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver
 
loadDelaysFromXML(XML) - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver
 
loadFromXML(XML) - Method in class com.norconex.collector.http.canon.impl.GenericCanonicalLinkDetector
 
loadFromXML(XML) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
 
loadFromXML(XML) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
 
loadFromXML(XML) - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
 
loadFromXML(XML) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
 
loadFromXML(XML) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
 
loadFromXML(XML) - Method in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
 
loadFromXML(XML) - Method in class com.norconex.collector.http.fetch.impl.webdriver.ScreenshotHandler
 
loadFromXML(XML) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
loadFromXML(XML) - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
 
loadFromXML(XML) - Method in class com.norconex.collector.http.fetch.util.GenericRedirectURLProvider
 
loadFromXML(XML) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
 
loadFromXML(XML) - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
 
loadFromXML(XML) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
loadFromXML(XML) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
 
loadFromXML(XML) - Method in class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
 
loadFromXML(XML) - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
 
loadFromXML(XML) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
 
loadHttpFetcherFromXML(XML) - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
 
loadHttpFetcherFromXML(XML) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
 
loadHttpFetcherFromXML(XML) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
loadHttpFetcherFromXML(XML) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
 
loadLinkExtractorFromXML(XML) - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
Loads configuration settings specific to the implementing class.
loadLinkExtractorFromXML(XML) - Method in class com.norconex.collector.http.link.AbstractTextLinkExtractor
 
loadLinkExtractorFromXML(XML) - Method in class com.norconex.collector.http.link.impl.TikaLinkExtractor
 
loadTextLinkExtractorFromXML(XML) - Method in class com.norconex.collector.http.link.AbstractTextLinkExtractor
Loads configuration settings specific to the implementing class.
loadTextLinkExtractorFromXML(XML) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
 
loadTextLinkExtractorFromXML(XML) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
 
loadTextLinkExtractorFromXML(XML) - Method in class com.norconex.collector.http.link.impl.RegexLinkExtractor
 
loadTextLinkExtractorFromXML(XML) - Method in class com.norconex.collector.http.link.impl.XMLFeedLinkExtractor
 

M

main(String[]) - Static method in class com.norconex.collector.http.HttpCollector
Invokes the HTTP Collector from the command line.
markReferenceVariationsAsProcessed(CrawlDocInfo) - Method in class com.norconex.collector.http.crawler.HttpCrawler
 
matches(String) - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver.DelayReferencePattern
 
MAX_BUFFER_SIZE - Static variable in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
 
MAX_BUFFER_SIZE - Static variable in class com.norconex.collector.http.link.impl.RegexLinkExtractor
 
METHOD_BASIC - Static variable in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
BASIC authentication method.
METHOD_DIGEST - Static variable in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
DIGEST authentication method.
METHOD_FORM - Static variable in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Form-based authentication method.
METHOD_KERBEROS - Static variable in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Experimental: Kerberos authentication method.
METHOD_NTLM - Static variable in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
NTLM authentication method.
METHOD_SPNEGO - Static variable in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Experimental: SPNEGO authentication method.
MinFrequency() - Constructor for class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
 
MinFrequency(String, String, String) - Constructor for class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
 

N

normalizeURL(String) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
 
normalizeURL(String) - Method in interface com.norconex.collector.http.url.IURLNormalizer
Normalizes the given URL.
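
As a rough illustration of what URL normalization can involve, the sketch below lowercases the scheme and host, drops default ports, resolves "." and ".." path segments, and removes the fragment, using only java.net.URI. The class UrlNormalizeSketch is a hypothetical name; the chosen rules are assumptions made for the example and are not the rule set applied by GenericURLNormalizer. It assumes hierarchical http or https URLs.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical illustration only; not the library's GenericURLNormalizer.
final class UrlNormalizeSketch {

    public static String normalize(String url) {
        try {
            // URI.normalize() removes "." and ".." path segments.
            URI u = new URI(url).normalize();
            String scheme = u.getScheme() == null
                    ? null : u.getScheme().toLowerCase(Locale.ROOT);
            String host = u.getHost() == null
                    ? null : u.getHost().toLowerCase(Locale.ROOT);
            int port = u.getPort();
            // Drop ports that are the default for the scheme.
            if (("http".equals(scheme) && port == 80)
                    || ("https".equals(scheme) && port == 443)) {
                port = -1;
            }
            // Rebuild without the fragment. Caveat: path and query are passed
            // in decoded form, so percent-encoded characters may be re-encoded
            // differently than in the input.
            return new URI(scheme, u.getUserInfo(), host, port,
                    u.getPath(), u.getQuery(), null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid URL: " + url, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(normalize("HTTP://Example.COM:80/a/./b/../c#frag"));
    }
}
```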

O

of(String) - Static method in enum com.norconex.collector.http.fetch.impl.webdriver.Browser
 
onCrawlerCleanBegin(CrawlerEvent) - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
 
onCrawlerEvent(CrawlerEvent) - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
 
onCrawlerRunBegin(CrawlerEvent) - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
 
onCrawlerStopBegin(CrawlerEvent) - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
 
ORIGINAL_REFERENCE - Static variable in class com.norconex.collector.http.doc.HttpDocMetadata
 
OVERLAP_SIZE - Static variable in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
 
OVERLAP_SIZE - Static variable in class com.norconex.collector.http.link.impl.RegexLinkExtractor
 

P

parseRobotsTxt(InputStream, String, String) - Method in class com.norconex.collector.http.robot.impl.StandardRobotsTxtProvider
 
PhantomJSDocumentFetcher - Class in com.norconex.collector.http.fetch.impl
Deprecated.
Since 3.0.0, use WebDriverHttpFetcher.
PhantomJSDocumentFetcher() - Constructor for class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
PhantomJSDocumentFetcher(int[]) - Constructor for class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
PhantomJSDocumentFetcher.Quality - Enum in com.norconex.collector.http.fetch.impl
Deprecated.
 
PhantomJSDocumentFetcher.Storage - Enum in com.norconex.collector.http.fetch.impl
Deprecated.
 
PhantomJSDocumentFetcher.StorageDiskStructure - Enum in com.norconex.collector.http.fetch.impl
Deprecated.
 
processDocument(HttpFetchClient, Doc) - Method in interface com.norconex.collector.http.processor.IHttpDocumentProcessor
Processes a document.
processDocument(HttpFetchClient, Doc) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
provideRedirectURL(HttpRequest, HttpResponse, HttpContext) - Method in class com.norconex.collector.http.fetch.util.GenericRedirectURLProvider
 
provideRedirectURL(HttpRequest, HttpResponse, HttpContext) - Method in interface com.norconex.collector.http.fetch.util.IRedirectURLProvider
Provides the redirect URL that the crawler must follow.
provideStartURLs() - Method in interface com.norconex.collector.http.crawler.IStartURLsProvider
Provides an iterator over start URLs.

R

REDIRECT - Static variable in class com.norconex.collector.http.doc.HttpCrawlState
 
REDIRECT_TRAIL - Static variable in class com.norconex.collector.http.doc.HttpDocMetadata
 
REFERENCED_URLS - Static variable in class com.norconex.collector.http.doc.HttpDocMetadata
 
REFERENCED_URLS_OUT_OF_SCOPE - Static variable in class com.norconex.collector.http.doc.HttpDocMetadata
 
ReferenceDelayResolver - Class in com.norconex.collector.http.delay.impl
Introduces different delays between document downloads based on matching document reference (URL) patterns.
ReferenceDelayResolver() - Constructor for class com.norconex.collector.http.delay.impl.ReferenceDelayResolver
 
ReferenceDelayResolver.DelayReferencePattern - Class in com.norconex.collector.http.delay.impl
 
REFERRER_LINK_PREFIX - Static variable in class com.norconex.collector.http.doc.HttpDocMetadata
 
REFERRER_REFERENCE - Static variable in class com.norconex.collector.http.doc.HttpDocMetadata
 
RegexLinkExtractor - Class in com.norconex.collector.http.link.impl
Link extractor using regular expressions to extract links found in text documents.
RegexLinkExtractor() - Constructor for class com.norconex.collector.http.link.impl.RegexLinkExtractor
 
RegexPair(String, String, boolean) - Constructor for class com.norconex.collector.http.link.impl.HtmlLinkExtractor.RegexPair
 
REJECTED_NONCANONICAL - Static variable in class com.norconex.collector.http.crawler.HttpCrawlerEvent
 
REJECTED_REDIRECTED - Static variable in class com.norconex.collector.http.crawler.HttpCrawlerEvent
 
REJECTED_ROBOTS_META_NOINDEX - Static variable in class com.norconex.collector.http.crawler.HttpCrawlerEvent
 
REJECTED_ROBOTS_TXT - Static variable in class com.norconex.collector.http.crawler.HttpCrawlerEvent
 
REJECTED_TOO_DEEP - Static variable in class com.norconex.collector.http.crawler.HttpCrawlerEvent
 
removeFormParameter(String) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Remove the authentication form parameter matching the given name.
removeLinkSelector(String) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
 
removeLinkTag(String, String) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
 
removeRequestHeader(String) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Remove the request header matching the given name.
removeRestriction(String) - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
Removes all restrictions on a given field.
removeRestriction(PropertyMatcher) - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
Removes a restriction.
Replace(String) - Constructor for class com.norconex.collector.http.url.impl.GenericURLNormalizer.Replace
 
Replace(String, String) - Constructor for class com.norconex.collector.http.url.impl.GenericURLNormalizer.Replace
 
resolve(HttpClient, HttpDocInfo) - Static method in class com.norconex.collector.http.fetch.util.HstsResolver
 
resolveExplicitDelay(String) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
Resolves explicitly specified delay, in milliseconds.
resolveExplicitDelay(String) - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver
 
resolveExplicitDelay(String) - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver
 
resolveSitemaps(HttpFetchClient, String, List<String>, Consumer<HttpDocInfo>, boolean) - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
 
resolveSitemaps(HttpFetchClient, String, List<String>, Consumer<HttpDocInfo>, boolean) - Method in interface com.norconex.collector.http.sitemap.ISitemapResolver
Resolves the sitemap instructions for a URL "root".
RobotsMeta - Class in com.norconex.collector.http.robot
 
RobotsMeta(boolean, boolean) - Constructor for class com.norconex.collector.http.robot.RobotsMeta
 
RobotsTxt - Class in com.norconex.collector.http.robot
 
RobotsTxt(IRobotsTxtFilter...) - Constructor for class com.norconex.collector.http.robot.RobotsTxt
Creates a new robots.txt object with the supplied filters.
RobotsTxt(List<IRobotsTxtFilter>) - Constructor for class com.norconex.collector.http.robot.RobotsTxt
Creates a new robots.txt object with the supplied filters.
RobotsTxt(List<IRobotsTxtFilter>, float) - Constructor for class com.norconex.collector.http.robot.RobotsTxt
 
RobotsTxt(List<IRobotsTxtFilter>, List<String>) - Constructor for class com.norconex.collector.http.robot.RobotsTxt
 
RobotsTxt(List<IRobotsTxtFilter>, List<String>, float) - Constructor for class com.norconex.collector.http.robot.RobotsTxt
 
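
The ReferenceDelayResolver entry above describes delays chosen by matching document references (URLs) against patterns. The sketch below shows one way such a lookup could work: the first matching pattern wins, otherwise a default delay applies. The class PatternDelaySketch and its methods are hypothetical and are not the library's implementation. Full-string regex matching and first-match-wins ordering are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration; not the library's ReferenceDelayResolver.
class PatternDelaySketch {

    // One rule: a compiled URL pattern and the delay to apply when it matches.
    private static final class Rule {
        final Pattern pattern;
        final long delayMillis;

        Rule(Pattern pattern, long delayMillis) {
            this.pattern = pattern;
            this.delayMillis = delayMillis;
        }
    }

    private final List<Rule> rules = new ArrayList<>();
    private final long defaultDelayMillis;

    PatternDelaySketch(long defaultDelayMillis) {
        this.defaultDelayMillis = defaultDelayMillis;
    }

    // Registers a rule; rules are evaluated in insertion order.
    PatternDelaySketch add(String regex, long delayMillis) {
        rules.add(new Rule(Pattern.compile(regex), delayMillis));
        return this;
    }

    // Returns the delay of the first rule whose pattern matches the whole URL,
    // or the default delay when no rule matches.
    long resolve(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).matches()) {
                return rule.delayMillis;
            }
        }
        return defaultDelayMillis;
    }
}
```

For instance, registering `.*\.pdf` with a 10000 ms delay ahead of a site-wide rule would slow down PDF downloads while leaving other pages on the shorter delay.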

S

saveChecksummerToXML(XML) - Method in class com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer
 
saveCollectorConfigToXML(XML) - Method in class com.norconex.collector.http.HttpCollectorConfig
 
saveCrawlerConfigToXML(XML) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
saveDelaysToXML(XML) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
Saves explicit configuration of delays to XML.
saveDelaysToXML(XML) - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver
 
saveDelaysToXML(XML) - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver
 
saveHttpFetcherToXML(XML) - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
 
saveHttpFetcherToXML(XML) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
 
saveHttpFetcherToXML(XML) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
saveHttpFetcherToXML(XML) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
 
saveLinkExtractorToXML(XML) - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
Saves configuration settings specific to the implementing class.
saveLinkExtractorToXML(XML) - Method in class com.norconex.collector.http.link.AbstractTextLinkExtractor
 
saveLinkExtractorToXML(XML) - Method in class com.norconex.collector.http.link.impl.TikaLinkExtractor
 
saveTextLinkExtractorToXML(XML) - Method in class com.norconex.collector.http.link.AbstractTextLinkExtractor
Saves configuration settings specific to the implementing class.
saveTextLinkExtractorToXML(XML) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
 
saveTextLinkExtractorToXML(XML) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
 
saveTextLinkExtractorToXML(XML) - Method in class com.norconex.collector.http.link.impl.RegexLinkExtractor
 
saveTextLinkExtractorToXML(XML) - Method in class com.norconex.collector.http.link.impl.XMLFeedLinkExtractor
 
saveToXML(XML) - Method in class com.norconex.collector.http.canon.impl.GenericCanonicalLinkDetector
 
saveToXML(XML) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
 
saveToXML(XML) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
 
saveToXML(XML) - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
 
saveToXML(XML) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
 
saveToXML(XML) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
 
saveToXML(XML) - Method in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
 
saveToXML(XML) - Method in class com.norconex.collector.http.fetch.impl.webdriver.ScreenshotHandler
 
saveToXML(XML) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
saveToXML(XML) - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
 
saveToXML(XML) - Method in class com.norconex.collector.http.fetch.util.GenericRedirectURLProvider
 
saveToXML(XML) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
 
saveToXML(XML) - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
 
saveToXML(XML) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
saveToXML(XML) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
 
saveToXML(XML) - Method in class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
 
saveToXML(XML) - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
 
saveToXML(XML) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
 
ScaledImage - Class in com.norconex.collector.http.processor.impl
 
ScaledImage(String, Dimension, BufferedImage) - Constructor for class com.norconex.collector.http.processor.impl.ScaledImage
 
SCOPE_CRAWLER - Static variable in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
 
SCOPE_SITE - Static variable in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
 
SCOPE_THREAD - Static variable in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
 
ScreenshotHandler - Class in com.norconex.collector.http.fetch.impl.webdriver
Takes screenshots of pages using a Selenium WebDriver.
ScreenshotHandler() - Constructor for class com.norconex.collector.http.fetch.impl.webdriver.ScreenshotHandler
 
ScreenshotHandler(CachedStreamFactory) - Constructor for class com.norconex.collector.http.fetch.impl.webdriver.ScreenshotHandler
 
SegmentCountURLFilter - Class in com.norconex.collector.http.filter.impl
Filters URLs based on the number of URL segments.
SegmentCountURLFilter() - Constructor for class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
Constructor.
SegmentCountURLFilter(int) - Constructor for class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
Constructor.
SegmentCountURLFilter(int, OnMatch) - Constructor for class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
Constructor.
SegmentCountURLFilter(int, OnMatch, boolean) - Constructor for class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
Constructor.
setApplyTo(String) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
 
setAuthConfig(HttpAuthConfig) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
 
setBrowser(Browser) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
setBrowserPath(Path) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
setCanonicalLinkDetector(ICanonicalLinkDetector) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets the canonical link detector.
setCaseSensitive(boolean) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
 
setCharset(String) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
Sets the assumed source character encoding.
setCharset(String) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
Sets the character set of pages on which link extraction is performed.
setCharset(String) - Method in class com.norconex.collector.http.link.impl.RegexLinkExtractor
Sets the character set of pages on which link extraction is performed.
setCombined(boolean) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
 
setCommentsEnabled(boolean) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
Sets whether links should be extracted from HTML/XML comments.
setConnectionCharset(Charset) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Sets the connection character set.
setConnectionRequestTimeout(int) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Sets the timeout when requesting a connection, in milliseconds.
setConnectionTimeout(int) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Sets the connection timeout until a connection is established, in milliseconds.
setContentTypePattern(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
setContentTypes(ContentType...) - Method in class com.norconex.collector.http.canon.impl.GenericCanonicalLinkDetector
Sets the content types on which to perform canonical link detection.
setContentTypes(List<ContentType>) - Method in class com.norconex.collector.http.canon.impl.GenericCanonicalLinkDetector
Sets the content types on which to perform canonical link detection.
setCookieSpec(String) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
 
setCount(int) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
 
setCrawlerIds(List<String>) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
 
setCrawlState(CrawlState) - Method in class com.norconex.collector.http.fetch.HttpFetchResponseBuilder
 
setCredentials(Credentials) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
 
setCssSelector(String) - Method in class com.norconex.collector.http.fetch.impl.webdriver.ScreenshotHandler
 
setDefaultDelay(long) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
Sets the default delay in milliseconds.
setDelayReferencePatterns(List<ReferenceDelayResolver.DelayReferencePattern>) - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver
 
setDelayResolver(IDelayResolver) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
setDepth(int) - Method in class com.norconex.collector.http.doc.HttpDocInfo
Sets the URL depth.
setDetectCharset(boolean) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
setDetectContentType(boolean) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
setDisabled(boolean) - Method in class com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer
Deprecated.
Since 2.0.0, not having a checksummer defined or setting one explicitly to null effectively disables it.
setDisabled(boolean) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
Sets whether this URL Normalizer is disabled or not.
setDisableETag(boolean) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Sets whether adding the "ETag" If-None-Match HTTP request header is disabled.
setDisableHSTS(boolean) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Sets whether the forcing of non-secure URLs to secure ones is disabled, according to the URL domain Strict-Transport-Security policy (obtained from HTTP response header).
setDisableIfModifiedSince(boolean) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Sets whether adding the If-Modified-Since HTTP request header is disabled.
setDisableSNI(boolean) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Sets whether Server Name Indication (SNI) is disabled.
setDomain(String) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Sets the NTLM authentication domain.
setDomSelector(String) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setDriverPath(Path) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
setDuplicate(boolean) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
 
setEarlyPageScript(String) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
setEtag(String) - Method in class com.norconex.collector.http.doc.HttpDocInfo
Sets the HTTP ETag.
setException(Exception) - Method in class com.norconex.collector.http.fetch.HttpFetchResponseBuilder
 
setExePath(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
setExpectContinueEnabled(boolean) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Sets whether 'Expect: 100-continue' handshake is enabled.
setExtractBetweens(HtmlLinkExtractor.RegexPair...) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
Sets the patterns delimiting the portions of a document to be considered for link extraction.
setExtractBetweens(List<HtmlLinkExtractor.RegexPair>) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
Sets the patterns delimiting the portions of a document to be considered for link extraction.
setExtractSelectors(List<String>) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
 
setExtractSelectors(String...) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
 
setExtractSelectors(String...) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
Sets the selectors matching the portions of a document to be considered for link extraction.
setExtractSelectors(List<String>) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
Sets the selectors matching the portions of a document to be considered for link extraction.
setFallbackCharset(String) - Method in class com.norconex.collector.http.fetch.util.GenericRedirectURLProvider
 
setFetchHttpGet(HttpCrawlerConfig.HttpMethodSupport) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets whether to fetch HTTP documents using an HTTP GET request.
setFetchHttpHead(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
setFetchHttpHead(HttpCrawlerConfig.HttpMethodSupport) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets whether to fetch HTTP response headers using an HTTP HEAD request.
setFieldMatcher(TextMatcher) - Method in class com.norconex.collector.http.link.AbstractTextLinkExtractor
Sets the field matcher identifying fields holding content used for link extraction.
setFileNamePrefix(String) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
Sets the generated report file name prefix.
setForceCharsetDetection(boolean) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Sets whether character encoding is detected instead of relying on HTTP response header.
setForceContentTypeDetection(boolean) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Sets whether content type is detected instead of relying on HTTP response header.
setFormCharset(Charset) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Sets the authentication form character set for the form field values.
setFormParam(String, String) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Sets an authentication form parameter (equivalent to "input" or other fields in HTML forms).
setFormParams(Map<String, String>) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Sets authentication form parameters (equivalent to "input" or other fields in HTML forms).
setFormPasswordField(String) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Sets the name of the HTML field where the password is set.
setFormSelector(String) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Sets the CSS selector that identifies the form in a login page.
setFormUsernameField(String) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Sets the name of the HTML field where the username is set.
setHeadersPrefix(String) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
 
setHeadersPrefix(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
setHeadersPrefix(String) - Method in class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
 
setHost(Host) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Sets the host for the current authentication scope.
setHttpFetchers(IHttpFetcher...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets HTTP fetchers.
setHttpFetchers(List<IHttpFetcher>) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets HTTP fetchers.
setHttpFetchersMaxRetries(int) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets the maximum number of times an HTTP fetcher will re-attempt fetching a resource in case of failures.
setHttpFetchersRetryDelay(long) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets how long to wait before a failing HTTP fetcher re-attempts fetching a resource in case of failures (in milliseconds).
setHttpMethods(List<HttpMethod>) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Sets the list of HTTP methods to be accepted by this fetcher.
setHttpSnifferConfig(HttpSnifferConfig) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
setIgnoreCanonicalLinks(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets whether canonical links found in HTTP headers and in HTML files <head> section should be ignored or processed.
setIgnoreLinkData(boolean) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
Sets whether to ignore extra data associated with a link.
setIgnoreLinkData(boolean) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
Sets whether to ignore extra data associated with a link.
setIgnoreLinkData(boolean) - Method in class com.norconex.collector.http.link.impl.TikaLinkExtractor
Sets whether to ignore extra data associated with a link.
setIgnoreNofollow(boolean) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
 
setIgnoreNofollow(boolean) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
 
setIgnoreNofollow(boolean) - Method in class com.norconex.collector.http.link.impl.TikaLinkExtractor
 
setIgnoreRobotsCrawlDelay(boolean) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
Sets whether to ignore crawl delays specified in a site robots.txt file.
setIgnoreRobotsMeta(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
setIgnoreRobotsTxt(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
setIgnoreSitemap(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets whether to ignore sitemap detection and resolving for URLs processed.
setImage(ScaledImage) - Method in class com.norconex.collector.http.processor.impl.ImageCache
 
setImageCacheDir(Path) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setImageCacheSize(int) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setImageFormat(String) - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
 
setImageFormat(String) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setImplicitlyWait(long) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
setIncludeSubdomains(boolean) - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
Sets whether sub-domains are considered to be the same as a URL domain.
setKeepDownloads(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
setKeepOutOfScopeLinks(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Deprecated.
setKeepReferencedLinks(Set<HttpCrawlerConfig.ReferencedLinkType>) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets whether to keep referenced links and what to keep.
setKeepReferencedLinks(HttpCrawlerConfig.ReferencedLinkType...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets whether to keep referenced links and what to keep.
setLargest(boolean) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setLatePageScript(String) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
setLenient(boolean) - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
 
setLinkExtractors(ILinkExtractor...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets link extractors.
setLinkExtractors(List<ILinkExtractor>) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets link extractors.
setLocalAddress(String) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Sets the local address, which may be useful when working with multiple network interfaces.
setMaxBufferSize(int) - Method in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
 
setMaxConnectionIdleTime(int) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Sets the period of time in milliseconds after which to evict idle connections from the connection pool.
setMaxConnectionInactiveTime(int) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Sets the period of time in milliseconds a connection must be inactive to be checked in case it became stalled.
setMaxConnections(int) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Sets the maximum number of connections that can be created.
setMaxConnectionsPerRoute(int) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Sets the maximum number of connections that can be used per route.
setMaxDepth(int) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
setMaxRedirects(int) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Sets the maximum number of redirects to be followed.
setMaxURLLength(int) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
Sets the maximum supported URL length.
setMaxURLLength(int) - Method in class com.norconex.collector.http.link.impl.RegexLinkExtractor
Sets the maximum supported URL length.
setMetadata(Properties) - Method in class com.norconex.collector.http.link.Link
 
setMethod(String) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Sets the authentication method.
setMinDimensions(int, int) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setMinDimensions(Dimension) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setMinFrequencies(GenericRecrawlableResolver.MinFrequency...) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
Sets minimum frequencies.
setMinFrequencies(Collection<GenericRecrawlableResolver.MinFrequency>) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
Sets minimum frequencies.
setNoExtractBetweens(HtmlLinkExtractor.RegexPair...) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
Sets the patterns delimiting the portions of a document to be excluded from link extraction.
setNoExtractBetweens(List<HtmlLinkExtractor.RegexPair>) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
Sets the patterns delimiting the portions of a document to be excluded from link extraction.
setNoExtractSelectors(List<String>) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
 
setNoExtractSelectors(String...) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
 
setNoExtractSelectors(String...) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
Sets the selectors matching the portions of a document to be excluded from link extraction.
setNoExtractSelectors(List<String>) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
Sets the selectors matching the portions of a document to be excluded from link extraction.
setNormalizations(GenericURLNormalizer.Normalization...) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
 
setNormalizations(List<GenericURLNormalizer.Normalization>) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
 
setNotFoundStatusCodes(int...) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Sets HTTP status codes to be considered as "Not found" state.
setNotFoundStatusCodes(List<Integer>) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Sets HTTP status codes to be considered as "Not found" state.
setNotFoundStatusCodes(int...) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Sets HTTP status codes to be considered as "Not found" state.
setNotFoundStatusCodes(List<Integer>) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Sets HTTP status codes to be considered as "Not found" state.
setOnMatch(OnMatch) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
 
setOptions(List<String>) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Sets optional extra PhantomJS command-line options.
setOptions(String...) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Sets optional extra PhantomJS command-line options.
setOriginalReference(String) - Method in class com.norconex.collector.http.doc.HttpDocInfo
 
setOutputDir(Path) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
Sets the local directory where this listener report will be written.
setPageContentTypePattern(String) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setPageLoadTimeout(long) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
setParser(String) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
Sets the parser to use when creating the DOM-tree.
setPattern(String) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
 
setPort(int) - Method in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
 
setPostImportLinks(TextMatcher) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets a field matcher used to identify post-import metadata fields holding URLs to consider for crawling.
setPostImportLinksKeep(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets whether to keep the importer-generated field holding URLs to consider for crawling.
setPostImportProcessors(IHttpDocumentProcessor...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets post-import processors.
setPostImportProcessors(List<IHttpDocumentProcessor>) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets post-import processors.
setPreemptive(boolean) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Sets whether to perform preemptive authentication (valid for "basic" authentication method).
setPreImportProcessors(IHttpDocumentProcessor...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets pre-import processors.
setPreImportProcessors(List<IHttpDocumentProcessor>) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets pre-import processors.
setProxySettings(ProxySettings) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
 
setRealm(String) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Sets the realm name for the current authentication scope.
setReasonPhrase(String) - Method in class com.norconex.collector.http.fetch.HttpFetchResponseBuilder
 
setRecrawlableResolver(IRecrawlableResolver) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets the recrawlable resolver.
setRedirectTarget(String) - Method in class com.norconex.collector.http.fetch.HttpFetchResponseBuilder
 
setRedirectTrail(List<String>) - Method in class com.norconex.collector.http.doc.HttpDocInfo
Sets the trail of URLs that were redirected up to this one.
setRedirectURLProvider(IRedirectURLProvider) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Sets the redirect URL provider.
setReference(String) - Method in class com.norconex.collector.http.doc.HttpDocInfo
 
setReferencedUrls(List<String>) - Method in class com.norconex.collector.http.doc.HttpDocInfo
Sets URLs referenced by this one.
setReferenceFilters(IReferenceFilter...) - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
Sets reference filters.
setReferenceFilters(List<IReferenceFilter>) - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
Sets reference filters.
setReferencePattern(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
setReferrer(String) - Method in class com.norconex.collector.http.link.Link
 
setReferrerLinkMetadata(String) - Method in class com.norconex.collector.http.doc.HttpDocInfo
 
setReferrerReference(String) - Method in class com.norconex.collector.http.doc.HttpDocInfo
 
setRemoteURL(URL) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
setRenderWaitTime(int) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
setReplaces(GenericURLNormalizer.Replace...) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
 
setReplaces(List<GenericURLNormalizer.Replace>) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
 
setRequestHeader(String, String) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Sets a default HTTP request header every HTTP connection should have.
setRequestHeaders(Map<String, String>) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Sets the default HTTP request headers every HTTP connection should have.
setRequestHeaders(Map<String, String>) - Method in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
 
setRequestIfModifiedSince(HttpRequest, CrawlDoc) - Static method in class com.norconex.collector.http.fetch.util.ApacheHttpUtil
Sets the If-Modified-Since HTTP request header based on the document's cached last crawled date (if any).
setRequestIfNoneMatch(HttpRequest, CrawlDoc) - Static method in class com.norconex.collector.http.fetch.util.ApacheHttpUtil
Sets the ETag If-None-Match HTTP request header based on the document's cached ETag value (if any).
setResourceTimeout(int) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Sets the milliseconds timeout after which any resource requested will stop trying and proceed with other parts of the page.
setRestrictions(List<PropertyMatcher>) - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
Sets the restrictions limiting which documents this extractor applies to.
setRobotsMeta(RobotsMeta) - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
 
setRobotsMetaProvider(IRobotsMetaProvider) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
setRobotsTxtProvider(IRobotsTxtProvider) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
setScaleDimensions(int, int) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setScaleDimensions(Dimension) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setScaleQuality(FeaturedImageProcessor.Quality) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setScaleStretch(boolean) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setSchedules(List<GenericDelayResolver.DelaySchedule>) - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver
 
setSchemes(String...) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
Sets the schemes to be extracted.
setSchemes(List<String>) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
Sets the schemes to be extracted.
setSchemes(String...) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
Sets the schemes to be extracted.
setSchemes(List<String>) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
Sets the schemes to be extracted.
setScope(String) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
Sets the delay scope.
setScreenshotDimensions(int, int) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
setScreenshotDimensions(Dimension) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
setScreenshotEnabled(boolean) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Sets whether to enable taking screenshot of crawled web pages.
setScreenshotHandler(ScreenshotHandler) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
 
setScreenshotImageFormat(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Sets the screenshot image format (jpg, png, gif, bmp, etc.).
setScreenshotScaleDimensions(Dimension) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Sets the pixel dimensions we want the stored screenshot to have.
setScreenshotScaleDimensions(int, int) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Sets the pixel dimensions we want the stored screenshot to have.
setScreenshotScaleQuality(PhantomJSDocumentFetcher.Quality) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Sets the screenshot scaling quality to use when storage is "disk" or "inline".
setScreenshotScaleStretch(boolean) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Sets whether the screenshot should be stretched to fill all the scale dimensions.
setScreenshotStorage(List<PhantomJSDocumentFetcher.Storage>) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Sets the screenshot storage mechanisms.
setScreenshotStorage(PhantomJSDocumentFetcher.Storage...) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Sets the screenshot storage mechanisms.
setScreenshotStorageDiskDir(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Sets the directory where screenshots are saved when storage is "disk".
setScreenshotStorageDiskField(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Sets the target document metadata field where to store the path to the screenshot image file when storage is "disk".
setScreenshotStorageDiskStructure(PhantomJSDocumentFetcher.StorageDiskStructure) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Sets the screenshot directory structure to create when storage is "disk".
setScreenshotStorageInlineField(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Sets the target document metadata field where to store the inline (Base64) screenshot image when storage is "inline".
setScreenshotZoomFactor(float) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
setScriptPath(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
setScriptTimeout(long) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
setSeparator(String) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
 
setSitemapChangeFreq(String) - Method in class com.norconex.collector.http.doc.HttpDocInfo
Sets the sitemap change frequency.
setSitemapLastMod(ZonedDateTime) - Method in class com.norconex.collector.http.doc.HttpDocInfo
Sets the sitemap last modified date.
setSitemapPaths(String...) - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
Sets the URL paths, relative to the URL root, from which to try locate and resolve sitemaps.
setSitemapPaths(List<String>) - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
Sets the URL paths, relative to the URL root, from which to try locate and resolve sitemaps.
setSitemapPriority(Float) - Method in class com.norconex.collector.http.doc.HttpDocInfo
Sets the sitemap priority.
setSitemapResolver(ISitemapResolver) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
setSitemapSupport(GenericRecrawlableResolver.SitemapSupport) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
Sets the sitemap support strategy.
setSocketTimeout(int) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Sets the maximum period of inactivity between two consecutive data packets, in milliseconds.
setSSLProtocols(List<String>) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Sets the supported SSL/TLS protocols, such as SSLv3, TLSv1, TLSv1.1, and TLSv1.2.
setStartSitemapURLs(String...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets the sitemap URLs used as starting points for crawling.
setStartSitemapURLs(List<String>) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets the sitemap URLs used as starting points for crawling.
setStartURLs(String...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets URLs to initiate crawling from.
setStartURLs(List<String>) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets URLs to initiate crawling from.
setStartURLsAsync(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets whether the start URLs should be loaded asynchronously.
setStartURLsFiles(Path...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets the file paths of seed files containing URLs to be used as "start URLs".
setStartURLsFiles(List<Path>) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets the file paths of seed files containing URLs to be used as "start URLs".
setStartURLsProviders(IStartURLsProvider...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets the providers of URLs used as starting points for crawling.
setStartURLsProviders(List<IStartURLsProvider>) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets the providers of URLs used as starting points for crawling.
setStatusCode(int) - Method in class com.norconex.collector.http.fetch.HttpFetchResponseBuilder
 
setStatusCodes(String) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
Sets a comma-separated list of status codes to listen to.
setStayOnDomain(boolean) - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
Sets whether the crawler should always stay on the same domain name as the domain for each URL specified as a start URL.
setStayOnPort(boolean) - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
Sets whether the crawler should always stay on the same port as the port for each URL specified as a start URL.
setStayOnProtocol(boolean) - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
Sets whether the crawler should always stay on the same protocol as the protocol for each URL specified as a start URL.
setStorage(FeaturedImageProcessor.Storage...) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
Sets the storage mechanisms.
setStorage(List<FeaturedImageProcessor.Storage>) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
Sets the storage mechanisms.
setStorageDiskDir(String) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setStorageDiskField(String) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setStorageDiskStructure(FeaturedImageProcessor.StorageDiskStructure) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setStorageInlineField(String) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setStorageUrlField(String) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
setTargetDir(Path) - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
 
setTargetDirField(String) - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
 
setTargetDirStructure(DocImageHandler.DirStructure) - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
 
setTargetMetaField(String) - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
 
setTargets(DocImageHandler.Target...) - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
 
setTargets(List<DocImageHandler.Target>) - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
 
setTempDir(Path) - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
Sets the directory where temporary sitemap files are written.
setThreadWait(long) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
setTimestamped(boolean) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
Sets whether to add a timestamp to the file name, to ensure a new one is created with each run.
setTrustAllSSLCertificates(boolean) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Sets whether to trust all SSL certificates.
setUrl(String) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Sets the URL for "form" authentication.
setUrlCrawlScopeStrategy(URLCrawlScopeStrategy) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
Sets the strategy to use to determine if a URL is in scope.
setUrlNormalizer(IURLNormalizer) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
setUserAgent(String) - Method in class com.norconex.collector.http.fetch.HttpFetchResponseBuilder
 
setUserAgent(String) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
 
setUserAgent(String) - Method in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
 
setValidExitCodes(List<Integer>) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Sets valid PhantomJS exit values (defaults to 0).
setValidExitCodes(int...) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Sets valid PhantomJS exit values (defaults to 0).
setValidStatusCodes(List<Integer>) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Sets valid HTTP response status codes.
setValidStatusCodes(int...) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
Sets valid HTTP response status codes.
setValidStatusCodes(List<Integer>) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Sets valid HTTP response status codes.
setValidStatusCodes(int...) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
Sets valid HTTP response status codes.
setValue(String) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
 
setWaitForElementSelector(String) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
setWaitForElementTimeout(long) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
setWaitForElementType(WebDriverHttpFetcherConfig.WaitElementType) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
setWindowSize(Dimension) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
setWorkstation(String) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
Sets the NTLM authentication workstation name.
SiteDelay - Class in com.norconex.collector.http.delay.impl
 
SiteDelay() - Constructor for class com.norconex.collector.http.delay.impl.SiteDelay
 
SitemapChangeFrequency - Enum in com.norconex.collector.http.sitemap
Sitemap change frequency unit, as defined on http://www.sitemaps.org/protocol.html
SM_CHANGE_FREQ - Static variable in class com.norconex.collector.http.doc.HttpDocMetadata
 
SM_LASTMOD - Static variable in class com.norconex.collector.http.doc.HttpDocMetadata
 
SM_PRORITY - Static variable in class com.norconex.collector.http.doc.HttpDocMetadata
 
StandardRobotsMetaProvider - Class in com.norconex.collector.http.robot.impl
Implementation of IRobotsMetaProvider as per X-Robots-Tag and ROBOTS standards.
StandardRobotsMetaProvider() - Constructor for class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
 
StandardRobotsTxtProvider - Class in com.norconex.collector.http.robot.impl
Implementation of IRobotsTxtProvider as per the robots.txt standard described at http://www.robotstxt.org/robotstxt.html.
StandardRobotsTxtProvider() - Constructor for class com.norconex.collector.http.robot.impl.StandardRobotsTxtProvider
 
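The setStayOnDomain, setStayOnPort, and setStayOnProtocol entries above describe keeping a crawl on the same domain, port, or protocol as each start URL. The following is a minimal sketch of that kind of check using only the JDK. The class ScopeSketch and its methods are hypothetical and are not part of the Norconex API, and treating a missing port as the scheme default (80 or 443) is an assumption made for this illustration.

```java
import java.net.URI;

public class ScopeSketch {

    // Hypothetical helper: returns true when candidateUrl satisfies every
    // enabled "stay on" rule relative to startUrl.
    static boolean inScope(String startUrl, String candidateUrl,
            boolean stayOnProtocol, boolean stayOnDomain, boolean stayOnPort) {
        URI start = URI.create(startUrl);
        URI candidate = URI.create(candidateUrl);
        if (stayOnProtocol
                && !sameIgnoreCase(start.getScheme(), candidate.getScheme())) {
            return false;
        }
        if (stayOnDomain
                && !sameIgnoreCase(start.getHost(), candidate.getHost())) {
            return false;
        }
        if (stayOnPort && effectivePort(start) != effectivePort(candidate)) {
            return false;
        }
        return true;
    }

    // Assumption: an unspecified port means the scheme default.
    static int effectivePort(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        return "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
    }

    // Null-safe, case-insensitive comparison.
    static boolean sameIgnoreCase(String a, String b) {
        return a == null ? b == null : a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        String start = "https://example.com/";
        // Same protocol, host, and port: in scope.
        System.out.println(inScope(start, "https://example.com/page", true, true, true));
        // Different host with stayOnDomain enabled: out of scope.
        System.out.println(inScope(start, "https://other.org/", false, true, false));
        // No rule enabled: every link is in scope.
        System.out.println(inScope(start, "http://other.org:8080/", false, false, false));
    }
}
```

With all three flags disabled, every candidate is accepted, which matches the index description of a crawler that by default tries to follow all links it discovers.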

T

takeScreenshot(WebDriver, Doc) - Method in class com.norconex.collector.http.fetch.impl.webdriver.ScreenshotHandler
 
TARGET_REDIRECT_CONTEXT_KEY - Static variable in class com.norconex.collector.http.fetch.util.ApacheRedirectCaptureStrategy
 
ThreadDelay - Class in com.norconex.collector.http.delay.impl
 
ThreadDelay() - Constructor for class com.norconex.collector.http.delay.impl.ThreadDelay
 
TikaLinkExtractor - Class in com.norconex.collector.http.link.impl
Implementation of ILinkExtractor using Apache Tika to perform URL extractions from HTML documents.
TikaLinkExtractor() - Constructor for class com.norconex.collector.http.link.impl.TikaLinkExtractor
 
TINY_SLEEP_MS - Static variable in class com.norconex.collector.http.delay.impl.AbstractDelay
 
toHTMLInlineString(String) - Method in class com.norconex.collector.http.processor.impl.ScaledImage
 
TOO_DEEP - Static variable in class com.norconex.collector.http.doc.HttpCrawlState
 
toString() - Method in class com.norconex.collector.http.canon.impl.GenericCanonicalLinkDetector
 
toString() - Method in class com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer
 
toString() - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
 
toString() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
 
toString() - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
 
toString() - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
 
toString() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
 
toString() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver
 
toString() - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver.DelayReferencePattern
 
toString() - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver
 
toString() - Method in class com.norconex.collector.http.doc.HttpDocInfo
 
toString() - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
 
toString() - Method in class com.norconex.collector.http.fetch.HttpFetchClientResponse
 
toString() - Method in class com.norconex.collector.http.fetch.HttpFetchResponseBuilder
 
toString() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
 
toString() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
 
toString() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
 
toString() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
Deprecated.
 
toString() - Method in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
 
toString() - Method in class com.norconex.collector.http.fetch.impl.webdriver.ScreenshotHandler
 
toString() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
 
toString() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
toString() - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
 
toString() - Method in class com.norconex.collector.http.fetch.util.GenericRedirectURLProvider
 
toString() - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
 
toString() - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
 
toString() - Method in class com.norconex.collector.http.link.AbstractTextLinkExtractor
 
toString() - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
 
toString() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor.RegexPair
 
toString() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
 
toString() - Method in class com.norconex.collector.http.link.impl.RegexLinkExtractor
 
toString() - Method in class com.norconex.collector.http.link.impl.TikaLinkExtractor
 
toString() - Method in class com.norconex.collector.http.link.impl.XMLFeedLinkExtractor
 
toString() - Method in class com.norconex.collector.http.link.Link
 
toString() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
 
toString() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
 
toString() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
 
toString() - Method in class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
 
toString() - Method in class com.norconex.collector.http.robot.impl.StandardRobotsTxtProvider
 
toString() - Method in class com.norconex.collector.http.robot.RobotsMeta
 
toString() - Method in class com.norconex.collector.http.robot.RobotsTxt
 
toString() - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
 
toString() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer.Replace
 
toString() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
 
TrustAllX509TrustManager - Class in com.norconex.collector.http.fetch.util
A very unsafe trust manager accepting ALL certificates.
TrustAllX509TrustManager() - Constructor for class com.norconex.collector.http.fetch.util.TrustAllX509TrustManager
 

U

UNSPECIFIED_CRAWL_DELAY - Static variable in class com.norconex.collector.http.robot.RobotsTxt
 
unsupported() - Static method in class com.norconex.collector.http.fetch.HttpFetchResponseBuilder
 
URLCrawlScopeStrategy - Class in com.norconex.collector.http.crawler
By default a crawler will try to follow all links it discovers.
URLCrawlScopeStrategy() - Constructor for class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
 
URLS_EXTRACTED - Static variable in class com.norconex.collector.http.crawler.HttpCrawlerEvent
 
URLS_POST_IMPORTED - Static variable in class com.norconex.collector.http.crawler.HttpCrawlerEvent
 
URLStatusCrawlerEventListener - Class in com.norconex.collector.http.crawler.event.impl
Stores to file all URLs that were "fetched", along with their HTTP response code.
URLStatusCrawlerEventListener() - Constructor for class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
 

V

valueOf(String) - Static method in enum com.norconex.collector.http.crawler.HttpCrawlerConfig.HttpMethodSupport
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.norconex.collector.http.crawler.HttpCrawlerConfig.ReferencedLinkType
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule.DOW
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.norconex.collector.http.fetch.HttpMethod
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.Quality
Deprecated.
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.Storage
Deprecated.
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.StorageDiskStructure
Deprecated.
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.norconex.collector.http.fetch.impl.webdriver.Browser
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig.WaitElementType
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.norconex.collector.http.fetch.util.DocImageHandler.DirStructure
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.norconex.collector.http.fetch.util.DocImageHandler.Target
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.norconex.collector.http.processor.impl.FeaturedImageProcessor.Quality
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.norconex.collector.http.processor.impl.FeaturedImageProcessor.Storage
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.norconex.collector.http.processor.impl.FeaturedImageProcessor.StorageDiskStructure
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.SitemapSupport
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.norconex.collector.http.sitemap.SitemapChangeFrequency
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
Returns the enum constant of this type with the specified name.
values() - Static method in enum com.norconex.collector.http.crawler.HttpCrawlerConfig.HttpMethodSupport
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.norconex.collector.http.crawler.HttpCrawlerConfig.ReferencedLinkType
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule.DOW
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.norconex.collector.http.fetch.HttpMethod
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.Quality
Deprecated.
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.Storage
Deprecated.
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.StorageDiskStructure
Deprecated.
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.norconex.collector.http.fetch.impl.webdriver.Browser
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig.WaitElementType
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.norconex.collector.http.fetch.util.DocImageHandler.DirStructure
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.norconex.collector.http.fetch.util.DocImageHandler.Target
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.norconex.collector.http.processor.impl.FeaturedImageProcessor.Quality
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.norconex.collector.http.processor.impl.FeaturedImageProcessor.Storage
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.norconex.collector.http.processor.impl.FeaturedImageProcessor.StorageDiskStructure
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.SitemapSupport
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.norconex.collector.http.sitemap.SitemapChangeFrequency
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
Returns an array containing the constants of this enum type, in the order they are declared.
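The valueOf(String) and values() entries above are the standard methods the Java compiler generates for every enum. As a short illustration of how they behave, here is a sketch that uses a hypothetical enum, ChangeFreq, defined only for this example; it is not one of the Norconex enums listed above. The parseOrDefault helper is likewise an assumption added for illustration.

```java
import java.util.Arrays;
import java.util.Locale;

public class EnumLookupSketch {

    // Hypothetical enum used only for this illustration.
    enum ChangeFreq { ALWAYS, DAILY, NEVER }

    // valueOf is case-sensitive and throws IllegalArgumentException for an
    // unknown name, so a lenient lookup has to normalize and catch.
    static ChangeFreq parseOrDefault(String name, ChangeFreq fallback) {
        if (name == null) {
            return fallback;
        }
        try {
            return ChangeFreq.valueOf(name.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Exact-name lookup.
        System.out.println(ChangeFreq.valueOf("DAILY"));
        // Constants in declaration order.
        System.out.println(Arrays.toString(ChangeFreq.values()));
        // Lenient lookup with a fallback.
        System.out.println(parseOrDefault("daily", ChangeFreq.NEVER));
        System.out.println(parseOrDefault("bogus", ChangeFreq.NEVER));
    }
}
```

Because valueOf matches the constant name exactly, configuration values read from user input are usually normalized first, as parseOrDefault does here.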

W

WebDriverHttpFetcher - Class in com.norconex.collector.http.fetch.impl.webdriver
Uses Selenium WebDriver to crawl documents with native browsers.
WebDriverHttpFetcher() - Constructor for class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
 
WebDriverHttpFetcher(WebDriverHttpFetcherConfig) - Constructor for class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
 
WebDriverHttpFetcherConfig - Class in com.norconex.collector.http.fetch.impl.webdriver
Configuration for WebDriverHttpFetcher.
WebDriverHttpFetcherConfig() - Constructor for class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
 
WebDriverHttpFetcherConfig.WaitElementType - Enum in com.norconex.collector.http.fetch.impl.webdriver
 

X

XMLFeedLinkExtractor - Class in com.norconex.collector.http.link.impl
Link extractor for extracting links out of RSS and Atom XML feeds.
XMLFeedLinkExtractor() - Constructor for class com.norconex.collector.http.link.impl.XMLFeedLinkExtractor
 

Copyright © 2009–2023 Norconex Inc. All rights reserved.