A B C D E F G H I L M N O P R S T U V W X Y
A
- AbstractDelay - Class in com.norconex.collector.http.delay.impl
-
Convenience class to encapsulate various delay strategies.
- AbstractDelay() - Constructor for class com.norconex.collector.http.delay.impl.AbstractDelay
- AbstractDelayResolver - Class in com.norconex.collector.http.delay.impl
-
Base implementation for creating voluntary delays between URL downloads.
- AbstractDelayResolver() - Constructor for class com.norconex.collector.http.delay.impl.AbstractDelayResolver
- AbstractHttpFetcher - Class in com.norconex.collector.http.fetch
-
Base class implementing the AbstractHttpFetcher.accept(Doc, HttpMethod) method, using reference filters to determine whether this fetcher accepts fetching a URL and delegating the HTTP method check to its own AbstractHttpFetcher.accept(HttpMethod) abstract method.
- AbstractHttpFetcher() - Constructor for class com.norconex.collector.http.fetch.AbstractHttpFetcher
- AbstractLinkExtractor - Class in com.norconex.collector.http.link
-
Base class for link extraction providing common configuration settings.
- AbstractLinkExtractor() - Constructor for class com.norconex.collector.http.link.AbstractLinkExtractor
- AbstractTextLinkExtractor - Class in com.norconex.collector.http.link
-
Base class for link extraction from text documents, providing common configuration settings such as being able to apply extraction to specific documents only, and being able to specify one or more metadata fields from which to grab the text for extracting links.
- AbstractTextLinkExtractor() - Constructor for class com.norconex.collector.http.link.AbstractTextLinkExtractor
- accept(HttpMethod) - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
-
Whether the supplied HttpMethod is supported by this fetcher.
- accept(HttpMethod) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
- accept(HttpMethod) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- accept(HttpMethod) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
- accept(Event) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
- accept(Event) - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
- accept(Doc, HttpMethod) - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
- accept(Doc, HttpMethod) - Method in interface com.norconex.collector.http.fetch.IHttpFetcher
- accept(Doc, HttpMethod) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- acceptDocument(Doc) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
- acceptMetadata(String, Properties) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
- acceptReference(String) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
- addDirectoryTrailingSlash - com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
- addDomainTrailingSlash - com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
- addExtractBetween(String, String, boolean) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
-
Adds patterns delimiting a portion of a document to be considered for link extraction.
- addExtractSelectors(String...) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
- addExtractSelectors(String...) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
-
Adds selectors matching the portions of a document to be considered for link extraction.
- addExtractSelectors(List<String>) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
- addExtractSelectors(List<String>) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
-
Adds selectors matching the portions of a document to be considered for link extraction.
- addLinkSelector(String) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
-
Adds a new link selector extracting the "text" from matches.
- addLinkSelector(String, String) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
- addLinkTag(String, String) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
- addNoExtractBetween(String, String, boolean) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
-
Adds patterns delimiting a portion of a document to be excluded from link extraction.
- addNoExtractSelectors(String...) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
- addNoExtractSelectors(String...) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
-
Adds selectors matching the portions of a document to be excluded from link extraction.
- addNoExtractSelectors(List<String>) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
- addNoExtractSelectors(List<String>) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
-
Adds selectors matching the portions of a document to be excluded from link extraction.
- addPattern(String) - Method in class com.norconex.collector.http.link.impl.RegexLinkExtractor
- addPattern(String, String) - Method in class com.norconex.collector.http.link.impl.RegexLinkExtractor
-
Adds a URL pattern, with an optional replacement.
- addRedirectURL(String) - Method in class com.norconex.collector.http.doc.HttpDocInfo
-
Adds a redirect URL to the trail of URLs that were redirected so far.
- addResponse(IHttpFetchResponse, IHttpFetcher) - Method in class com.norconex.collector.http.fetch.HttpFetchClientResponse
- addRestriction(PropertyMatcher...) - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
-
Adds one or more restrictions that limit which documents this extractor applies to.
- addRestrictions(List<PropertyMatcher>) - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
-
Adds restrictions that limit which documents this extractor applies to.
- addWWW - com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
- afterCrawlerExecution() - Method in class com.norconex.collector.http.crawler.HttpCrawler
- ALWAYS - com.norconex.collector.http.sitemap.SitemapChangeFrequency
- ApacheHttpUtil - Class in com.norconex.collector.http.fetch.util
-
Utility methods for fetcher implementations using Apache HttpClient.
- ApacheRedirectCaptureStrategy - Class in com.norconex.collector.http.fetch.util
-
This class is used by each crawler instance to capture the closest redirect target whether it is part of a redirect chain or not.
- ApacheRedirectCaptureStrategy(IRedirectURLProvider) - Constructor for class com.norconex.collector.http.fetch.util.ApacheRedirectCaptureStrategy
- applyContentTypeAndCharset(String, CrawlDocInfo) - Static method in class com.norconex.collector.http.fetch.util.ApacheHttpUtil
-
Applies the Content-Type HTTP response header on the supplied document info.
- applyResponseContent(HttpResponse, CrawlDoc) - Static method in class com.norconex.collector.http.fetch.util.ApacheHttpUtil
-
Applies the HTTP response content to a document if such content exists.
- applyResponseHeaders(HttpResponse, String, CrawlDoc) - Static method in class com.norconex.collector.http.fetch.util.ApacheHttpUtil
-
Applies the HTTP response headers to a document.
- AUTH_METHOD_BASIC - Static variable in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
-
BASIC authentication method.
- AUTH_METHOD_DIGEST - Static variable in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
-
DIGEST authentication method.
- AUTH_METHOD_FORM - Static variable in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
-
Form-based authentication method.
- AUTH_METHOD_KERBEROS - Static variable in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
-
Experimental: Kerberos authentication method.
- AUTH_METHOD_NTLM - Static variable in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
-
NTLM authentication method.
- AUTH_METHOD_SPNEGO - Static variable in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
-
Experimental: SPNEGO authentication method.
- authenticateUsingForm(HttpClient) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
- authenticateUsingForm(HttpClient, HttpAuthConfig) - Static method in class com.norconex.collector.http.fetch.util.ApacheHttpUtil
- AUTO - com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.Quality
-
Deprecated.
- AUTO - com.norconex.collector.http.processor.impl.FeaturedImageProcessor.Quality
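The AbstractHttpFetcher entry in this section describes a two-step acceptance check: reference filters decide whether a URL qualifies, and the HTTP method check is delegated to an abstract accept(HttpMethod) method. The following is a minimal sketch of that pattern in plain Java, not the library's implementation. The class names FetcherSketch and GetHeadFetcherSketch, the use of a Predicate<String> in place of reference filters, the use of plain strings for references and HTTP methods, and the order of the two checks are all assumptions made for illustration.

```java
import java.util.function.Predicate;

// Hypothetical illustration only; not the AbstractHttpFetcher implementation.
abstract class FetcherSketch {

    // Stand-in for the reference filters mentioned in the index entry.
    private final Predicate<String> referenceFilter;

    FetcherSketch(Predicate<String> referenceFilter) {
        this.referenceFilter = referenceFilter;
    }

    // Subclasses state which HTTP methods they support.
    protected abstract boolean accept(String httpMethod);

    // Accepts a request only when the reference passes the filter
    // and the subclass supports the HTTP method.
    final boolean accept(String reference, String httpMethod) {
        return referenceFilter.test(reference) && accept(httpMethod);
    }
}

// Example subclass supporting GET and HEAD only (an assumption for the demo).
final class GetHeadFetcherSketch extends FetcherSketch {

    GetHeadFetcherSketch(Predicate<String> referenceFilter) {
        super(referenceFilter);
    }

    @Override
    protected boolean accept(String httpMethod) {
        return "GET".equals(httpMethod) || "HEAD".equals(httpMethod);
    }

    public static void main(String[] args) {
        FetcherSketch fetcher =
                new GetHeadFetcherSketch(ref -> ref.startsWith("https://"));
        System.out.println(fetcher.accept("https://example.com/", "GET"));
        System.out.println(fetcher.accept("https://example.com/", "POST"));
    }
}
```

Keeping the two-argument method final and the one-argument method abstract means a subclass only has to declare the HTTP methods it handles, while the reference filtering stays in one place.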
B
- beforeCrawlerExecution(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawler
- beforeFinalizeDocumentProcessing(CrawlDoc) - Method in class com.norconex.collector.http.crawler.HttpCrawler
- Browser - Enum in com.norconex.collector.http.fetch.impl.webdriver
-
A web browser.
- Browser.CustomDriverOptions - Class in com.norconex.collector.http.fetch.impl.webdriver
- Browser.WebDriverBuilder - Class in com.norconex.collector.http.fetch.impl.webdriver
- buildCustomHttpClient(HttpClientBuilder) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
-
For implementors to subclass.
C
- checkClientTrusted(X509Certificate[], String) - Method in class com.norconex.collector.http.fetch.util.TrustAllX509TrustManager
- checkClientTrusted(X509Certificate[], String, Socket) - Method in class com.norconex.collector.http.fetch.util.TrustAllX509TrustManager
- checkClientTrusted(X509Certificate[], String, SSLEngine) - Method in class com.norconex.collector.http.fetch.util.TrustAllX509TrustManager
- checkServerTrusted(X509Certificate[], String) - Method in class com.norconex.collector.http.fetch.util.TrustAllX509TrustManager
- checkServerTrusted(X509Certificate[], String, Socket) - Method in class com.norconex.collector.http.fetch.util.TrustAllX509TrustManager
- checkServerTrusted(X509Certificate[], String, SSLEngine) - Method in class com.norconex.collector.http.fetch.util.TrustAllX509TrustManager
- CHROME - com.norconex.collector.http.fetch.impl.webdriver.Browser
- CLASSNAME - com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig.WaitElementType
- clearLinkSelectors() - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
- clearLinkTags() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
- clearPatterns() - Method in class com.norconex.collector.http.link.impl.RegexLinkExtractor
- clearRestrictions() - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
-
Clears all restrictions.
- COLLECTOR_FEATURED_IMAGE_INLINE - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- COLLECTOR_FEATURED_IMAGE_PATH - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- COLLECTOR_FEATURED_IMAGE_URL - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- COLLECTOR_PHANTOMJS_SCREENSHOT_INLINE - Static variable in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- COLLECTOR_PHANTOMJS_SCREENSHOT_PATH - Static variable in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- com.norconex.collector.http - package com.norconex.collector.http
- com.norconex.collector.http.canon - package com.norconex.collector.http.canon
- com.norconex.collector.http.canon.impl - package com.norconex.collector.http.canon.impl
- com.norconex.collector.http.checksum.impl - package com.norconex.collector.http.checksum.impl
- com.norconex.collector.http.crawler - package com.norconex.collector.http.crawler
- com.norconex.collector.http.crawler.event.impl - package com.norconex.collector.http.crawler.event.impl
- com.norconex.collector.http.delay - package com.norconex.collector.http.delay
- com.norconex.collector.http.delay.impl - package com.norconex.collector.http.delay.impl
- com.norconex.collector.http.doc - package com.norconex.collector.http.doc
- com.norconex.collector.http.fetch - package com.norconex.collector.http.fetch
- com.norconex.collector.http.fetch.impl - package com.norconex.collector.http.fetch.impl
- com.norconex.collector.http.fetch.impl.webdriver - package com.norconex.collector.http.fetch.impl.webdriver
- com.norconex.collector.http.fetch.util - package com.norconex.collector.http.fetch.util
- com.norconex.collector.http.filter.impl - package com.norconex.collector.http.filter.impl
- com.norconex.collector.http.link - package com.norconex.collector.http.link
- com.norconex.collector.http.link.impl - package com.norconex.collector.http.link.impl
- com.norconex.collector.http.pipeline.committer - package com.norconex.collector.http.pipeline.committer
- com.norconex.collector.http.pipeline.importer - package com.norconex.collector.http.pipeline.importer
- com.norconex.collector.http.pipeline.queue - package com.norconex.collector.http.pipeline.queue
- com.norconex.collector.http.processor - package com.norconex.collector.http.processor
- com.norconex.collector.http.processor.impl - package com.norconex.collector.http.processor.impl
- com.norconex.collector.http.recrawl - package com.norconex.collector.http.recrawl
- com.norconex.collector.http.recrawl.impl - package com.norconex.collector.http.recrawl.impl
- com.norconex.collector.http.robot - package com.norconex.collector.http.robot
- com.norconex.collector.http.robot.impl - package com.norconex.collector.http.robot.impl
- com.norconex.collector.http.sitemap - package com.norconex.collector.http.sitemap
- com.norconex.collector.http.sitemap.impl - package com.norconex.collector.http.sitemap.impl
- com.norconex.collector.http.url - package com.norconex.collector.http.url
- com.norconex.collector.http.url.impl - package com.norconex.collector.http.url.impl
- compareTo(Link) - Method in class com.norconex.collector.http.link.Link
- contains(int, int) - Method in class com.norconex.collector.http.processor.impl.ScaledImage
- contains(ScaledImage) - Method in class com.norconex.collector.http.processor.impl.ScaledImage
- contains(Dimension) - Method in class com.norconex.collector.http.processor.impl.ScaledImage
- CrawlerDelay - Class in com.norconex.collector.http.delay.impl
-
It is assumed there will be one instance of this class per crawler defined.
- CrawlerDelay() - Constructor for class com.norconex.collector.http.delay.impl.CrawlerDelay
- create() - Method in class com.norconex.collector.http.fetch.HttpFetchResponseBuilder
- createChildDocInfo(String, CrawlDocInfo) - Method in class com.norconex.collector.http.crawler.HttpCrawler
- createConnectionConfig() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
- createCrawler(CrawlerConfig) - Method in class com.norconex.collector.http.HttpCollector
- createCredentialsProvider() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
- CREATED_ROBOTS_META - Static variable in class com.norconex.collector.http.crawler.HttpCrawlerEvent
- createDefaultCookieStore() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
-
Creates the default cookie store to be added to each request context.
- createDefaultRequestHeaders() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
-
Creates a list of HTTP headers based on configuration.
- createDriver(WebDriverLocation, MutableCapabilities) - Method in enum com.norconex.collector.http.fetch.impl.webdriver.Browser
- createHttpClient() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
- createOptions(WebDriverLocation) - Method in enum com.norconex.collector.http.fetch.impl.webdriver.Browser
- createProxy() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
- createRequestConfig() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
- createSchemePortResolver() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
- createSSLContext() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
- createSSLSocketFactory(SSLContext) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
- createUriRequest(String, HttpMethod) - Static method in class com.norconex.collector.http.fetch.util.ApacheHttpUtil
-
Creates an HTTP request.
- createUriRequest(String, String) - Static method in class com.norconex.collector.http.fetch.util.ApacheHttpUtil
-
Creates an HTTP request.
- CSSSELECTOR - com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig.WaitElementType
- CUSTOM - com.norconex.collector.http.fetch.impl.webdriver.Browser
- CustomDriverOptions() - Constructor for class com.norconex.collector.http.fetch.impl.webdriver.Browser.CustomDriverOptions
D
- DAILY - com.norconex.collector.http.sitemap.SitemapChangeFrequency
- DATE - com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.StorageDiskStructure
-
Deprecated.
- DATE - com.norconex.collector.http.fetch.util.DocImageHandler.DirStructure
- DATE - com.norconex.collector.http.processor.impl.FeaturedImageProcessor.StorageDiskStructure
- DATETIME - com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.StorageDiskStructure
-
Deprecated.
- DATETIME - com.norconex.collector.http.fetch.util.DocImageHandler.DirStructure
- DATETIME - com.norconex.collector.http.processor.impl.FeaturedImageProcessor.StorageDiskStructure
- decodeUnreservedCharacters - com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
- DEFAULT_CONTENT_TYPE_PATTERN - Static variable in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- DEFAULT_CONTENT_TYPE_PATTERN - Static variable in class com.norconex.collector.http.link.impl.RegexLinkExtractor
- DEFAULT_DELAY - Static variable in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
-
Default delay is 3 seconds.
- DEFAULT_FALLBACK_CHARSET - Static variable in class com.norconex.collector.http.fetch.util.GenericRedirectURLProvider
- DEFAULT_FILENAME_PREFIX - Static variable in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
- DEFAULT_IMAGE_CACHE_DIR - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- DEFAULT_IMAGE_CACHE_SIZE - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- DEFAULT_IMAGE_FORMAT - Static variable in class com.norconex.collector.http.fetch.util.DocImageHandler
- DEFAULT_IMAGE_FORMAT - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- DEFAULT_MAX_BUFFER_SIZE - Static variable in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
- DEFAULT_MAX_CONNECTIONS - Static variable in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
- DEFAULT_MAX_CONNECTIONS_PER_ROUTE - Static variable in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
- DEFAULT_MAX_IDLE_TIME - Static variable in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
- DEFAULT_MAX_REDIRECT - Static variable in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
- DEFAULT_MAX_URL_LENGTH - Static variable in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
-
Default maximum length a URL can have.
- DEFAULT_MAX_URL_LENGTH - Static variable in class com.norconex.collector.http.link.impl.RegexLinkExtractor
-
Default maximum length a URL can have.
- DEFAULT_MIN_SIZE - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- DEFAULT_NOT_FOUND_STATUS_CODES - Static variable in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
- DEFAULT_PAGE_CONTENT_TYPE_PATTERN - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- DEFAULT_RENDER_WAIT_TIME - Static variable in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- DEFAULT_SCALE_SIZE - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- DEFAULT_SCREENSHOT_DIR - Static variable in class com.norconex.collector.http.fetch.impl.webdriver.ScreenshotHandler
- DEFAULT_SCREENSHOT_DIR_FIELD - Static variable in class com.norconex.collector.http.fetch.impl.webdriver.ScreenshotHandler
- DEFAULT_SCREENSHOT_IMAGE_FORMAT - Static variable in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- DEFAULT_SCREENSHOT_META_FIELD - Static variable in class com.norconex.collector.http.fetch.impl.webdriver.ScreenshotHandler
- DEFAULT_SCREENSHOT_SCALE_SIZE - Static variable in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- DEFAULT_SCREENSHOT_STORAGE - Static variable in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- DEFAULT_SCREENSHOT_STORAGE_DISK_DIR - Static variable in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- DEFAULT_SCREENSHOT_ZOOM_FACTOR - Static variable in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- DEFAULT_SCRIPT_PATH - Static variable in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- DEFAULT_SEGMENT_COUNT - Static variable in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
-
Default segment count.
- DEFAULT_SEGMENT_SEPARATOR_PATTERN - Static variable in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
-
Default segment separator pattern.
- DEFAULT_SITEMAP_PATHS - Static variable in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
- DEFAULT_STORAGE - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- DEFAULT_STORAGE_DISK_DIR - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- DEFAULT_STORAGE_DISK_STRUCTURE - Static variable in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- DEFAULT_TIMEOUT - Static variable in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
- DEFAULT_TYPES - Static variable in class com.norconex.collector.http.fetch.util.DocImageHandler
- DEFAULT_VALID_STATUS_CODES - Static variable in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
- delay(long, long) - Method in class com.norconex.collector.http.delay.impl.AbstractDelay
- delay(long, String) - Method in class com.norconex.collector.http.delay.impl.AbstractDelay
- delay(long, String) - Method in class com.norconex.collector.http.delay.impl.CrawlerDelay
- delay(long, String) - Method in class com.norconex.collector.http.delay.impl.SiteDelay
- delay(long, String) - Method in class com.norconex.collector.http.delay.impl.ThreadDelay
- delay(RobotsTxt, String) - Method in interface com.norconex.collector.http.delay.IDelayResolver
-
Delay crawling activities (if applicable).
- delay(RobotsTxt, String) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
- DelayReferencePattern(String, long) - Constructor for class com.norconex.collector.http.delay.impl.ReferenceDelayResolver.DelayReferencePattern
- DelaySchedule(String, String, String, long) - Constructor for class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
- DEPTH - Static variable in class com.norconex.collector.http.doc.HttpDocMetadata
- detectFromContent(String, InputStream, ContentType) - Method in interface com.norconex.collector.http.canon.ICanonicalLinkDetector
-
Detects the presence of a canonical URL from a document's content.
- detectFromContent(String, InputStream, ContentType) - Method in class com.norconex.collector.http.canon.impl.GenericCanonicalLinkDetector
- detectFromMetadata(String, Properties) - Method in interface com.norconex.collector.http.canon.ICanonicalLinkDetector
-
Detects the presence of a canonical URL from the metadata gathered so far, which at the time of invocation is normally the HTTP header values.
- detectFromMetadata(String, Properties) - Method in class com.norconex.collector.http.canon.impl.GenericCanonicalLinkDetector
- DIRECTORY - com.norconex.collector.http.fetch.util.DocImageHandler.Target
- DISABLED - com.norconex.collector.http.crawler.HttpCrawlerConfig.HttpMethodSupport
- DISK - com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.Storage
-
Deprecated.
- DISK - com.norconex.collector.http.processor.impl.FeaturedImageProcessor.Storage
- DocImageHandler - Class in com.norconex.collector.http.fetch.util
-
Handles images associated with a document (which is different from a document that is itself an image).
- DocImageHandler() - Constructor for class com.norconex.collector.http.fetch.util.DocImageHandler
- DocImageHandler(Path, String, String) - Constructor for class com.norconex.collector.http.fetch.util.DocImageHandler
- DocImageHandler.DirStructure - Enum in com.norconex.collector.http.fetch.util
- DocImageHandler.Target - Enum in com.norconex.collector.http.fetch.util
- doCreateMetaChecksum(Properties) - Method in class com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer
- DOMLinkExtractor - Class in com.norconex.collector.http.link.impl
-
Extracts links from a Document Object Model (DOM) representation of an HTML, XHTML, or XML document content based on values of matching elements and attributes.
- DOMLinkExtractor() - Constructor for class com.norconex.collector.http.link.impl.DOMLinkExtractor
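The delay entries in this index (AbstractDelayResolver, described as creating voluntary delays between URL downloads, its DEFAULT_DELAY of 3 seconds, and the various delay(...) methods) suggest a simple idea: wait only as long as needed so that a minimum interval separates two downloads. The following is a rough sketch of that idea in plain Java, not the library's implementation. The class DelaySketch, its method names, and the use of millisecond timestamps are assumptions made for illustration.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not the AbstractDelay implementation.
final class DelaySketch {

    // Mirrors the documented default delay of 3 seconds.
    static final long DEFAULT_DELAY_MILLIS = TimeUnit.SECONDS.toMillis(3);

    private DelaySketch() {
    }

    // Returns how long to wait, in milliseconds, so that at least
    // "delayMillis" separates the previous download from the next one.
    static long remainingWait(long lastHitMillis, long nowMillis, long delayMillis) {
        long elapsed = nowMillis - lastHitMillis;
        return Math.max(0L, delayMillis - elapsed);
    }

    // Sleeps for the remaining wait, then returns the new "last hit" time.
    static long delay(long lastHitMillis, long delayMillis)
            throws InterruptedException {
        long wait = remainingWait(
                lastHitMillis, System.currentTimeMillis(), delayMillis);
        if (wait > 0) {
            Thread.sleep(wait);
        }
        return System.currentTimeMillis();
    }

    public static void main(String[] args) {
        // One second after the last hit, two seconds of the default remain.
        System.out.println(remainingWait(0L, 1000L, DEFAULT_DELAY_MILLIS));
    }
}
```

Separating the pure calculation (remainingWait) from the sleeping call keeps the timing logic easy to test without actually waiting.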
E
- EDGE - com.norconex.collector.http.fetch.impl.webdriver.Browser
- encodeNonURICharacters - com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
- encodeSpaces - com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
- equals(Object) - Method in class com.norconex.collector.http.canon.impl.GenericCanonicalLinkDetector
- equals(Object) - Method in class com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer
- equals(Object) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
- equals(Object) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
- equals(Object) - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
- equals(Object) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
- equals(Object) - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
- equals(Object) - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver
- equals(Object) - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver.DelayReferencePattern
- equals(Object) - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver
- equals(Object) - Method in class com.norconex.collector.http.doc.HttpDocInfo
- equals(Object) - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
- equals(Object) - Method in class com.norconex.collector.http.fetch.HttpFetchClientResponse
- equals(Object) - Method in class com.norconex.collector.http.fetch.HttpFetchResponseBuilder
- equals(Object) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
- equals(Object) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
- equals(Object) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
- equals(Object) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- equals(Object) - Method in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
- equals(Object) - Method in class com.norconex.collector.http.fetch.impl.webdriver.ScreenshotHandler
- equals(Object) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
- equals(Object) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- equals(Object) - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
- equals(Object) - Method in class com.norconex.collector.http.fetch.util.GenericRedirectURLProvider
- equals(Object) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
- equals(Object) - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
- equals(Object) - Method in class com.norconex.collector.http.link.AbstractTextLinkExtractor
- equals(Object) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
- equals(Object) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
- equals(Object) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor.RegexPair
- equals(Object) - Method in class com.norconex.collector.http.link.impl.RegexLinkExtractor
- equals(Object) - Method in class com.norconex.collector.http.link.impl.TikaLinkExtractor
- equals(Object) - Method in class com.norconex.collector.http.link.impl.XMLFeedLinkExtractor
- equals(Object) - Method in class com.norconex.collector.http.link.Link
- equals(Object) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- equals(Object) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
- equals(Object) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
- equals(Object) - Method in class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
- equals(Object) - Method in class com.norconex.collector.http.robot.impl.StandardRobotsTxtProvider
- equals(Object) - Method in class com.norconex.collector.http.robot.RobotsMeta
- equals(Object) - Method in class com.norconex.collector.http.robot.RobotsTxt
- equals(Object) - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
- equals(Object) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
- equals(Object) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer.Replace
- executeCommitterPipeline(Crawler, CrawlDoc) - Method in class com.norconex.collector.http.crawler.HttpCrawler
- executeImporterPipeline(ImporterPipelineContext) - Method in class com.norconex.collector.http.crawler.HttpCrawler
- executeQueuePipeline(CrawlDocInfo) - Method in class com.norconex.collector.http.crawler.HttpCrawler
- extractLinks(CrawlDoc) - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
- extractLinks(CrawlDoc) - Method in interface com.norconex.collector.http.link.ILinkExtractor
- extractLinks(Set<Link>, CrawlDoc) - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
- extractLinks(Set<Link>, CrawlDoc) - Method in class com.norconex.collector.http.link.AbstractTextLinkExtractor
- extractLinks(Set<Link>, CrawlDoc) - Method in class com.norconex.collector.http.link.impl.TikaLinkExtractor
- extractTextLinks(Set<Link>, HandlerDoc, Reader) - Method in class com.norconex.collector.http.link.AbstractTextLinkExtractor
- extractTextLinks(Set<Link>, HandlerDoc, Reader) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
- extractTextLinks(Set<Link>, HandlerDoc, Reader) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
- extractTextLinks(Set<Link>, HandlerDoc, Reader) - Method in class com.norconex.collector.http.link.impl.RegexLinkExtractor
- extractTextLinks(Set<Link>, HandlerDoc, Reader) - Method in class com.norconex.collector.http.link.impl.XMLFeedLinkExtractor
F
- FeaturedImageProcessor - Class in com.norconex.collector.http.processor.impl
-
Document processor that extracts the "main" image from HTML pages.
- FeaturedImageProcessor() - Constructor for class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- FeaturedImageProcessor.Quality - Enum in com.norconex.collector.http.processor.impl
- FeaturedImageProcessor.Storage - Enum in com.norconex.collector.http.processor.impl
- FeaturedImageProcessor.StorageDiskStructure - Enum in com.norconex.collector.http.processor.impl
- fetch(CrawlDoc, HttpMethod) - Method in class com.norconex.collector.http.fetch.HttpFetchClient
- fetch(CrawlDoc, HttpMethod) - Method in interface com.norconex.collector.http.fetch.IHttpFetcher
-
Performs an HTTP request for the supplied document reference and HTTP method.
- fetch(CrawlDoc, HttpMethod) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
- fetch(CrawlDoc, HttpMethod) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- fetch(CrawlDoc, HttpMethod) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
- fetchDocumentContent(String) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
- fetcherShutdown(HttpCollector) - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
-
Invoked once per fetcher when the collector ends.
- fetcherShutdown(HttpCollector) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
- fetcherShutdown(HttpCollector) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
- fetcherStartup(HttpCollector) - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
-
Invoked once per fetcher instance, when the collector starts.
- fetcherStartup(HttpCollector) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
- fetcherStartup(HttpCollector) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
- fetcherThreadBegin(HttpCrawler) - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
-
Invoked each time a crawler begins a new crawler thread if that thread is the current thread.
- fetcherThreadBegin(HttpCrawler) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
- fetcherThreadEnd(HttpCrawler) - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
-
Invoked each time a crawler ends an existing crawler thread if that thread is the current thread.
- fetcherThreadEnd(HttpCrawler) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
- FIREFOX - com.norconex.collector.http.fetch.impl.webdriver.Browser
- FIRST - com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.SitemapSupport
- fits(int, int) - Method in class com.norconex.collector.http.processor.impl.ScaledImage
- fits(ScaledImage) - Method in class com.norconex.collector.http.processor.impl.ScaledImage
- fits(Dimension) - Method in class com.norconex.collector.http.processor.impl.ScaledImage
- fri - com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule.DOW
G
- GenericCanonicalLinkDetector - Class in com.norconex.collector.http.canon.impl
-
Generic canonical link detector.
- GenericCanonicalLinkDetector() - Constructor for class com.norconex.collector.http.canon.impl.GenericCanonicalLinkDetector
- GenericDelayResolver - Class in com.norconex.collector.http.delay.impl
-
Default implementation for creating voluntary delays between URL downloads.
- GenericDelayResolver() - Constructor for class com.norconex.collector.http.delay.impl.GenericDelayResolver
- GenericDelayResolver.DelaySchedule - Class in com.norconex.collector.http.delay.impl
- GenericDelayResolver.DelaySchedule.DOW - Enum in com.norconex.collector.http.delay.impl
- GenericHttpFetcher - Class in com.norconex.collector.http.fetch.impl
-
Default implementation of IHttpFetcher, based on Apache HttpClient.
- GenericHttpFetcher() - Constructor for class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
- GenericHttpFetcher(GenericHttpFetcherConfig) - Constructor for class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
- GenericHttpFetcherConfig - Class in com.norconex.collector.http.fetch.impl
-
Generic HTTP Fetcher configuration.
- GenericHttpFetcherConfig() - Constructor for class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
- GenericLinkExtractor - Class in com.norconex.collector.http.link.impl
-
Deprecated. Since 3.0.0, use HtmlLinkExtractor or DOMLinkExtractor instead.
- GenericLinkExtractor() - Constructor for class com.norconex.collector.http.link.impl.GenericLinkExtractor
-
Deprecated.
- GenericRecrawlableResolver - Class in com.norconex.collector.http.recrawl.impl
-
Relies on both sitemap directives and custom instructions for establishing the minimum frequency between each document recrawl.
- GenericRecrawlableResolver() - Constructor for class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
- GenericRecrawlableResolver.MinFrequency - Class in com.norconex.collector.http.recrawl.impl
- GenericRecrawlableResolver.SitemapSupport - Enum in com.norconex.collector.http.recrawl.impl
- GenericRedirectURLProvider - Class in com.norconex.collector.http.fetch.util
-
Provides redirect URLs by grabbing them from the HTTP response Location header value.
- GenericRedirectURLProvider() - Constructor for class com.norconex.collector.http.fetch.util.GenericRedirectURLProvider
- GenericSitemapResolver - Class in com.norconex.collector.http.sitemap.impl
-
Implementation of ISitemapResolver as per sitemap.xml standard defined at http://www.sitemaps.org/protocol.html.
- GenericSitemapResolver() - Constructor for class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
- GenericURLNormalizer - Class in com.norconex.collector.http.url.impl
-
Generic implementation of IURLNormalizer that should satisfy most URL normalization needs.
- GenericURLNormalizer() - Constructor for class com.norconex.collector.http.url.impl.GenericURLNormalizer
- GenericURLNormalizer.Normalization - Enum in com.norconex.collector.http.url.impl
- GenericURLNormalizer.Replace - Class in com.norconex.collector.http.url.impl
- GET - com.norconex.collector.http.fetch.HttpMethod
- getAcceptedIssuers() - Method in class com.norconex.collector.http.fetch.util.TrustAllX509TrustManager
- getAllowFilters() - Method in class com.norconex.collector.http.robot.RobotsTxt
-
Gets "Allow" filters.
- getApplyTo() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
- getArea() - Method in class com.norconex.collector.http.processor.impl.ScaledImage
- getArguments() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
-
Gets optional arguments passed to the browser if it supports arguments.
- getAuthConfig() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
- getBrowser() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- getBrowserPath() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- getCachedDocInfo() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
- getCacheDirectory() - Method in class com.norconex.collector.http.processor.impl.ImageCache
- getCanonicalLinkDetector() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Gets the canonical link detector.
- getCapabilities() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- getChainedProxy() - Method in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
-
Gets chained proxy settings, if any.
- getChangeFrequency(String) - Static method in enum com.norconex.collector.http.sitemap.SitemapChangeFrequency
-
Gets the sitemap change frequency matching the supplied string.
- getCharset() - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
-
Gets the assumed source character encoding.
- getCharset() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
-
Gets the character set of pages on which link extraction is performed.
- getCharset() - Method in class com.norconex.collector.http.link.impl.RegexLinkExtractor
-
Gets the character set of pages on which link extraction is performed.
- getCollectorConfig() - Method in class com.norconex.collector.http.HttpCollector
- getConfig() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
- getConfig() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
- getConfig() - Method in class com.norconex.collector.http.pipeline.committer.HttpCommitterPipelineContext
- getConfig() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
- getConfig() - Method in class com.norconex.collector.http.pipeline.queue.HttpQueuePipelineContext
- getConnectionCharset() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Gets the connection character set.
- getConnectionRequestTimeout() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Gets the timeout when requesting a connection, in milliseconds.
- getConnectionTimeout() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Gets the connection timeout until a connection is established, in milliseconds.
- getContentTypePattern() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- getContentTypes() - Method in class com.norconex.collector.http.canon.impl.GenericCanonicalLinkDetector
- getCookieSpec() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
- getCount() - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
- getCrawlDelay() - Method in class com.norconex.collector.http.robot.RobotsTxt
- getCrawlDocInfoType() - Method in class com.norconex.collector.http.crawler.HttpCrawler
- getCrawler() - Method in class com.norconex.collector.http.pipeline.committer.HttpCommitterPipelineContext
- getCrawler() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
- getCrawler() - Method in class com.norconex.collector.http.pipeline.queue.HttpQueuePipelineContext
- getCrawlerConfig() - Method in class com.norconex.collector.http.crawler.HttpCrawler
- getCrawlerIds() - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
- getCrawlState() - Method in class com.norconex.collector.http.fetch.HttpFetchClientResponse
- getCrawlState() - Method in interface com.norconex.collector.http.fetch.IHttpFetchResponse
- getCredentials() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
- getCssSelector() - Method in class com.norconex.collector.http.fetch.impl.webdriver.ScreenshotHandler
- getDayOfMonthRange() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
- getDayOfWeekRange() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
- getDedupDocumentStore() - Method in class com.norconex.collector.http.crawler.HttpCrawler
- getDedupMetadataStore() - Method in class com.norconex.collector.http.crawler.HttpCrawler
- getDefaultDelay() - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
-
Gets the default delay in milliseconds.
- getDelay() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
- getDelay() - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver.DelayReferencePattern
- getDelayReferencePatterns() - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver
- getDelayResolver() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
- getDepth() - Method in class com.norconex.collector.http.doc.HttpDocInfo
-
Gets the URL depth.
- getDisallowFilters() - Method in class com.norconex.collector.http.robot.RobotsTxt
-
Gets "Disallow" filters.
- getDocInfo() - Method in class com.norconex.collector.http.pipeline.committer.HttpCommitterPipelineContext
- getDocInfo() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
- getDocInfo() - Method in class com.norconex.collector.http.pipeline.queue.HttpQueuePipelineContext
- getDomain() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Gets the NTLM authentication domain.
- getDomSelector() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- getDriverPath() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- getEarlyPageScript() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- getEnd() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor.RegexPair
- getEtag() - Method in class com.norconex.collector.http.doc.HttpDocInfo
-
Gets the HTTP ETag.
- getException() - Method in class com.norconex.collector.http.fetch.HttpFetchClientResponse
- getException() - Method in interface com.norconex.collector.http.fetch.IHttpFetchResponse
- getExePath() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- getExtraCapability(String) - Method in class com.norconex.collector.http.fetch.impl.webdriver.Browser.CustomDriverOptions
- getExtraCapabilityNames() - Method in class com.norconex.collector.http.fetch.impl.webdriver.Browser.CustomDriverOptions
- getExtractBetweens() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
-
Gets the patterns delimiting the portions of a document to be considered for link extraction.
- getExtractSelectors() - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
- getExtractSelectors() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
-
Gets the selectors matching the portions of a document to be considered for link extraction.
- getFallbackCharset() - Method in class com.norconex.collector.http.fetch.util.GenericRedirectURLProvider
- getFetchHttpGet() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Gets whether to fetch HTTP documents using an HTTP GET request.
- getFetchHttpHead() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Gets whether to fetch HTTP response headers using an HTTP HEAD request.
- getFieldMatcher() - Method in class com.norconex.collector.http.link.AbstractTextLinkExtractor
-
Gets field matcher identifying fields holding content used for link extraction.
- getFileNamePrefix() - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
-
Gets the generated report file name prefix.
- getFilters() - Method in class com.norconex.collector.http.robot.RobotsTxt
-
Gets all filters.
- getFormCharset() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Gets the authentication form character set.
- getFormParam(String) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Gets an authentication form parameter (equivalent to "input" or other fields in HTML forms).
- getFormParamNames() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Gets all authentication form parameter names.
- getFormParams() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Gets all authentication form parameters (equivalent to "input" or other fields in HTML forms).
- getFormPasswordField() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Gets the name of the HTML field where the password is set.
- getFormSelector() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Gets the CSS selector that identifies the form in a login page.
- getFormUsernameField() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Gets the name of the HTML field where the username is set.
- getHeadersPrefix() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
- getHeadersPrefix() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- getHeadersPrefix() - Method in class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
- getHost() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Gets the host for the current authentication scope.
- getHost() - Method in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
-
Gets the host name passed to the browser pointing to the sniffer proxy.
- getHttpClient() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
- getHttpFetchClient() - Method in class com.norconex.collector.http.crawler.HttpCrawler
- getHttpFetchClient() - Method in class com.norconex.collector.http.pipeline.committer.HttpCommitterPipelineContext
- getHttpFetchClient() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
- getHttpFetchers() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Gets HTTP fetchers.
- getHttpFetchersMaxRetries() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Gets the maximum number of times an HTTP fetcher will re-attempt fetching a resource in case of failures.
- getHttpFetchersRetryDelay() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Gets how long to wait before a failing HTTP fetcher re-attempts fetching a resource in case of failures (in milliseconds).
- getHttpMethods() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Gets the list of HTTP methods to be accepted by this fetcher.
- getHttpSnifferConfig() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- getImage() - Method in class com.norconex.collector.http.processor.impl.ScaledImage
- getImage(String) - Method in class com.norconex.collector.http.processor.impl.ImageCache
- getImageCacheDir() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- getImageCacheSize() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- getImageFormat() - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
- getImageFormat() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- getImplicitlyWait() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- getImporter() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
- getKeepReferencedLinks() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Gets what type of referenced links to keep, if any.
- getLatePageScript() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- getLinkExtractors() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Gets link extractors.
- getLocalAddress() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Gets the local address (IP or hostname).
- getMatch() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer.Replace
- getMaxBufferSize() - Method in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
- getMaxConnectionIdleTime() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Gets the period of time in milliseconds after which to evict idle connections from the connection pool.
- getMaxConnectionInactiveTime() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Gets the period of time in milliseconds a connection must be inactive to be checked in case it became stalled.
- getMaxConnections() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Gets the maximum number of connections that can be created.
- getMaxConnectionsPerRoute() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Gets the maximum number of connections that can be used per route.
- getMaxDepth() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
- getMaxRedirects() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Gets the maximum number of redirects to be followed.
- getMaxURLLength() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
-
Gets the maximum supported URL length.
- getMaxURLLength() - Method in class com.norconex.collector.http.link.impl.RegexLinkExtractor
-
Gets the maximum supported URL length.
- getMetadata() - Method in class com.norconex.collector.http.link.Link
- getMetadata() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
- getMethod() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Gets the authentication method.
- getMinDimensions() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- getMinFrequencies() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
-
Gets minimum frequencies.
- getNoExtractBetweens() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
-
Gets the patterns delimiting the portions of a document to be excluded from link extraction.
- getNoExtractSelectors() - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
- getNoExtractSelectors() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
-
Gets the selectors matching the portions of a document to be excluded from link extraction.
- getNormalizations() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
- getNotFoundStatusCodes() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Gets HTTP status codes to be considered as "Not found" state.
- getNotFoundStatusCodes() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Gets HTTP status codes to be considered as "Not found" state.
- getOnMatch() - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
- getOptions() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- getOriginalReference() - Method in class com.norconex.collector.http.doc.HttpDocInfo
- getOriginalSize() - Method in class com.norconex.collector.http.processor.impl.ScaledImage
- getOutputDir() - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
-
Gets the local directory where this listener report will be written.
- getPageContentTypePattern() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- getPageLoadTimeout() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- getParser() - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
-
Gets the parser to use when creating the DOM-tree.
- getPath() - Method in interface com.norconex.collector.http.robot.IRobotsTxtFilter
- getPattern() - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver.DelayReferencePattern
- getPattern() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
- getPatternReplacement(String) - Method in class com.norconex.collector.http.link.impl.RegexLinkExtractor
-
Gets a pattern replacement.
- getPatterns() - Method in class com.norconex.collector.http.link.impl.RegexLinkExtractor
- getPort() - Method in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
- getPostImportLinks() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Gets a field matcher used to identify post-import metadata fields holding URLs to consider for crawling.
- getPostImportProcessors() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Gets post-import processors.
- getPreImportProcessors() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Gets pre-import processors.
- getProxySettings() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
- getRealm() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Gets the realm name for the current authentication scope.
- getReasonPhrase() - Method in class com.norconex.collector.http.fetch.HttpFetchClientResponse
- getReasonPhrase() - Method in interface com.norconex.collector.http.fetch.IHttpFetchResponse
- getRecrawlableResolver() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Gets the recrawlable resolver.
- getRedirectTarget() - Method in class com.norconex.collector.http.fetch.HttpFetchClientResponse
- getRedirectTarget() - Method in interface com.norconex.collector.http.fetch.IHttpFetchResponse
- getRedirectTarget(HttpContext) - Static method in class com.norconex.collector.http.fetch.util.ApacheRedirectCaptureStrategy
- getRedirectTrail() - Method in class com.norconex.collector.http.doc.HttpDocInfo
-
Gets the trail of URLs that were redirected up to this one.
- getRedirectURLProvider() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Gets the redirect URL provider.
- getReferencedUrls() - Method in class com.norconex.collector.http.doc.HttpDocInfo
-
Gets URLs referenced by this one.
- getReferenceFilters() - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
-
Gets reference filters.
- getReferencePattern() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- getReferrer() - Method in class com.norconex.collector.http.link.Link
- getReferrerLinkMetadata() - Method in class com.norconex.collector.http.doc.HttpDocInfo
- getReferrerReference() - Method in class com.norconex.collector.http.doc.HttpDocInfo
- getRemoteURL() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- getRenderWaitTime() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- getReplacement() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer.Replace
- getReplaces() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
- getRequestHeader(String) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Gets the HTTP request header value matching the given name, previously set with GenericHttpFetcherConfig.setRequestHeader(String, String).
- getRequestHeaderNames() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Gets all HTTP request header names for headers previously set with GenericHttpFetcherConfig.setRequestHeader(String, String).
- getRequestHeaders() - Method in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
- getResourceTimeout() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Gets the timeout, in milliseconds, after which any resource requested will stop trying and proceed with other parts of the page.
- getResponses() - Method in class com.norconex.collector.http.fetch.HttpFetchClientResponse
- getRestrictions() - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
-
Gets all restrictions.
- getRobotsMeta() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
- getRobotsMeta(Reader, String, ContentType, Properties) - Method in class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
- getRobotsMeta(Reader, String, ContentType, Properties) - Method in interface com.norconex.collector.http.robot.IRobotsMetaProvider
-
Extracts Robots meta information for a page, if any.
- getRobotsMetaProvider() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
- getRobotsTxt(HttpFetchClient, String) - Method in class com.norconex.collector.http.robot.impl.StandardRobotsTxtProvider
- getRobotsTxt(HttpFetchClient, String) - Method in interface com.norconex.collector.http.robot.IRobotsTxtProvider
-
Gets robots.txt rules.
- getRobotsTxtProvider() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
- getScaleDimensions() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- getScaleQuality() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- getSchedules() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver
- getSchemes() - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
-
Gets the schemes to be extracted.
- getSchemes() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
-
Gets the schemes to be extracted.
- getScope() - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
-
Gets the delay scope.
- getScreenshotDimensions() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- getScreenshotHandler() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
- getScreenshotImageFormat() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Gets the screenshot image format (jpg, png, gif, bmp, etc.).
- getScreenshotScaleDimensions() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Gets the pixel dimensions we want the stored screenshot to have.
- getScreenshotScaleQuality() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Gets the screenshot scaling quality to use when storage is "disk" or "inline".
- getScreenshotStorage() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Gets the screenshot storage mechanisms.
- getScreenshotStorageDiskDir() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Gets the directory where screenshots are saved when storage is "disk".
- getScreenshotStorageDiskField() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Gets the target document metadata field where to store the path to the screenshot image file when storage is "disk".
- getScreenshotStorageDiskStructure() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Gets the screenshot directory structure to create when storage is "disk".
- getScreenshotStorageInlineField() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Gets the target document metadata field where to store the inline (Base64) screenshot image when storage is "inline".
- getScreenshotZoomFactor() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- getScriptPath() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- getScriptTimeout() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- getSeparator() - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
-
Gets the segment separator pattern.
- getSitemapChangeFreq() - Method in class com.norconex.collector.http.doc.HttpDocInfo
-
Gets the sitemap change frequency.
- getSitemapLastMod() - Method in class com.norconex.collector.http.doc.HttpDocInfo
-
Gets the sitemap last modified date.
- getSitemapLocations() - Method in class com.norconex.collector.http.robot.RobotsTxt
- getSitemapPaths() - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
-
Gets the URL paths, relative to the URL root, from which to try to locate and resolve sitemaps.
- getSitemapPriority() - Method in class com.norconex.collector.http.doc.HttpDocInfo
-
Gets the sitemap priority.
- getSitemapResolver() - Method in class com.norconex.collector.http.crawler.HttpCrawler
- getSitemapResolver() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
- getSitemapResolver() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
- getSitemapResolver() - Method in class com.norconex.collector.http.pipeline.queue.HttpQueuePipelineContext
- getSitemapSupport() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
-
Gets the sitemap support strategy.
- getSitemapSupport(String) - Static method in enum com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.SitemapSupport
- getSocketTimeout() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Gets the maximum period of inactivity between two consecutive data packets, in milliseconds.
- getSSLProtocols() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Gets the supported SSL/TLS protocols.
- getStart() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor.RegexPair
- getStartSitemapURLs() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Gets sitemap URLs to be used as starting points for crawling.
- getStartURLs() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Gets URLs to initiate crawling from.
- getStartURLsFiles() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Gets the file paths of seed files containing URLs to be used as "start URLs".
- getStartURLsProviders() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Gets the providers of URLs used as starting points for crawling.
- getStatusCode() - Method in class com.norconex.collector.http.fetch.HttpFetchClientResponse
- getStatusCode() - Method in interface com.norconex.collector.http.fetch.IHttpFetchResponse
- getStatusCodes() - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
-
Gets the status codes to listen for.
- getStorage() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
Gets the storage mechanisms.
- getStorageDiskDir() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- getStorageDiskField() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- getStorageDiskStructure() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- getStorageInlineField() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- getStorageUrlField() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- getStreamFactory() - Method in class com.norconex.collector.http.fetch.HttpFetchClient
- getTargetDir() - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
- getTargetDirField() - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
- getTargetDirStructure() - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
- getTargetMetaField() - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
- getTargets() - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
- getTempDir() - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
-
Gets the directory where temporary sitemap files are written.
- getThreadWait() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- getTimeRange() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
- getUrl() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Gets the URL for "form" authentication.
- getUrl() - Method in class com.norconex.collector.http.link.Link
- getUrl() - Method in class com.norconex.collector.http.processor.impl.ScaledImage
- getURLCrawlScopeStrategy() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Gets the strategy to use to determine if a URL is in scope.
- getUrlNormalizer() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Deprecated, for removal: This API element is subject to removal in a future version. Since 3.1.0, use HttpCrawlerConfig.getUrlNormalizers() instead.
- getUrlNormalizers() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Gets URL normalizers.
- getUrlRoot() - Method in class com.norconex.collector.http.doc.HttpDocInfo
-
Gets the URL root (protocol + domain, e.g. http://www.host.com).
- getUserAgent() - Method in class com.norconex.collector.http.fetch.HttpFetchClientResponse
- getUserAgent() - Method in interface com.norconex.collector.http.fetch.IHttpFetcher
- getUserAgent() - Method in interface com.norconex.collector.http.fetch.IHttpFetchResponse
- getUserAgent() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
- getUserAgent() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
- getUserAgent() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- getUserAgent() - Method in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
- getUserAgent() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
- getValidExitCodes() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Gets valid PhantomJS exit values (defaults to 0).
- getValidStatusCodes() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
- getValidStatusCodes() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- getValue() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
- getWaitForElementSelector() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- getWaitForElementTimeout() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- getWaitForElementType() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- getWebDriver() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
-
Gets the web driver associated with the current thread (if any).
- getWindowSize() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- getWorkstation() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Gets the NTLM authentication workstation name.
H
- handleImage(InputStream, Doc) - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
- hashCode() - Method in class com.norconex.collector.http.canon.impl.GenericCanonicalLinkDetector
- hashCode() - Method in class com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer
- hashCode() - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
- hashCode() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
- hashCode() - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
- hashCode() - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
- hashCode() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
- hashCode() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver
- hashCode() - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver.DelayReferencePattern
- hashCode() - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver
- hashCode() - Method in class com.norconex.collector.http.doc.HttpDocInfo
- hashCode() - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
- hashCode() - Method in class com.norconex.collector.http.fetch.HttpFetchClientResponse
- hashCode() - Method in class com.norconex.collector.http.fetch.HttpFetchResponseBuilder
- hashCode() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
- hashCode() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
- hashCode() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
- hashCode() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- hashCode() - Method in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
- hashCode() - Method in class com.norconex.collector.http.fetch.impl.webdriver.ScreenshotHandler
- hashCode() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
- hashCode() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- hashCode() - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
- hashCode() - Method in class com.norconex.collector.http.fetch.util.GenericRedirectURLProvider
- hashCode() - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
- hashCode() - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
- hashCode() - Method in class com.norconex.collector.http.link.AbstractTextLinkExtractor
- hashCode() - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
- hashCode() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
- hashCode() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor.RegexPair
- hashCode() - Method in class com.norconex.collector.http.link.impl.RegexLinkExtractor
- hashCode() - Method in class com.norconex.collector.http.link.impl.TikaLinkExtractor
- hashCode() - Method in class com.norconex.collector.http.link.impl.XMLFeedLinkExtractor
- hashCode() - Method in class com.norconex.collector.http.link.Link
- hashCode() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- hashCode() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
- hashCode() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
- hashCode() - Method in class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
- hashCode() - Method in class com.norconex.collector.http.robot.impl.StandardRobotsTxtProvider
- hashCode() - Method in class com.norconex.collector.http.robot.RobotsMeta
- hashCode() - Method in class com.norconex.collector.http.robot.RobotsTxt
- hashCode() - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
- hashCode() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
- hashCode() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer.Replace
- HEAD - com.norconex.collector.http.fetch.HttpMethod
- HIGH - com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.Quality
-
Deprecated.
- HIGH - com.norconex.collector.http.processor.impl.FeaturedImageProcessor.Quality
- HOURLY - com.norconex.collector.http.sitemap.SitemapChangeFrequency
- HstsResolver - Class in com.norconex.collector.http.fetch.util
-
Class handling HSTS support for servers supporting it.
- HtmlLinkExtractor - Class in com.norconex.collector.http.link.impl
-
HTML link extractor for URLs found in HTML and possibly other text files.
- HtmlLinkExtractor() - Constructor for class com.norconex.collector.http.link.impl.HtmlLinkExtractor
- HtmlLinkExtractor.RegexPair - Class in com.norconex.collector.http.link.impl
- HTTP_FETCHER - Static variable in class com.norconex.collector.http.doc.HttpDocMetadata
- HttpAuthConfig - Class in com.norconex.collector.http.fetch.impl
-
Generic HTTP Fetcher authentication configuration.
- HttpAuthConfig() - Constructor for class com.norconex.collector.http.fetch.impl.HttpAuthConfig
- HttpCollector - Class in com.norconex.collector.http
-
Main application class.
- HttpCollector() - Constructor for class com.norconex.collector.http.HttpCollector
-
Creates a non-configured HTTP collector.
- HttpCollector(HttpCollectorConfig) - Constructor for class com.norconex.collector.http.HttpCollector
-
Creates and configures an HTTP Collector with the provided configuration.
- HttpCollectorConfig - Class in com.norconex.collector.http
-
HTTP Collector configuration.
- HttpCollectorConfig() - Constructor for class com.norconex.collector.http.HttpCollectorConfig
- HttpCommitterPipeline - Class in com.norconex.collector.http.pipeline.committer
- HttpCommitterPipeline() - Constructor for class com.norconex.collector.http.pipeline.committer.HttpCommitterPipeline
- HttpCommitterPipelineContext - Class in com.norconex.collector.http.pipeline.committer
- HttpCommitterPipelineContext(HttpCrawler, CrawlDoc) - Constructor for class com.norconex.collector.http.pipeline.committer.HttpCommitterPipelineContext
- HttpCrawler - Class in com.norconex.collector.http.crawler
-
The HTTP Crawler.
- HttpCrawler(HttpCrawlerConfig, HttpCollector) - Constructor for class com.norconex.collector.http.crawler.HttpCrawler
-
Constructor.
- HttpCrawlerConfig - Class in com.norconex.collector.http.crawler
-
HTTP Crawler configuration.
- HttpCrawlerConfig() - Constructor for class com.norconex.collector.http.crawler.HttpCrawlerConfig
- HttpCrawlerConfig.HttpMethodSupport - Enum in com.norconex.collector.http.crawler
- HttpCrawlerConfig.ReferencedLinkType - Enum in com.norconex.collector.http.crawler
- HttpCrawlerEvent - Class in com.norconex.collector.http.crawler
-
HTTP Crawler event names.
- HttpCrawlState - Class in com.norconex.collector.http.doc
-
Represents a URL crawling status.
- HttpCrawlState(String) - Constructor for class com.norconex.collector.http.doc.HttpCrawlState
- HttpDocInfo - Class in com.norconex.collector.http.doc
-
A URL being crawled holding relevant crawl information.
- HttpDocInfo() - Constructor for class com.norconex.collector.http.doc.HttpDocInfo
- HttpDocInfo(DocInfo) - Constructor for class com.norconex.collector.http.doc.HttpDocInfo
-
Copy constructor.
- HttpDocInfo(String) - Constructor for class com.norconex.collector.http.doc.HttpDocInfo
- HttpDocInfo(String, int) - Constructor for class com.norconex.collector.http.doc.HttpDocInfo
-
Constructor.
- HttpDocMetadata - Class in com.norconex.collector.http.doc
-
Metadata constants for common metadata field names typically set by the HTTP Collector crawler.
- HttpFetchClient - Class in com.norconex.collector.http.fetch
-
Fetches HTTP resources, trying all configured HTTP fetchers, defaulting to GenericHttpFetcher with default configuration if none are defined.
- HttpFetchClient(CachedStreamFactory, List<IHttpFetcher>, int, long) - Constructor for class com.norconex.collector.http.fetch.HttpFetchClient
- HttpFetchClientResponse - Class in com.norconex.collector.http.fetch
-
Holds HTTP response information obtained from fetching a document using HttpFetchClient.
- HttpFetchClientResponse() - Constructor for class com.norconex.collector.http.fetch.HttpFetchClientResponse
- HttpFetchException - Exception in com.norconex.collector.http.fetch
-
Checked exception thrown upon encountering an error while performing an HTTP fetch.
- HttpFetchException() - Constructor for exception com.norconex.collector.http.fetch.HttpFetchException
- HttpFetchException(String) - Constructor for exception com.norconex.collector.http.fetch.HttpFetchException
- HttpFetchException(String, Throwable) - Constructor for exception com.norconex.collector.http.fetch.HttpFetchException
- HttpFetchException(Throwable) - Constructor for exception com.norconex.collector.http.fetch.HttpFetchException
- HttpFetchResponseBuilder - Class in com.norconex.collector.http.fetch
-
Builder facilitating creation of an HTTP fetch response.
- HttpFetchResponseBuilder() - Constructor for class com.norconex.collector.http.fetch.HttpFetchResponseBuilder
- HttpFetchResponseBuilder(IHttpFetchResponse) - Constructor for class com.norconex.collector.http.fetch.HttpFetchResponseBuilder
- HttpImporterPipeline - Class in com.norconex.collector.http.pipeline.importer
-
All execution steps of document processing, from the moment a document is obtained from the queue up to importing it.
- HttpImporterPipeline(boolean, boolean) - Constructor for class com.norconex.collector.http.pipeline.importer.HttpImporterPipeline
- HttpImporterPipelineContext - Class in com.norconex.collector.http.pipeline.importer
- HttpImporterPipelineContext(ImporterPipelineContext) - Constructor for class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
-
Constructor creating a copy of supplied context.
- HttpImporterPipelineContext(HttpCrawler, CrawlDoc) - Constructor for class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
- HttpMethod - Enum in com.norconex.collector.http.fetch
- HttpQueuePipeline - Class in com.norconex.collector.http.pipeline.queue
-
Performs URL handling logic before the actual processing of the document it represents takes place.
- HttpQueuePipeline() - Constructor for class com.norconex.collector.http.pipeline.queue.HttpQueuePipeline
- HttpQueuePipelineContext - Class in com.norconex.collector.http.pipeline.queue
- HttpQueuePipelineContext(HttpCrawler, HttpDocInfo) - Constructor for class com.norconex.collector.http.pipeline.queue.HttpQueuePipelineContext
- HttpSnifferConfig - Class in com.norconex.collector.http.fetch.impl.webdriver
-
Configuration for HttpSniffer.
- HttpSnifferConfig() - Constructor for class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
I
- ICanonicalLinkDetector - Interface in com.norconex.collector.http.canon
-
Detects and returns any canonical URL found in documents, whether from the HTTP headers (metadata) or from the page content (usually HTML).
- ID - com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig.WaitElementType
- IDelayResolver - Interface in com.norconex.collector.http.delay
-
Resolves and creates intentional "delays" to increase document download time intervals.
- IHttpDocumentProcessor - Interface in com.norconex.collector.http.processor
-
Custom processing (optional) performed on a document.
- IHttpFetcher - Interface in com.norconex.collector.http.fetch
-
Fetches HTTP resources.
- IHttpFetchResponse - Interface in com.norconex.collector.http.fetch
- ILinkExtractor - Interface in com.norconex.collector.http.link
-
Responsible for finding links in documents.
- ImageCache - Class in com.norconex.collector.http.processor.impl
-
Caches images.
- ImageCache(int, Path) - Constructor for class com.norconex.collector.http.processor.impl.ImageCache
- initCrawlDoc(CrawlDoc) - Method in class com.norconex.collector.http.crawler.HttpCrawler
- INLINE - com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.Storage
-
Deprecated.
- INLINE - com.norconex.collector.http.processor.impl.FeaturedImageProcessor.Storage
- INSCOPE - com.norconex.collector.http.crawler.HttpCrawlerConfig.ReferencedLinkType
- IRecrawlableResolver - Interface in com.norconex.collector.http.recrawl
-
Indicates whether a document that was successfully crawled on a previous crawling session should be recrawled or not.
- IRedirectURLProvider - Interface in com.norconex.collector.http.fetch.util
-
Responsible for providing a target absolute URL each time an HTTP redirect is encountered when invoking a URL.
- IRobotsMetaProvider - Interface in com.norconex.collector.http.robot
-
Responsible for extracting robot information from a page.
- IRobotsTxtFilter - Interface in com.norconex.collector.http.robot
-
Holds a robots.txt rule.
- IRobotsTxtProvider - Interface in com.norconex.collector.http.robot
-
Given a URL, extracts any "robots.txt" rules.
- is(HttpCrawlerConfig.HttpMethodSupport) - Method in enum com.norconex.collector.http.crawler.HttpCrawlerConfig.HttpMethodSupport
- is(HttpMethod) - Method in enum com.norconex.collector.http.fetch.HttpMethod
- isAny(HttpMethod...) - Method in enum com.norconex.collector.http.fetch.HttpMethod
- isCaseSensitive() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor.RegexPair
- isCaseSensitive() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
- isCombined() - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
- isCommentsEnabled() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
-
Gets whether links should be extracted from HTML/XML comments.
- isCurrentTimeInSchedule() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
- isDetectCharset() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- isDetectContentType() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- isDisabled() - Method in class com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer
-
Deprecated. Since 2.0.0, not having a checksummer defined or setting one explicitly to null effectively disables it.
- isDisabled() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
-
Whether this URL Normalizer is disabled or not.
- isDisableETag() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Gets whether adding the "ETag" If-None-Match HTTP request header is disabled.
- isDisableHSTS() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Gets whether the forcing of non-secure URLs to secure ones is disabled, according to the URL domain Strict-Transport-Security policy (obtained from HTTP response header).
- isDisableIfModifiedSince() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Gets whether adding the If-Modified-Since HTTP request header is disabled.
- isDisableSNI() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Gets whether Server Name Indication (SNI) is disabled.
- isDuplicate() - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
- isEnabled(HttpCrawlerConfig.HttpMethodSupport) - Static method in enum com.norconex.collector.http.crawler.HttpCrawlerConfig.HttpMethodSupport
- isExpectContinueEnabled() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Whether 'Expect: 100-continue' handshake is enabled.
- isFetchHttpHead() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Deprecated.
- isForceCharsetDetection() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Gets whether character encoding is detected instead of relying on HTTP response header.
- isForceContentTypeDetection() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Gets whether content type is detected instead of relying on HTTP response header.
- isIgnoreCanonicalLinks() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Whether canonical links found in HTTP headers and in HTML files <head> section should be ignored or processed.
- isIgnoreLinkData() - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
-
Gets whether to ignore extra data associated with a link.
- isIgnoreLinkData() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
-
Gets whether to ignore extra data associated with a link.
- isIgnoreLinkData() - Method in class com.norconex.collector.http.link.impl.TikaLinkExtractor
-
Gets whether to ignore extra data associated with a link.
- isIgnoreNofollow() - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
- isIgnoreNofollow() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
- isIgnoreNofollow() - Method in class com.norconex.collector.http.link.impl.TikaLinkExtractor
- isIgnoreRobotsCrawlDelay() - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
-
Gets whether to ignore crawl delays specified in a site robots.txt file.
- isIgnoreRobotsMeta() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
- isIgnoreRobotsTxt() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
- isIgnoreSitemap() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Whether to ignore sitemap detection and resolving for URLs processed.
- isIncludeSubdomains() - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
-
Gets whether sub-domains are considered to be the same as a URL domain.
- isInScope(String, String) - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
- ISitemapResolver - Interface in com.norconex.collector.http.sitemap
-
Given a URL root, resolves the corresponding sitemap(s), if any, and only if they have not yet been resolved for a crawling session.
- isKeepDownloads() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
- isKeepOutOfScopeLinks() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Deprecated. Since 3.0.0, use HttpCrawlerConfig.getKeepReferencedLinks().
- isLargest() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- isLenient() - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
- isNofollow() - Method in class com.norconex.collector.http.robot.RobotsMeta
- isNoindex() - Method in class com.norconex.collector.http.robot.RobotsMeta
- isPostImportLinksKeep() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Gets whether to keep the importer-generated field holding URLs to consider for crawling.
- isPreemptive() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Gets whether to perform preemptive authentication (valid for "basic" authentication method).
- isQueueInitialized() - Method in class com.norconex.collector.http.crawler.HttpCrawler
- isRecrawlable(HttpDocInfo) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
- isRecrawlable(HttpDocInfo) - Method in interface com.norconex.collector.http.recrawl.IRecrawlableResolver
-
Whether a document is recrawlable or not.
- isRedirected(HttpRequest, HttpResponse, HttpContext) - Method in class com.norconex.collector.http.fetch.util.ApacheRedirectCaptureStrategy
- isScaleStretch() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- isScreenshotEnabled() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Gets whether to enable taking screenshots of crawled web pages.
- isScreenshotScaleStretch() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Gets whether the screenshot should be stretched to fill all the scale dimensions.
- isStartURLsAsync() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Gets whether the start URLs should be loaded asynchronously.
- isStayOnDomain() - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
-
Whether the crawler should always stay on the same domain name as the domain for each URL specified as a start URL.
- isStayOnPort() - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
-
Gets whether the crawler should always stay on the same port as the port for each URL specified as a start URL.
- isStayOnProtocol() - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
-
Whether the crawler should always stay on the same protocol as the protocol for each URL specified as a start URL.
- IStartURLsProvider - Interface in com.norconex.collector.http.crawler
-
Provides starting URLs for crawling.
- isTimestamped() - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
-
Gets whether to add a timestamp to the file name, to ensure a new one is created with each run.
- isTrustAllSSLCertificates() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Whether to trust all SSL certificates (affects only "https" connections).
- IURLNormalizer - Interface in com.norconex.collector.http.url
-
Responsible for normalizing URLs.
L
- LAST - com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.SitemapSupport
- LastModifiedMetadataChecksummer - Class in com.norconex.collector.http.checksum.impl
-
Default implementation of IMetadataChecksummer for the Norconex HTTP Collector which simply returns the exact value of the "Last-Modified" HTTP header field, or null if not present.
- LastModifiedMetadataChecksummer() - Constructor for class com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer
- Link - Class in com.norconex.collector.http.link
-
Represents a link extracted from a document.
- Link(String) - Constructor for class com.norconex.collector.http.link.Link
- LINKTEXT - com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig.WaitElementType
- loadChecksummerFromXML(XML) - Method in class com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer
- loadCollectorConfigFromXML(XML) - Method in class com.norconex.collector.http.HttpCollectorConfig
- loadCrawlerConfigFromXML(XML) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
- loadDelaysFromXML(XML) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
-
Loads explicit configuration of delays from XML.
- loadDelaysFromXML(XML) - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver
- loadDelaysFromXML(XML) - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver
- loadFromXML(XML) - Method in class com.norconex.collector.http.canon.impl.GenericCanonicalLinkDetector
- loadFromXML(XML) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
- loadFromXML(XML) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
- loadFromXML(XML) - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
- loadFromXML(XML) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
- loadFromXML(XML) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
- loadFromXML(XML) - Method in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
- loadFromXML(XML) - Method in class com.norconex.collector.http.fetch.impl.webdriver.ScreenshotHandler
- loadFromXML(XML) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- loadFromXML(XML) - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
- loadFromXML(XML) - Method in class com.norconex.collector.http.fetch.util.GenericRedirectURLProvider
- loadFromXML(XML) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
- loadFromXML(XML) - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
- loadFromXML(XML) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- loadFromXML(XML) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
- loadFromXML(XML) - Method in class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
- loadFromXML(XML) - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
- loadFromXML(XML) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
- loadHttpFetcherFromXML(XML) - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
- loadHttpFetcherFromXML(XML) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
- loadHttpFetcherFromXML(XML) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- loadHttpFetcherFromXML(XML) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
- loadLinkExtractorFromXML(XML) - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
-
Loads configuration settings specific to the implementing class.
- loadLinkExtractorFromXML(XML) - Method in class com.norconex.collector.http.link.AbstractTextLinkExtractor
- loadLinkExtractorFromXML(XML) - Method in class com.norconex.collector.http.link.impl.TikaLinkExtractor
- loadTextLinkExtractorFromXML(XML) - Method in class com.norconex.collector.http.link.AbstractTextLinkExtractor
-
Loads configuration settings specific to the implementing class.
- loadTextLinkExtractorFromXML(XML) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
- loadTextLinkExtractorFromXML(XML) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
- loadTextLinkExtractorFromXML(XML) - Method in class com.norconex.collector.http.link.impl.RegexLinkExtractor
- loadTextLinkExtractorFromXML(XML) - Method in class com.norconex.collector.http.link.impl.XMLFeedLinkExtractor
- LOW - com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.Quality
-
Deprecated.
- LOW - com.norconex.collector.http.processor.impl.FeaturedImageProcessor.Quality
- lowerCase - com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
- lowerCasePath - com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
- lowerCaseQuery - com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
- lowerCaseQueryParameterNames - com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
- lowerCaseQueryParameterValues - com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
- lowerCaseSchemeHost - com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
M
- main(String[]) - Static method in class com.norconex.collector.http.HttpCollector
-
Invokes the HTTP Collector from the command line.
- markReferenceVariationsAsProcessed(CrawlDocInfo) - Method in class com.norconex.collector.http.crawler.HttpCrawler
- matches(String) - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver.DelayReferencePattern
- MAX - com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.Quality
-
Deprecated.
- MAX - com.norconex.collector.http.processor.impl.FeaturedImageProcessor.Quality
- MAX_BUFFER_SIZE - Static variable in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
- MAX_BUFFER_SIZE - Static variable in class com.norconex.collector.http.link.impl.RegexLinkExtractor
- MAXDEPTH - com.norconex.collector.http.crawler.HttpCrawlerConfig.ReferencedLinkType
- MEDIUM - com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.Quality
-
Deprecated.
- MEDIUM - com.norconex.collector.http.processor.impl.FeaturedImageProcessor.Quality
- METADATA - com.norconex.collector.http.fetch.util.DocImageHandler.Target
- METHOD_BASIC - Static variable in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
BASIC authentication method.
- METHOD_DIGEST - Static variable in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
DIGEST authentication method.
- METHOD_FORM - Static variable in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Form-based authentication method.
- METHOD_KERBEROS - Static variable in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Experimental: Kerberos authentication method.
- METHOD_NTLM - Static variable in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
NTLM authentication method.
- METHOD_SPNEGO - Static variable in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Experimental: SPNEGO authentication method.
- MinFrequency() - Constructor for class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
- MinFrequency(String, String, String) - Constructor for class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
- mon - com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule.DOW
- MONTHLY - com.norconex.collector.http.sitemap.SitemapChangeFrequency
N
- NAME - com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig.WaitElementType
- NEVER - com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.SitemapSupport
- NEVER - com.norconex.collector.http.sitemap.SitemapChangeFrequency
- normalizeURL(String) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
- normalizeURL(String) - Method in interface com.norconex.collector.http.url.IURLNormalizer
-
Normalize the given URL.
- normalizeURL(String, List<IURLNormalizer>) - Static method in interface com.norconex.collector.http.url.IURLNormalizer
-
Normalizes a URL by applying each normalizer in the list.
O
- of(String) - Static method in enum com.norconex.collector.http.fetch.impl.webdriver.Browser
- onCrawlerCleanBegin(CrawlerEvent) - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
- onCrawlerEvent(CrawlerEvent) - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
- onCrawlerRunBegin(CrawlerEvent) - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
- onCrawlerStopBegin(CrawlerEvent) - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
- OPERA - com.norconex.collector.http.fetch.impl.webdriver.Browser
- OPTIONAL - com.norconex.collector.http.crawler.HttpCrawlerConfig.HttpMethodSupport
- ORIGINAL_REFERENCE - Static variable in class com.norconex.collector.http.doc.HttpDocMetadata
- OUTSCOPE - com.norconex.collector.http.crawler.HttpCrawlerConfig.ReferencedLinkType
- OVERLAP_SIZE - Static variable in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
- OVERLAP_SIZE - Static variable in class com.norconex.collector.http.link.impl.RegexLinkExtractor
P
- parseRobotsTxt(InputStream, String, String) - Method in class com.norconex.collector.http.robot.impl.StandardRobotsTxtProvider
- PARTIALLINKTEXT - com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig.WaitElementType
- PhantomJSDocumentFetcher - Class in com.norconex.collector.http.fetch.impl
-
Deprecated. Since 3.0.0, use WebDriverHttpFetcher.
- PhantomJSDocumentFetcher() - Constructor for class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- PhantomJSDocumentFetcher(int[]) - Constructor for class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- PhantomJSDocumentFetcher.Quality - Enum in com.norconex.collector.http.fetch.impl
-
Deprecated.
- PhantomJSDocumentFetcher.Storage - Enum in com.norconex.collector.http.fetch.impl
-
Deprecated.
- PhantomJSDocumentFetcher.StorageDiskStructure - Enum in com.norconex.collector.http.fetch.impl
-
Deprecated.
- POST - com.norconex.collector.http.fetch.HttpMethod
- processDocument(HttpFetchClient, Doc) - Method in interface com.norconex.collector.http.processor.IHttpDocumentProcessor
-
Processes a document.
- processDocument(HttpFetchClient, Doc) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- provideRedirectURL(HttpRequest, HttpResponse, HttpContext) - Method in class com.norconex.collector.http.fetch.util.GenericRedirectURLProvider
- provideRedirectURL(HttpRequest, HttpResponse, HttpContext) - Method in interface com.norconex.collector.http.fetch.util.IRedirectURLProvider
-
Provides the redirect URL that the crawler must follow.
- provideStartURLs() - Method in interface com.norconex.collector.http.crawler.IStartURLsProvider
-
Provides an iterator over start URLs.
R
- REDIRECT - Static variable in class com.norconex.collector.http.doc.HttpCrawlState
- REDIRECT_TRAIL - Static variable in class com.norconex.collector.http.doc.HttpDocMetadata
- REFERENCED_URLS - Static variable in class com.norconex.collector.http.doc.HttpDocMetadata
- REFERENCED_URLS_OUT_OF_SCOPE - Static variable in class com.norconex.collector.http.doc.HttpDocMetadata
- ReferenceDelayResolver - Class in com.norconex.collector.http.delay.impl
-
Introduces different delays between document downloads based on matching document reference (URL) patterns.
- ReferenceDelayResolver() - Constructor for class com.norconex.collector.http.delay.impl.ReferenceDelayResolver
- ReferenceDelayResolver.DelayReferencePattern - Class in com.norconex.collector.http.delay.impl
- REFERRER_LINK_PREFIX - Static variable in class com.norconex.collector.http.doc.HttpDocMetadata
- REFERRER_REFERENCE - Static variable in class com.norconex.collector.http.doc.HttpDocMetadata
- RegexLinkExtractor - Class in com.norconex.collector.http.link.impl
-
Link extractor using regular expressions to extract links found in text documents.
- RegexLinkExtractor() - Constructor for class com.norconex.collector.http.link.impl.RegexLinkExtractor
- RegexPair(String, String, boolean) - Constructor for class com.norconex.collector.http.link.impl.HtmlLinkExtractor.RegexPair
- REJECTED_NONCANONICAL - Static variable in class com.norconex.collector.http.crawler.HttpCrawlerEvent
- REJECTED_REDIRECTED - Static variable in class com.norconex.collector.http.crawler.HttpCrawlerEvent
- REJECTED_ROBOTS_META_NOINDEX - Static variable in class com.norconex.collector.http.crawler.HttpCrawlerEvent
- REJECTED_ROBOTS_TXT - Static variable in class com.norconex.collector.http.crawler.HttpCrawlerEvent
- REJECTED_TOO_DEEP - Static variable in class com.norconex.collector.http.crawler.HttpCrawlerEvent
- removeDefaultPort - com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
- removeDirectoryIndex - com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
- removeDotSegments - com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
- removeDuplicateSlashes - com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
- removeEmptyParameters - com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
- removeFormParameter(String) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Remove the authentication form parameter matching the given name.
- removeFragment - com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
- removeLinkSelector(String) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
- removeLinkTag(String, String) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
- removeQueryString - com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
- removeRequestHeader(String) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Remove the request header matching the given name.
- removeRestriction(PropertyMatcher) - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
-
Removes a restriction.
- removeRestriction(String) - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
-
Removes all restrictions on a given field.
- removeSessionIds - com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
- removeTrailingFragment - com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
- removeTrailingHash - com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
- removeTrailingQuestionMark - com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
- removeTrailingSlash - com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
- removeWWW - com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
- Replace(String) - Constructor for class com.norconex.collector.http.url.impl.GenericURLNormalizer.Replace
- Replace(String, String) - Constructor for class com.norconex.collector.http.url.impl.GenericURLNormalizer.Replace
- replaceIPWithDomainName - com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
- REQUIRED - com.norconex.collector.http.crawler.HttpCrawlerConfig.HttpMethodSupport
- resolve(HttpClient, HttpDocInfo) - Static method in class com.norconex.collector.http.fetch.util.HstsResolver
- resolveExplicitDelay(String) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
-
Resolves explicitly specified delay, in milliseconds.
- resolveExplicitDelay(String) - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver
- resolveExplicitDelay(String) - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver
- resolveSitemaps(HttpFetchClient, String, List<String>, Consumer<HttpDocInfo>, boolean) - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
- resolveSitemaps(HttpFetchClient, String, List<String>, Consumer<HttpDocInfo>, boolean) - Method in interface com.norconex.collector.http.sitemap.ISitemapResolver
-
Resolves the sitemap instructions for a URL "root" (e.g.
- RobotsMeta - Class in com.norconex.collector.http.robot
- RobotsMeta(boolean, boolean) - Constructor for class com.norconex.collector.http.robot.RobotsMeta
- RobotsTxt - Class in com.norconex.collector.http.robot
- RobotsTxt(IRobotsTxtFilter...) - Constructor for class com.norconex.collector.http.robot.RobotsTxt
-
Creates a new robots.txt object with the supplied filters.
- RobotsTxt(List<IRobotsTxtFilter>) - Constructor for class com.norconex.collector.http.robot.RobotsTxt
-
Creates a new robots.txt object with the supplied filters.
- RobotsTxt(List<IRobotsTxtFilter>, float) - Constructor for class com.norconex.collector.http.robot.RobotsTxt
- RobotsTxt(List<IRobotsTxtFilter>, List<String>) - Constructor for class com.norconex.collector.http.robot.RobotsTxt
- RobotsTxt(List<IRobotsTxtFilter>, List<String>, float) - Constructor for class com.norconex.collector.http.robot.RobotsTxt
S
- SAFARI - com.norconex.collector.http.fetch.impl.webdriver.Browser
- sat - com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule.DOW
- saveChecksummerToXML(XML) - Method in class com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer
- saveCollectorConfigToXML(XML) - Method in class com.norconex.collector.http.HttpCollectorConfig
- saveCrawlerConfigToXML(XML) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
- saveDelaysToXML(XML) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
-
Saves explicit configuration of delays to XML.
- saveDelaysToXML(XML) - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver
- saveDelaysToXML(XML) - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver
- saveHttpFetcherToXML(XML) - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
- saveHttpFetcherToXML(XML) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
- saveHttpFetcherToXML(XML) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- saveHttpFetcherToXML(XML) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
- saveLinkExtractorToXML(XML) - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
-
Saves configuration settings specific to the implementing class.
- saveLinkExtractorToXML(XML) - Method in class com.norconex.collector.http.link.AbstractTextLinkExtractor
- saveLinkExtractorToXML(XML) - Method in class com.norconex.collector.http.link.impl.TikaLinkExtractor
- saveTextLinkExtractorToXML(XML) - Method in class com.norconex.collector.http.link.AbstractTextLinkExtractor
-
Saves configuration settings specific to the implementing class.
- saveTextLinkExtractorToXML(XML) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
- saveTextLinkExtractorToXML(XML) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
- saveTextLinkExtractorToXML(XML) - Method in class com.norconex.collector.http.link.impl.RegexLinkExtractor
- saveTextLinkExtractorToXML(XML) - Method in class com.norconex.collector.http.link.impl.XMLFeedLinkExtractor
- saveToXML(XML) - Method in class com.norconex.collector.http.canon.impl.GenericCanonicalLinkDetector
- saveToXML(XML) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
- saveToXML(XML) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
- saveToXML(XML) - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
- saveToXML(XML) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
- saveToXML(XML) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
- saveToXML(XML) - Method in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
- saveToXML(XML) - Method in class com.norconex.collector.http.fetch.impl.webdriver.ScreenshotHandler
- saveToXML(XML) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- saveToXML(XML) - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
- saveToXML(XML) - Method in class com.norconex.collector.http.fetch.util.GenericRedirectURLProvider
- saveToXML(XML) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
- saveToXML(XML) - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
- saveToXML(XML) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- saveToXML(XML) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
- saveToXML(XML) - Method in class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
- saveToXML(XML) - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
- saveToXML(XML) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
- ScaledImage - Class in com.norconex.collector.http.processor.impl
- ScaledImage(String, Dimension, BufferedImage) - Constructor for class com.norconex.collector.http.processor.impl.ScaledImage
- SCOPE_CRAWLER - Static variable in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
- SCOPE_SITE - Static variable in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
- SCOPE_THREAD - Static variable in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
- ScreenshotHandler - Class in com.norconex.collector.http.fetch.impl.webdriver
-
Takes screenshots of pages using a Selenium WebDriver.
- ScreenshotHandler() - Constructor for class com.norconex.collector.http.fetch.impl.webdriver.ScreenshotHandler
- ScreenshotHandler(CachedStreamFactory) - Constructor for class com.norconex.collector.http.fetch.impl.webdriver.ScreenshotHandler
- secureScheme - com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
- SegmentCountURLFilter - Class in com.norconex.collector.http.filter.impl
-
Filters URLs based on the number of URL segments.
- SegmentCountURLFilter() - Constructor for class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
-
Constructor.
- SegmentCountURLFilter(int) - Constructor for class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
-
Constructor.
- SegmentCountURLFilter(int, OnMatch) - Constructor for class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
-
Constructor.
- SegmentCountURLFilter(int, OnMatch, boolean) - Constructor for class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
-
Constructor.
- setApplyTo(String) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
- setAuthConfig(HttpAuthConfig) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
- setBrowser(Browser) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- setBrowserPath(Path) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- setCanonicalLinkDetector(ICanonicalLinkDetector) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets the canonical link detector.
- setCaseSensitive(boolean) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
- setChainedProxy(ProxySettings) - Method in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
-
Sets chained proxy settings, if any.
- setCharset(String) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
-
Sets the assumed source character encoding.
- setCharset(String) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
-
Sets the character set of pages on which link extraction is performed.
- setCharset(String) - Method in class com.norconex.collector.http.link.impl.RegexLinkExtractor
-
Sets the character set of pages on which link extraction is performed.
- setCombined(boolean) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
- setCommentsEnabled(boolean) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
-
Sets whether links should be extracted from HTML/XML comments.
- setConnectionCharset(Charset) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Sets the connection character set.
- setConnectionRequestTimeout(int) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Sets the timeout when requesting a connection, in milliseconds.
- setConnectionTimeout(int) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Sets the connection timeout until a connection is established, in milliseconds.
- setContentTypePattern(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- setContentTypes(ContentType...) - Method in class com.norconex.collector.http.canon.impl.GenericCanonicalLinkDetector
-
Sets the content types on which to perform canonical link detection.
- setContentTypes(List<ContentType>) - Method in class com.norconex.collector.http.canon.impl.GenericCanonicalLinkDetector
-
Sets the content types on which to perform canonical link detection.
- setCookieSpec(String) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
- setCount(int) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
- setCrawlerIds(List<String>) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
- setCrawlState(CrawlState) - Method in class com.norconex.collector.http.fetch.HttpFetchResponseBuilder
- setCredentials(Credentials) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
- setCssSelector(String) - Method in class com.norconex.collector.http.fetch.impl.webdriver.ScreenshotHandler
- setDefaultDelay(long) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
-
Sets the default delay in milliseconds.
- setDelayReferencePatterns(List<ReferenceDelayResolver.DelayReferencePattern>) - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver
- setDelayResolver(IDelayResolver) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
- setDepth(int) - Method in class com.norconex.collector.http.doc.HttpDocInfo
-
Sets the URL depth.
- setDetectCharset(boolean) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- setDetectContentType(boolean) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- setDisabled(boolean) - Method in class com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer
-
Deprecated. Since 2.0.0, not having a checksummer defined or setting one explicitly to null effectively disables it.
- setDisabled(boolean) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
-
Sets whether this URL Normalizer is disabled or not.
- setDisableETag(boolean) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Sets whether adding the "ETag" If-None-Match HTTP request header is disabled.
- setDisableHSTS(boolean) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Sets whether the forcing of non-secure URLs to secure ones is disabled, according to the URL domain Strict-Transport-Security policy (obtained from HTTP response header).
- setDisableIfModifiedSince(boolean) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Sets whether adding the If-Modified-Since HTTP request header is disabled.
- setDisableSNI(boolean) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Sets whether Server Name Indication (SNI) is disabled.
- setDomain(String) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Sets the NTLM authentication domain.
- setDomSelector(String) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- setDriverPath(Path) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- setDuplicate(boolean) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
- setEarlyPageScript(String) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- setEtag(String) - Method in class com.norconex.collector.http.doc.HttpDocInfo
-
Sets the HTTP ETag.
- setException(Exception) - Method in class com.norconex.collector.http.fetch.HttpFetchResponseBuilder
- setExePath(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- setExpectContinueEnabled(boolean) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Sets whether 'Expect: 100-continue' handshake is enabled.
- setExtractBetweens(HtmlLinkExtractor.RegexPair...) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
-
Sets the patterns delimiting the portions of a document to be considered for link extraction.
- setExtractBetweens(List<HtmlLinkExtractor.RegexPair>) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
-
Sets the patterns delimiting the portions of a document to be considered for link extraction.
- setExtractSelectors(String...) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
- setExtractSelectors(String...) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
-
Sets the selectors matching the portions of a document to be considered for link extraction.
- setExtractSelectors(List<String>) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
- setExtractSelectors(List<String>) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
-
Sets the selectors matching the portions of a document to be considered for link extraction.
- setFallbackCharset(String) - Method in class com.norconex.collector.http.fetch.util.GenericRedirectURLProvider
- setFetchHttpGet(HttpCrawlerConfig.HttpMethodSupport) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets whether to fetch HTTP documents using an HTTP GET request.
- setFetchHttpHead(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Deprecated.
- setFetchHttpHead(HttpCrawlerConfig.HttpMethodSupport) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets whether to fetch HTTP response headers using an HTTP HEAD request.
- setFieldMatcher(TextMatcher) - Method in class com.norconex.collector.http.link.AbstractTextLinkExtractor
-
Sets the field matcher identifying fields holding content used for link extraction.
- setFileNamePrefix(String) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
-
Sets the generated report file name prefix.
- setForceCharsetDetection(boolean) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Sets whether character encoding is detected instead of relying on HTTP response header.
- setForceContentTypeDetection(boolean) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Sets whether content type is detected instead of relying on HTTP response header.
- setFormCharset(Charset) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Sets the authentication form character set for the form field values.
- setFormParam(String, String) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Sets an authentication form parameter (equivalent to "input" or other fields in HTML forms).
- setFormParams(Map<String, String>) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Sets authentication form parameters (equivalent to "input" or other fields in HTML forms).
- setFormPasswordField(String) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Sets the name of the HTML field where the password is set.
- setFormSelector(String) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Sets the CSS selector that identifies the form in a login page.
- setFormUsernameField(String) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Sets the name of the HTML field where the username is set.
- setHeadersPrefix(String) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
- setHeadersPrefix(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- setHeadersPrefix(String) - Method in class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
- setHost(Host) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Sets the host for the current authentication scope.
- setHost(String) - Method in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
-
Sets the host name passed to the browser pointing to the sniffer proxy.
- setHttpFetchers(IHttpFetcher...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets HTTP fetchers.
- setHttpFetchers(List<IHttpFetcher>) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets HTTP fetchers.
- setHttpFetchersMaxRetries(int) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets the maximum number of times an HTTP fetcher will re-attempt fetching a resource in case of failures.
- setHttpFetchersRetryDelay(long) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets how long to wait before a failing HTTP fetcher re-attempts fetching a resource in case of failures (in milliseconds).
- setHttpMethods(List<HttpMethod>) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Sets the list of HTTP methods to be accepted by this fetcher.
- setHttpSnifferConfig(HttpSnifferConfig) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- setIgnoreCanonicalLinks(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets whether canonical links found in HTTP headers and in HTML files <head> section should be ignored or processed.
- setIgnoreLinkData(boolean) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
-
Sets whether to ignore extra data associated with a link.
- setIgnoreLinkData(boolean) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
-
Sets whether to ignore extra data associated with a link.
- setIgnoreLinkData(boolean) - Method in class com.norconex.collector.http.link.impl.TikaLinkExtractor
-
Sets whether to ignore extra data associated with a link.
- setIgnoreNofollow(boolean) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
- setIgnoreNofollow(boolean) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
- setIgnoreNofollow(boolean) - Method in class com.norconex.collector.http.link.impl.TikaLinkExtractor
- setIgnoreRobotsCrawlDelay(boolean) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
-
Sets whether to ignore crawl delays specified in a site robots.txt file.
- setIgnoreRobotsMeta(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
- setIgnoreRobotsTxt(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
- setIgnoreSitemap(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets whether to ignore sitemap detection and resolving for URLs processed.
- setImage(ScaledImage) - Method in class com.norconex.collector.http.processor.impl.ImageCache
- setImageCacheDir(Path) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- setImageCacheSize(int) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- setImageFormat(String) - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
- setImageFormat(String) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- setImplicitlyWait(long) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- setIncludeSubdomains(boolean) - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
-
Sets whether sub-domains are considered to be the same as a URL domain.
- setKeepDownloads(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
- setKeepOutOfScopeLinks(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Deprecated. Since 3.0.0, use HttpCrawlerConfig.setKeepReferencedLinks(Set).
- setKeepReferencedLinks(HttpCrawlerConfig.ReferencedLinkType...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets whether to keep referenced links and what to keep.
- setKeepReferencedLinks(Set<HttpCrawlerConfig.ReferencedLinkType>) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets whether to keep referenced links and what to keep.
- setLargest(boolean) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- setLatePageScript(String) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- setLenient(boolean) - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
- setLinkExtractors(ILinkExtractor...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets link extractors.
- setLinkExtractors(List<ILinkExtractor>) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets link extractors.
- setLocalAddress(String) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Sets the local address, which may be useful when working with multiple network interfaces.
- setMaxBufferSize(int) - Method in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
- setMaxConnectionIdleTime(int) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Sets the period of time in milliseconds after which to evict idle connections from the connection pool.
- setMaxConnectionInactiveTime(int) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Sets the period of time in milliseconds a connection must be inactive to be checked in case it became stalled.
- setMaxConnections(int) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Sets maximum number of connections that can be created.
- setMaxConnectionsPerRoute(int) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Sets the maximum number of connections that can be used per route.
- setMaxDepth(int) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
- setMaxRedirects(int) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Sets the maximum number of redirects to be followed.
- setMaxURLLength(int) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
-
Sets the maximum supported URL length.
- setMaxURLLength(int) - Method in class com.norconex.collector.http.link.impl.RegexLinkExtractor
-
Sets the maximum supported URL length.
- setMetadata(Properties) - Method in class com.norconex.collector.http.link.Link
- setMethod(String) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Sets the authentication method.
- setMinDimensions(int, int) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- setMinDimensions(Dimension) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- setMinFrequencies(GenericRecrawlableResolver.MinFrequency...) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
-
Sets minimum frequencies.
- setMinFrequencies(Collection<GenericRecrawlableResolver.MinFrequency>) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
-
Sets minimum frequencies.
- setNoExtractBetweens(HtmlLinkExtractor.RegexPair...) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
-
Sets the patterns delimiting the portions of a document to be excluded from link extraction.
- setNoExtractBetweens(List<HtmlLinkExtractor.RegexPair>) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
-
Sets the patterns delimiting the portions of a document to be excluded from link extraction.
- setNoExtractSelectors(String...) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
- setNoExtractSelectors(String...) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
-
Sets the selectors matching the portions of a document to be excluded from link extraction.
- setNoExtractSelectors(List<String>) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
- setNoExtractSelectors(List<String>) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
-
Sets the selectors matching the portions of a document to be excluded from link extraction.
- setNormalizations(GenericURLNormalizer.Normalization...) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
- setNormalizations(List<GenericURLNormalizer.Normalization>) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
- setNotFoundStatusCodes(int...) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Sets HTTP status codes to be considered as "Not found" state.
- setNotFoundStatusCodes(int...) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Sets HTTP status codes to be considered as "Not found" state.
- setNotFoundStatusCodes(List<Integer>) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Sets HTTP status codes to be considered as "Not found" state.
- setNotFoundStatusCodes(List<Integer>) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Sets HTTP status codes to be considered as "Not found" state.
- setOnMatch(OnMatch) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
- setOptions(String...) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Sets optional extra PhantomJS command-line options.
- setOptions(List<String>) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Sets optional extra PhantomJS command-line options.
- setOriginalReference(String) - Method in class com.norconex.collector.http.doc.HttpDocInfo
- setOutputDir(Path) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
-
Sets the local directory where this listener report will be written.
- setPageContentTypePattern(String) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- setPageLoadTimeout(long) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- setParser(String) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
-
Sets the parser to use when creating the DOM-tree.
- setPattern(String) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
- setPort(int) - Method in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
- setPostImportLinks(TextMatcher) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets a field matcher used to identify post-import metadata fields holding URLs to consider for crawling.
- setPostImportLinksKeep(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets whether to keep the importer-generated field holding URLs to consider for crawling.
- setPostImportProcessors(IHttpDocumentProcessor...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets post-import processors.
- setPostImportProcessors(List<IHttpDocumentProcessor>) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets post-import processors.
- setPreemptive(boolean) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Sets whether to perform preemptive authentication (valid for "basic" authentication method).
- setPreImportProcessors(IHttpDocumentProcessor...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets pre-import processors.
- setPreImportProcessors(List<IHttpDocumentProcessor>) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets pre-import processors.
- setProxySettings(ProxySettings) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
- setRealm(String) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Sets the realm name for the current authentication scope.
- setReasonPhrase(String) - Method in class com.norconex.collector.http.fetch.HttpFetchResponseBuilder
- setRecrawlableResolver(IRecrawlableResolver) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets the recrawlable resolver.
- setRedirectTarget(String) - Method in class com.norconex.collector.http.fetch.HttpFetchResponseBuilder
- setRedirectTrail(List<String>) - Method in class com.norconex.collector.http.doc.HttpDocInfo
-
Sets the trail of URLs that were redirected up to this one.
- setRedirectURLProvider(IRedirectURLProvider) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Sets the redirect URL provider.
- setReference(String) - Method in class com.norconex.collector.http.doc.HttpDocInfo
- setReferencedUrls(List<String>) - Method in class com.norconex.collector.http.doc.HttpDocInfo
-
Sets URLs referenced by this one.
- setReferenceFilters(IReferenceFilter...) - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
-
Sets reference filters.
- setReferenceFilters(List<IReferenceFilter>) - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
-
Sets reference filters.
- setReferencePattern(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- setReferrer(String) - Method in class com.norconex.collector.http.link.Link
- setReferrerLinkMetadata(String) - Method in class com.norconex.collector.http.doc.HttpDocInfo
- setReferrerReference(String) - Method in class com.norconex.collector.http.doc.HttpDocInfo
- setRemoteURL(URL) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- setRenderWaitTime(int) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- setReplaces(GenericURLNormalizer.Replace...) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
- setReplaces(List<GenericURLNormalizer.Replace>) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
- setRequestHeader(String, String) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Sets a default HTTP request header every HTTP connection should have.
- setRequestHeaders(Map<String, String>) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Sets the default HTTP request headers every HTTP connection should have.
- setRequestHeaders(Map<String, String>) - Method in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
- setRequestIfModifiedSince(HttpRequest, CrawlDoc) - Static method in class com.norconex.collector.http.fetch.util.ApacheHttpUtil
-
Sets the If-Modified-Since HTTP request header based on document cached last crawled date (if any).
- setRequestIfNoneMatch(HttpRequest, CrawlDoc) - Static method in class com.norconex.collector.http.fetch.util.ApacheHttpUtil
-
Sets the ETag If-None-Match HTTP request header based on document cached ETag value (if any).
- setResourceTimeout(int) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Sets the timeout, in milliseconds, after which any resource requested will stop trying and proceed with other parts of the page.
- setRestrictions(List<PropertyMatcher>) - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
-
Sets the restrictions this extractor is limited to.
- setRobotsMeta(RobotsMeta) - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
- setRobotsMetaProvider(IRobotsMetaProvider) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
- setRobotsTxtProvider(IRobotsTxtProvider) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
- setScaleDimensions(int, int) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- setScaleDimensions(Dimension) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- setScaleQuality(FeaturedImageProcessor.Quality) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- setScaleStretch(boolean) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- setSchedules(List<GenericDelayResolver.DelaySchedule>) - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver
- setSchemes(String...) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
-
Sets the schemes to be extracted.
- setSchemes(String...) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
-
Sets the schemes to be extracted.
- setSchemes(List<String>) - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
-
Sets the schemes to be extracted.
- setSchemes(List<String>) - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
-
Sets the schemes to be extracted.
- setScope(String) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
-
Sets the delay scope.
- setScreenshotDimensions(int, int) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- setScreenshotDimensions(Dimension) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- setScreenshotEnabled(boolean) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Sets whether to enable taking screenshots of crawled web pages.
- setScreenshotHandler(ScreenshotHandler) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
- setScreenshotImageFormat(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Sets the screenshot image format (jpg, png, gif, bmp, etc.).
- setScreenshotScaleDimensions(int, int) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Sets the pixel dimensions we want the stored screenshot to have.
- setScreenshotScaleDimensions(Dimension) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Sets the pixel dimensions we want the stored screenshot to have.
- setScreenshotScaleQuality(PhantomJSDocumentFetcher.Quality) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Sets the screenshot scaling quality to use when storage is "disk" or "inline".
- setScreenshotScaleStretch(boolean) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Sets whether the screenshot should be stretched to fill all the scale dimensions.
- setScreenshotStorage(PhantomJSDocumentFetcher.Storage...) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Sets the screenshot storage mechanisms.
- setScreenshotStorage(List<PhantomJSDocumentFetcher.Storage>) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Sets the screenshot storage mechanisms.
- setScreenshotStorageDiskDir(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Sets the directory where screenshots are saved when storage is "disk".
- setScreenshotStorageDiskField(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Sets the target document metadata field where to store the path to the screenshot image file when storage is "disk".
- setScreenshotStorageDiskStructure(PhantomJSDocumentFetcher.StorageDiskStructure) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Sets the screenshot directory structure to create when storage is "disk".
- setScreenshotStorageInlineField(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Sets the target document metadata field where to store the inline (Base64) screenshot image when storage is "inline".
- setScreenshotZoomFactor(float) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- setScriptPath(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- setScriptTimeout(long) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- setSeparator(String) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
- setSitemapChangeFreq(String) - Method in class com.norconex.collector.http.doc.HttpDocInfo
-
Sets the sitemap change frequency.
- setSitemapLastMod(ZonedDateTime) - Method in class com.norconex.collector.http.doc.HttpDocInfo
-
Sets the sitemap last modified date.
- setSitemapPaths(String...) - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
-
Sets the URL paths, relative to the URL root, from which to try to locate and resolve sitemaps.
- setSitemapPaths(List<String>) - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
-
Sets the URL paths, relative to the URL root, from which to try to locate and resolve sitemaps.
- setSitemapPriority(Float) - Method in class com.norconex.collector.http.doc.HttpDocInfo
-
Sets the sitemap priority.
- setSitemapResolver(ISitemapResolver) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
- setSitemapSupport(GenericRecrawlableResolver.SitemapSupport) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
-
Sets the sitemap support strategy.
- setSocketTimeout(int) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Sets the maximum period of inactivity between two consecutive data packets, in milliseconds.
- setSSLProtocols(List<String>) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Sets the supported SSL/TLS protocols, such as SSLv3, TLSv1, TLSv1.1, and TLSv1.2.
- setStartSitemapURLs(String...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets the sitemap URLs used as starting points for crawling.
- setStartSitemapURLs(List<String>) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets the sitemap URLs used as starting points for crawling.
- setStartURLs(String...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets URLs to initiate crawling from.
- setStartURLs(List<String>) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets URLs to initiate crawling from.
- setStartURLsAsync(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets whether the start URLs should be loaded asynchronously.
- setStartURLsFiles(Path...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets the file paths of seed files containing URLs to be used as "start URLs".
- setStartURLsFiles(List<Path>) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets the file paths of seed files containing URLs to be used as "start URLs".
- setStartURLsProviders(IStartURLsProvider...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets the providers of URLs used as starting points for crawling.
- setStartURLsProviders(List<IStartURLsProvider>) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets the providers of URLs used as starting points for crawling.
- setStatusCode(int) - Method in class com.norconex.collector.http.fetch.HttpFetchResponseBuilder
- setStatusCodes(String) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
-
Sets a comma-separated list of status codes to listen to.
- setStayOnDomain(boolean) - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
-
Sets whether the crawler should always stay on the same domain name as the domain for each URL specified as a start URL.
- setStayOnPort(boolean) - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
-
Sets whether the crawler should always stay on the same port as the port for each URL specified as a start URL.
- setStayOnProtocol(boolean) - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
-
Sets whether the crawler should always stay on the same protocol as the protocol for each URL specified as a start URL.
- setStorage(FeaturedImageProcessor.Storage...) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
Sets the storage mechanisms.
- setStorage(List<FeaturedImageProcessor.Storage>) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
Sets the storage mechanisms.
- setStorageDiskDir(String) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- setStorageDiskField(String) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- setStorageDiskStructure(FeaturedImageProcessor.StorageDiskStructure) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- setStorageInlineField(String) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- setStorageUrlField(String) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- setTargetDir(Path) - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
- setTargetDirField(String) - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
- setTargetDirStructure(DocImageHandler.DirStructure) - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
- setTargetMetaField(String) - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
- setTargets(DocImageHandler.Target...) - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
- setTargets(List<DocImageHandler.Target>) - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
- setTempDir(Path) - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
-
Sets the directory where temporary sitemap files are written.
- setThreadWait(long) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- setTimestamped(boolean) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
-
Sets whether to add a timestamp to the file name, to ensure a new one is created with each run.
- setTrustAllSSLCertificates(boolean) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Sets whether to trust all SSL certificates.
- setUrl(String) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Sets the URL for "form" authentication.
- setUrlCrawlScopeStrategy(URLCrawlScopeStrategy) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets the strategy to use to determine if a URL is in scope.
- setUrlNormalizer(IURLNormalizer) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Deprecated, for removal: This API element is subject to removal in a future version. Since 3.1.0, use HttpCrawlerConfig.setUrlNormalizers(List) instead.
- setUrlNormalizers(List<IURLNormalizer>) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets URL normalizers.
- setUserAgent(String) - Method in class com.norconex.collector.http.fetch.HttpFetchResponseBuilder
- setUserAgent(String) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
- setUserAgent(String) - Method in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
- setValidExitCodes(int...) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Sets valid PhantomJS exit values (defaults to 0).
- setValidExitCodes(List<Integer>) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Sets valid PhantomJS exit values (defaults to 0).
- setValidStatusCodes(int...) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Sets valid HTTP response status codes.
- setValidStatusCodes(int...) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Sets valid HTTP response status codes.
- setValidStatusCodes(List<Integer>) - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
-
Sets valid HTTP response status codes.
- setValidStatusCodes(List<Integer>) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated. Sets valid HTTP response status codes.
- setValue(String) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
- setWaitForElementSelector(String) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- setWaitForElementTimeout(long) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- setWaitForElementType(WebDriverHttpFetcherConfig.WaitElementType) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- setWindowSize(Dimension) - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- setWorkstation(String) - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
-
Sets the NTLM authentication workstation name.
- SiteDelay - Class in com.norconex.collector.http.delay.impl
- SiteDelay() - Constructor for class com.norconex.collector.http.delay.impl.SiteDelay
- SitemapChangeFrequency - Enum in com.norconex.collector.http.sitemap
-
Sitemap change frequency unit, as defined on http://www.sitemaps.org/protocol.html
- SM_CHANGE_FREQ - Static variable in class com.norconex.collector.http.doc.HttpDocMetadata
- SM_LASTMOD - Static variable in class com.norconex.collector.http.doc.HttpDocMetadata
- SM_PRORITY - Static variable in class com.norconex.collector.http.doc.HttpDocMetadata
- sortQueryParameters - com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
- StandardRobotsMetaProvider - Class in com.norconex.collector.http.robot.impl
-
Implementation of IRobotsMetaProvider as per X-Robots-Tag and ROBOTS standards.
- StandardRobotsMetaProvider() - Constructor for class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
- StandardRobotsTxtProvider - Class in com.norconex.collector.http.robot.impl
-
Implementation of IRobotsTxtProvider as per the robots.txt standard described at http://www.robotstxt.org/robotstxt.html.
- StandardRobotsTxtProvider() - Constructor for class com.norconex.collector.http.robot.impl.StandardRobotsTxtProvider
- sun - com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule.DOW
T
- TAGNAME - com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig.WaitElementType
- takeScreenshot(WebDriver, Doc) - Method in class com.norconex.collector.http.fetch.impl.webdriver.ScreenshotHandler
- TARGET_REDIRECT_CONTEXT_KEY - Static variable in class com.norconex.collector.http.fetch.util.ApacheRedirectCaptureStrategy
- ThreadDelay - Class in com.norconex.collector.http.delay.impl
- ThreadDelay() - Constructor for class com.norconex.collector.http.delay.impl.ThreadDelay
- thu - com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule.DOW
- TikaLinkExtractor - Class in com.norconex.collector.http.link.impl
-
Implementation of ILinkExtractor using Apache Tika to perform URL extractions from HTML documents.
- TikaLinkExtractor() - Constructor for class com.norconex.collector.http.link.impl.TikaLinkExtractor
- TINY_SLEEP_MS - Static variable in class com.norconex.collector.http.delay.impl.AbstractDelay
- toHTMLInlineString(String) - Method in class com.norconex.collector.http.processor.impl.ScaledImage
- TOO_DEEP - Static variable in class com.norconex.collector.http.doc.HttpCrawlState
- toString() - Method in class com.norconex.collector.http.canon.impl.GenericCanonicalLinkDetector
- toString() - Method in class com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer
- toString() - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
- toString() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
- toString() - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
- toString() - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
- toString() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
- toString() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver
- toString() - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver.DelayReferencePattern
- toString() - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver
- toString() - Method in class com.norconex.collector.http.doc.HttpDocInfo
- toString() - Method in class com.norconex.collector.http.fetch.AbstractHttpFetcher
- toString() - Method in class com.norconex.collector.http.fetch.HttpFetchClientResponse
- toString() - Method in class com.norconex.collector.http.fetch.HttpFetchResponseBuilder
- toString() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcher
- toString() - Method in class com.norconex.collector.http.fetch.impl.GenericHttpFetcherConfig
- toString() - Method in class com.norconex.collector.http.fetch.impl.HttpAuthConfig
- toString() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Deprecated.
- toString() - Method in class com.norconex.collector.http.fetch.impl.webdriver.HttpSnifferConfig
- toString() - Method in class com.norconex.collector.http.fetch.impl.webdriver.ScreenshotHandler
- toString() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
- toString() - Method in class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- toString() - Method in class com.norconex.collector.http.fetch.util.DocImageHandler
- toString() - Method in class com.norconex.collector.http.fetch.util.GenericRedirectURLProvider
- toString() - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
- toString() - Method in class com.norconex.collector.http.link.AbstractLinkExtractor
- toString() - Method in class com.norconex.collector.http.link.AbstractTextLinkExtractor
- toString() - Method in class com.norconex.collector.http.link.impl.DOMLinkExtractor
- toString() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor.RegexPair
- toString() - Method in class com.norconex.collector.http.link.impl.HtmlLinkExtractor
- toString() - Method in class com.norconex.collector.http.link.impl.RegexLinkExtractor
- toString() - Method in class com.norconex.collector.http.link.impl.TikaLinkExtractor
- toString() - Method in class com.norconex.collector.http.link.impl.XMLFeedLinkExtractor
- toString() - Method in class com.norconex.collector.http.link.Link
- toString() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
- toString() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
- toString() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
- toString() - Method in class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
- toString() - Method in class com.norconex.collector.http.robot.impl.StandardRobotsTxtProvider
- toString() - Method in class com.norconex.collector.http.robot.RobotsMeta
- toString() - Method in class com.norconex.collector.http.robot.RobotsTxt
- toString() - Method in class com.norconex.collector.http.sitemap.impl.GenericSitemapResolver
- toString() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer.Replace
- toString() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
- TrustAllX509TrustManager - Class in com.norconex.collector.http.fetch.util
-
A very unsafe trust manager accepting ALL certificates.
- TrustAllX509TrustManager() - Constructor for class com.norconex.collector.http.fetch.util.TrustAllX509TrustManager
- tue - com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule.DOW
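To illustrate what the TrustAllX509TrustManager entry in this section warns about, here is a minimal sketch of a trust-all manager built only on the standard JDK javax.net.ssl API. It is not the Norconex implementation; the names TrustAllSketch, TrustAll, and newTrustAllContext are hypothetical and chosen for illustration. Because no certificate is ever rejected, a class like this disables TLS server verification and is only reasonable for tests or fully trusted networks.

```java
import java.security.GeneralSecurityException;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

public class TrustAllSketch {

    // Hypothetical stand-in, NOT the Norconex class: every check is a no-op,
    // so any client or server certificate chain is accepted.
    static final class TrustAll implements X509TrustManager {
        @Override
        public void checkClientTrusted(X509Certificate[] chain, String authType) {
            // Intentionally empty: nothing is validated.
        }

        @Override
        public void checkServerTrusted(X509Certificate[] chain, String authType) {
            // Intentionally empty: nothing is validated.
        }

        @Override
        public X509Certificate[] getAcceptedIssuers() {
            return new X509Certificate[0];
        }
    }

    // Builds an SSLContext whose only trust manager is the trust-all one.
    static SSLContext newTrustAllContext() {
        try {
            SSLContext ctx = SSLContext.getInstance("TLS");
            ctx.init(null, new TrustManager[] { new TrustAll() }, null);
            return ctx;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not create SSL context", e);
        }
    }

    public static void main(String[] args) {
        SSLContext ctx = newTrustAllContext();
        System.out.println("Protocol: " + ctx.getProtocol());
    }
}
```

The empty check methods are the whole technique: validation normally fails by throwing CertificateException, so never throwing means always trusting.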
U
- unsecureScheme - com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
- UNSPECIFIED_CRAWL_DELAY - Static variable in class com.norconex.collector.http.robot.RobotsTxt
- unsupported() - Static method in class com.norconex.collector.http.fetch.HttpFetchResponseBuilder
- upperCaseEscapeSequence - com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
- URL - com.norconex.collector.http.processor.impl.FeaturedImageProcessor.Storage
- URL2PATH - com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.StorageDiskStructure
-
Deprecated.
- URL2PATH - com.norconex.collector.http.fetch.util.DocImageHandler.DirStructure
- URL2PATH - com.norconex.collector.http.processor.impl.FeaturedImageProcessor.StorageDiskStructure
- URLCrawlScopeStrategy - Class in com.norconex.collector.http.crawler
-
By default a crawler will try to follow all links it discovers.
- URLCrawlScopeStrategy() - Constructor for class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
- URLS_EXTRACTED - Static variable in class com.norconex.collector.http.crawler.HttpCrawlerEvent
- URLS_POST_IMPORTED - Static variable in class com.norconex.collector.http.crawler.HttpCrawlerEvent
- URLStatusCrawlerEventListener - Class in com.norconex.collector.http.crawler.event.impl
-
Store on file all URLs that were "fetched", along with their HTTP response code.
- URLStatusCrawlerEventListener() - Constructor for class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
V
- valueOf(String) - Static method in enum com.norconex.collector.http.crawler.HttpCrawlerConfig.HttpMethodSupport
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.norconex.collector.http.crawler.HttpCrawlerConfig.ReferencedLinkType
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule.DOW
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.norconex.collector.http.fetch.HttpMethod
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.Quality
-
Deprecated. Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.Storage
-
Deprecated. Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.StorageDiskStructure
-
Deprecated. Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.norconex.collector.http.fetch.impl.webdriver.Browser
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig.WaitElementType
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.norconex.collector.http.fetch.util.DocImageHandler.DirStructure
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.norconex.collector.http.fetch.util.DocImageHandler.Target
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.norconex.collector.http.processor.impl.FeaturedImageProcessor.Quality
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.norconex.collector.http.processor.impl.FeaturedImageProcessor.Storage
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.norconex.collector.http.processor.impl.FeaturedImageProcessor.StorageDiskStructure
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.SitemapSupport
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.norconex.collector.http.sitemap.SitemapChangeFrequency
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
-
Returns the enum constant of this type with the specified name.
- values() - Static method in enum com.norconex.collector.http.crawler.HttpCrawlerConfig.HttpMethodSupport
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum com.norconex.collector.http.crawler.HttpCrawlerConfig.ReferencedLinkType
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule.DOW
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum com.norconex.collector.http.fetch.HttpMethod
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.Quality
-
Deprecated. Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.Storage
-
Deprecated. Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.StorageDiskStructure
-
Deprecated. Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum com.norconex.collector.http.fetch.impl.webdriver.Browser
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig.WaitElementType
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum com.norconex.collector.http.fetch.util.DocImageHandler.DirStructure
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum com.norconex.collector.http.fetch.util.DocImageHandler.Target
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum com.norconex.collector.http.processor.impl.FeaturedImageProcessor.Quality
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum com.norconex.collector.http.processor.impl.FeaturedImageProcessor.Storage
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum com.norconex.collector.http.processor.impl.FeaturedImageProcessor.StorageDiskStructure
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.SitemapSupport
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum com.norconex.collector.http.sitemap.SitemapChangeFrequency
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
-
Returns an array containing the constants of this enum type, in the order they are declared.
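The valueOf(String) and values() entries in this section are the standard methods the Java compiler generates for every enum. The sketch below shows how they behave, using a hypothetical Frequency enum that stands in for a type such as SitemapChangeFrequency (only the WEEKLY and YEARLY constants are taken from this index; the class and the parse helper are assumptions made for illustration). Since valueOf is case-sensitive and throws IllegalArgumentException for unknown names, the helper wraps it in a lenient lookup.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Optional;

public class EnumLookupDemo {

    // Hypothetical stand-in enum; not the actual Norconex type.
    enum Frequency { WEEKLY, YEARLY }

    // Lenient lookup: trims and upper-cases the input, then delegates to
    // the compiler-generated valueOf, mapping failures to an empty Optional.
    static Optional<Frequency> parse(String name) {
        if (name == null) {
            return Optional.empty();
        }
        try {
            String normalized = name.trim().toUpperCase(Locale.ROOT);
            return Optional.of(Frequency.valueOf(normalized));
        } catch (IllegalArgumentException e) {
            // valueOf throws when no constant has the given name.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // values() returns the constants in declaration order.
        System.out.println(Arrays.toString(Frequency.values()));
        System.out.println(parse("weekly"));
        System.out.println(parse("hourly"));
    }
}
```

The upper-casing step only suits enums whose constants are upper-case. Some enums in this index, such as GenericDelayResolver.DelaySchedule.DOW, declare lower-case constants (tue, wed), so a helper for them would need to normalize differently.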
W
- WebDriverBuilder() - Constructor for class com.norconex.collector.http.fetch.impl.webdriver.Browser.WebDriverBuilder
- WebDriverHttpFetcher - Class in com.norconex.collector.http.fetch.impl.webdriver
-
Uses Selenium WebDriver support for using native browsers to crawl documents.
- WebDriverHttpFetcher() - Constructor for class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
-
Creates a new WebDriver HTTP Fetcher defaulting to Firefox.
- WebDriverHttpFetcher(WebDriverHttpFetcherConfig) - Constructor for class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher
-
Creates a new WebDriver HTTP Fetcher for the supplied configuration.
- WebDriverHttpFetcherConfig - Class in com.norconex.collector.http.fetch.impl.webdriver
-
Configuration for WebDriverHttpFetcher.
- WebDriverHttpFetcherConfig() - Constructor for class com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig
- WebDriverHttpFetcherConfig.WaitElementType - Enum in com.norconex.collector.http.fetch.impl.webdriver
- wed - com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule.DOW
- WEEKLY - com.norconex.collector.http.sitemap.SitemapChangeFrequency
X
- XMLFeedLinkExtractor - Class in com.norconex.collector.http.link.impl
- XMLFeedLinkExtractor() - Constructor for class com.norconex.collector.http.link.impl.XMLFeedLinkExtractor
- XPATH - com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcherConfig.WaitElementType
Y
- YEARLY - com.norconex.collector.http.sitemap.SitemapChangeFrequency