- GenericCanonicalLinkDetector - Class in com.norconex.collector.http.url.impl
-
Generic canonical link detector.
- GenericCanonicalLinkDetector() - Constructor for class com.norconex.collector.http.url.impl.GenericCanonicalLinkDetector
-
- GenericDelayResolver - Class in com.norconex.collector.http.delay.impl
-
Default implementation for creating voluntary delays between URL downloads.
- GenericDelayResolver() - Constructor for class com.norconex.collector.http.delay.impl.GenericDelayResolver
-
- GenericDelayResolver.DelaySchedule - Class in com.norconex.collector.http.delay.impl
-
- GenericDelayResolver.DelaySchedule.DOW - Enum in com.norconex.collector.http.delay.impl
-
- GenericDocumentFetcher - Class in com.norconex.collector.http.fetch.impl
-
- GenericDocumentFetcher() - Constructor for class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
-
- GenericDocumentFetcher(int[]) - Constructor for class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
-
- GenericHttpClientFactory - Class in com.norconex.collector.http.client.impl
-
- GenericHttpClientFactory() - Constructor for class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
- GenericLinkExtractor - Class in com.norconex.collector.http.url.impl
-
Generic link extractor for URLs found in HTML and possibly other text files.
- GenericLinkExtractor() - Constructor for class com.norconex.collector.http.url.impl.GenericLinkExtractor
-
- GenericLinkExtractor.RegexPair - Class in com.norconex.collector.http.url.impl
-
- GenericMetadataFetcher - Class in com.norconex.collector.http.fetch.impl
-
- GenericMetadataFetcher() - Constructor for class com.norconex.collector.http.fetch.impl.GenericMetadataFetcher
-
- GenericMetadataFetcher(int[]) - Constructor for class com.norconex.collector.http.fetch.impl.GenericMetadataFetcher
-
- GenericRecrawlableResolver - Class in com.norconex.collector.http.recrawl.impl
-
Relies on both sitemap directives and custom instructions for
establishing the minimum frequency between each document recrawl.
- GenericRecrawlableResolver() - Constructor for class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
-
- GenericRecrawlableResolver.MinFrequency - Class in com.norconex.collector.http.recrawl.impl
-
- GenericRecrawlableResolver.SitemapSupport - Enum in com.norconex.collector.http.recrawl.impl
-
- GenericRedirectURLProvider - Class in com.norconex.collector.http.redirect.impl
-
Provide redirect URLs by grabbing them from the HTTP Response Location header value.
- GenericRedirectURLProvider() - Constructor for class com.norconex.collector.http.redirect.impl.GenericRedirectURLProvider
-
- GenericURLNormalizer - Class in com.norconex.collector.http.url.impl
-
Generic implementation of IURLNormalizer that should satisfy most URL normalization needs.
- GenericURLNormalizer() - Constructor for class com.norconex.collector.http.url.impl.GenericURLNormalizer
-
- GenericURLNormalizer.Normalization - Enum in com.norconex.collector.http.url.impl
-
- GenericURLNormalizer.Replace - Class in com.norconex.collector.http.url.impl
-
- getAllowFilters() - Method in class com.norconex.collector.http.robot.RobotsTxt
-
Gets "Allow" filters.
- getApplyTo() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
-
- getApplyToContentTypePattern() - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
-
- getApplyToContentTypePattern() - Method in class com.norconex.collector.http.url.impl.XMLFeedLinkExtractor
-
- getApplyToReferencePattern() - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
-
- getApplyToReferencePattern() - Method in class com.norconex.collector.http.url.impl.XMLFeedLinkExtractor
-
- getArea() - Method in class com.norconex.collector.http.processor.impl.ScaledImage
-
- getAuthDomain() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the NTLM authentication domain.
- getAuthFormCharset() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the authentication form character set.
- getAuthFormParam(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets an authentication form parameter (equivalent to "input" or other
fields in HTML forms).
- getAuthFormParamNames() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets all authentication form parameter names.
- getAuthHostname() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the host name for the current authentication scope.
- getAuthMethod() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the authentication method.
- getAuthPassword() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the authentication password.
- getAuthPasswordField() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the name of the HTML field where the password is set.
- getAuthPasswordKey() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the authentication password encryption key.
- getAuthPort() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the port for the current authentication scope.
- getAuthRealm() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the realm name for the current authentication scope.
- getAuthURL() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the URL for "form" authentication.
- getAuthUsername() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the username.
- getAuthUsernameField() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the name of the HTML field where the username is set.
- getAuthWorkstation() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the NTLM authentication workstation name.
- getCachedCrawlData() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
-
- getCachedCrawlDataSQL() - Method in class com.norconex.collector.http.data.store.impl.jdbc.JDBCCrawlDataSerializer
-
- getCachedCrawlDataValues(String) - Method in class com.norconex.collector.http.data.store.impl.jdbc.JDBCCrawlDataSerializer
-
- getCacheDirectory() - Method in class com.norconex.collector.http.processor.impl.ImageCache
-
- getCanonicalLinkDetector() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Gets the canonical link detector.
- getChangeFrequency(String) - Static method in enum com.norconex.collector.http.sitemap.SitemapChangeFrequency
-
Gets the sitemap change frequency matching the supplied string.
- getCharset() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
-
Gets the character set of pages on which link extraction is performed.
- getCharset() - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
-
Gets the character set of pages on which link extraction is performed.
- getCollectorConfig() - Method in class com.norconex.collector.http.HttpCollector
-
- getConfig() - Method in class com.norconex.collector.http.pipeline.committer.HttpCommitterPipelineContext
-
- getConfig() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
-
- getConfig() - Method in class com.norconex.collector.http.pipeline.queue.HttpQueuePipelineContext
-
- getConnectionCharset() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the connection character set.
- getConnectionRequestTimeout() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the timeout when requesting a connection, in milliseconds.
- getConnectionTimeout() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the connection timeout until a connection is established,
in milliseconds.
- getContentType() - Method in class com.norconex.collector.http.recrawl.PreviousCrawlData
-
- getContentTypePattern() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- getContentTypes() - Method in class com.norconex.collector.http.url.impl.GenericCanonicalLinkDetector
-
- getContentTypes() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
-
- getContentTypes() - Method in class com.norconex.collector.http.url.impl.TikaLinkExtractor
-
- getCookieSpec() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
- getCount() - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
-
- getCrawlData() - Method in class com.norconex.collector.http.pipeline.committer.HttpCommitterPipelineContext
-
- getCrawlData() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
-
- getCrawlData() - Method in class com.norconex.collector.http.pipeline.queue.HttpQueuePipelineContext
-
- getCrawlDate() - Method in class com.norconex.collector.http.recrawl.PreviousCrawlData
-
- getCrawlDelay() - Method in class com.norconex.collector.http.robot.RobotsTxt
-
- getCrawler() - Method in class com.norconex.collector.http.pipeline.committer.HttpCommitterPipelineContext
-
- getCrawler() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
-
- getCrawler() - Method in class com.norconex.collector.http.pipeline.queue.HttpQueuePipelineContext
-
- getCrawlerConfig() - Method in class com.norconex.collector.http.crawler.HttpCrawler
-
- getCrawlState() - Method in class com.norconex.collector.http.fetch.HttpFetchResponse
-
- getCreateTableSQLs(String) - Method in class com.norconex.collector.http.data.store.impl.jdbc.JDBCCrawlDataSerializer
-
- getDayOfMonthRange() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
-
- getDayOfWeekRange() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
-
- getDefaultDelay() - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
-
Gets the default delay in milliseconds.
- getDelay() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
-
- getDelay() - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver.DelayReferencePattern
-
- getDelayReferencePatterns() - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver
-
- getDelayResolver() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- getDeleteCrawlDataSQL(String) - Method in class com.norconex.collector.http.data.store.impl.jdbc.JDBCCrawlDataSerializer
-
- getDeleteCrawlDataValues(String, ICrawlData) - Method in class com.norconex.collector.http.data.store.impl.jdbc.JDBCCrawlDataSerializer
-
- getDepth() - Method in class com.norconex.collector.http.data.HttpCrawlData
-
Gets the URL depth.
- getDisallowFilters() - Method in class com.norconex.collector.http.robot.RobotsTxt
-
Gets "Disallow" filters.
- getDocument() - Method in class com.norconex.collector.http.pipeline.committer.HttpCommitterPipelineContext
-
- getDocument() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
-
- getDocumentFetcher() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- getDocumentOutOfScopeUrls() - Method in class com.norconex.collector.http.doc.HttpMetadata
-
- getDocumentUrl() - Method in class com.norconex.collector.http.doc.HttpMetadata
-
- getDocumentUrls() - Method in class com.norconex.collector.http.doc.HttpMetadata
-
- getDomSelector() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- getEnd() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor.RegexPair
-
- getExePath() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- getExtractBetweens() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
-
Gets the patterns delimiting the portions of a document to be considered
for link extraction.
- getExtractSelectors() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
-
Gets the selectors matching the portions of a document to be considered
for link extraction.
- getFallbackCharset() - Method in class com.norconex.collector.http.redirect.impl.GenericRedirectURLProvider
-
- getFileNamePrefix() - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
-
Gets the generated report file name prefix.
- getFilters() - Method in class com.norconex.collector.http.robot.RobotsTxt
-
Gets all filters.
- getFromDate() - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
-
Gets the minimum EPOCH date (in milliseconds) a sitemap entry
should have to be considered.
- getFromDate() - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
-
Gets the minimum EPOCH date (in milliseconds) a sitemap entry
should have to be considered.
- getHeadersPrefix() - Method in class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
-
- getHeadersPrefix() - Method in class com.norconex.collector.http.fetch.impl.GenericMetadataFetcher
-
- getHeadersPrefix() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- getHeadersPrefix() - Method in class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
-
- getHttpClient() - Method in class com.norconex.collector.http.crawler.HttpCrawler
-
- getHttpClient() - Method in class com.norconex.collector.http.pipeline.committer.HttpCommitterPipelineContext
-
- getHttpClient() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
-
- getHttpClient() - Method in class com.norconex.collector.http.pipeline.queue.HttpQueuePipelineContext
-
- getHttpClientFactory() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- getHttpHeadersFetcher() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
-
- getImage(String) - Method in class com.norconex.collector.http.processor.impl.ImageCache
-
- getImage() - Method in class com.norconex.collector.http.processor.impl.ScaledImage
-
- getImageCacheDir() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- getImageCacheSize() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- getImageFormat() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- getImporter() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
-
- getInsertCrawlDataSQL(String) - Method in class com.norconex.collector.http.data.store.impl.jdbc.JDBCCrawlDataSerializer
-
- getInsertCrawlDataValues(String, ICrawlData) - Method in class com.norconex.collector.http.data.store.impl.jdbc.JDBCCrawlDataSerializer
-
- getLinkExtractors() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- getLocalAddress() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the local address (IP or hostname).
- getMatch() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer.Replace
-
- getMaxConnectionIdleTime() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the period of time in milliseconds after which to evict idle
connections from the connection pool.
- getMaxConnectionInactiveTime() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the period of time in milliseconds a connection must be inactive
to be checked in case it became stalled.
- getMaxConnections() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the maximum number of connections that can be created.
- getMaxConnectionsPerRoute() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the maximum number of connections that can be used per route.
- getMaxDepth() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- getMaxRedirects() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the maximum number of redirects to be followed.
- getMaxURLLength() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
-
Gets the maximum supported URL length.
- getMaxURLLength() - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
-
Gets the maximum supported URL length.
- getMetadata() - Method in class com.norconex.collector.http.doc.HttpDocument
-
- getMetadata() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
-
- getMetadataChecksummer() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Gets the metadata checksummer.
- getMetadataFetcher() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- getMinDimensions() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- getMinFrequencies() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
-
- getNextQueued(MongoCollection<Document>) - Method in class com.norconex.collector.http.data.store.impl.mongo.MongoCrawlDataSerializer
-
- getNextQueuedCrawlDataSQL() - Method in class com.norconex.collector.http.data.store.impl.jdbc.JDBCCrawlDataSerializer
-
- getNextQueuedCrawlDataValues() - Method in class com.norconex.collector.http.data.store.impl.jdbc.JDBCCrawlDataSerializer
-
- getNoExtractBetweens() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
-
Gets the patterns delimiting the portions of a document to be excluded
from link extraction.
- getNoExtractSelectors() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
-
Gets the selectors matching the portions of a document to be excluded
from link extraction.
- getNofollowPatterns() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
-
Gets the patterns of references for which link extraction is disabled.
- getNormalizations() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
-
- getNotFoundStatusCodes() - Method in class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
-
Gets HTTP status codes to be considered as "Not found" state.
- getNotFoundStatusCodes() - Method in class com.norconex.collector.http.fetch.impl.GenericMetadataFetcher
-
Gets HTTP status codes to be considered as "Not found" state.
- getNotFoundStatusCodes() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- getOptions() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- getOriginalRedirectStrategy() - Method in class com.norconex.collector.http.redirect.RedirectStrategyWrapper
-
- getOriginalReference() - Method in class com.norconex.collector.http.data.HttpCrawlData
-
- getOriginalSize() - Method in class com.norconex.collector.http.processor.impl.ScaledImage
-
- getOutputDir() - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
-
Gets the local directory where this listener report will be written.
- getPageContentTypePattern() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- getPath() - Method in interface com.norconex.collector.http.robot.IRobotsTxtFilter
-
- getPattern() - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver.DelayReferencePattern
-
- getPattern() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
-
- getPatternMatchGroup(String) - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
-
- getPatternReplacement(String) - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
-
Gets a pattern replacement.
- getPatterns() - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
-
- getPort() - Method in class com.norconex.collector.http.fetch.impl.HttpClientProxyCollectorListener
-
- getPostImportProcessors() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- getPreImportProcessors() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- getProxyHost() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the proxy host.
- getProxyPassword() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the proxy password.
- getProxyPasswordKey() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the proxy password encryption key.
- getProxyPort() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the proxy port.
- getProxyRealm() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the proxy realm.
- getProxyScheme() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the proxy scheme.
- getProxyUsername() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the proxy username.
- getReasonPhrase() - Method in class com.norconex.collector.http.fetch.HttpFetchResponse
-
- getRecrawlableResolver() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Gets the recrawlable resolver.
- getRedirect(HttpRequest, HttpResponse, HttpContext) - Method in class com.norconex.collector.http.redirect.RedirectStrategyWrapper
-
- getRedirectTrail() - Method in class com.norconex.collector.http.data.HttpCrawlData
-
Gets the trail of URLs that were redirected up to this one.
- getRedirectURL() - Static method in class com.norconex.collector.http.redirect.RedirectStrategyWrapper
-
- getRedirectURLProvider() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Gets the redirect URL provider.
- getRedirectURLProvider() - Method in class com.norconex.collector.http.redirect.RedirectStrategyWrapper
-
Gets the redirect URL provider.
- getReference() - Method in class com.norconex.collector.http.recrawl.PreviousCrawlData
-
- getReferencedUrls() - Method in class com.norconex.collector.http.data.HttpCrawlData
-
Gets URLs referenced by this one.
- getReferenceExistsSQL(String) - Method in class com.norconex.collector.http.data.store.impl.jdbc.JDBCCrawlDataSerializer
-
- getReferenceExistsValues(String, String) - Method in class com.norconex.collector.http.data.store.impl.jdbc.JDBCCrawlDataSerializer
-
- getReferencePattern() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- getReferrer() - Method in class com.norconex.collector.http.url.Link
-
- getReferrerLinkTag() - Method in class com.norconex.collector.http.data.HttpCrawlData
-
- getReferrerLinkText() - Method in class com.norconex.collector.http.data.HttpCrawlData
-
- getReferrerLinkTitle() - Method in class com.norconex.collector.http.data.HttpCrawlData
-
- getReferrerReference() - Method in class com.norconex.collector.http.data.HttpCrawlData
-
- getRenderWaitTime() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- getReplacement() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer.Replace
-
- getReplaces() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
-
- getRequestHeader(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
- getRequestHeaderNames() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
- getRequestHeaders() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
- getResourceTimeout() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Gets the milliseconds timeout after which any resource requested will
stop trying and proceed with other parts of the page.
- getRobotsMeta() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
-
- getRobotsMeta(Reader, String, ContentType, Properties) - Method in class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
-
- getRobotsMeta(Reader, String, ContentType, Properties) - Method in interface com.norconex.collector.http.robot.IRobotsMetaProvider
-
Extracts Robots meta information for a page, if any.
- getRobotsMetaProvider() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- getRobotsTxt(HttpClient, String, String) - Method in class com.norconex.collector.http.robot.impl.StandardRobotsTxtProvider
-
- getRobotsTxt(HttpClient, String, String) - Method in interface com.norconex.collector.http.robot.IRobotsTxtProvider
-
Gets robots.txt rules.
- getRobotsTxtProvider() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- getScaleDimensions() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- getScaleQuality() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- getSchedules() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver
-
- getSchemes() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
-
Gets the schemes to be extracted.
- getScope() - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
-
Gets the delay scope.
- getScreenshotDimensions() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- getScreenshotDir() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- getScreenshotImageFormat() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Gets the screenshot image format (jpg, png, gif, bmp, etc.).
- getScreenshotScaleDimensions() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Gets the pixel dimensions we want the stored screenshot to have.
- getScreenshotScaleQuality() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Gets the screenshot scaling quality to use when storage
is "disk" or "inline".
- getScreenshotStorage() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Gets the screenshot storage mechanisms.
- getScreenshotStorageDiskDir() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Gets the directory where screenshots are saved when storage is "disk".
- getScreenshotStorageDiskField() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Gets the target document metadata field in which to store the path
to the screenshot image file when storage is "disk".
- getScreenshotStorageDiskStructure() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Gets the screenshot directory structure to create when storage
is "disk".
- getScreenshotStorageInlineField() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Gets the target document metadata field where to store the inline
(Base64) screenshot image when storage is "inline".
- getScreenshotZoomFactor() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- getScriptPath() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- getSelectCrawlDataSQL(String) - Method in class com.norconex.collector.http.data.store.impl.jdbc.JDBCCrawlDataSerializer
-
- getSeparator() - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
-
Gets the segment separator pattern.
- getSitemapChangeFreq() - Method in class com.norconex.collector.http.data.HttpCrawlData
-
Gets the sitemap change frequency.
- getSitemapChangeFreq() - Method in class com.norconex.collector.http.recrawl.PreviousCrawlData
-
- getSitemapLastMod() - Method in class com.norconex.collector.http.data.HttpCrawlData
-
Gets the sitemap last modified date in milliseconds (EPOCH date).
- getSitemapLastMod() - Method in class com.norconex.collector.http.recrawl.PreviousCrawlData
-
- getSitemapLocations() - Method in class com.norconex.collector.http.robot.RobotsTxt
-
- getSitemapLocations() - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
-
- getSitemapLocations() - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
-
- getSitemapPaths() - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
-
Gets the URL paths, relative to the URL root, from which to try
to locate and resolve sitemaps.
- getSitemapPaths() - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
-
Gets the URL paths, relative to the URL root, from which to try
to locate and resolve sitemaps.
- getSitemapPriority() - Method in class com.norconex.collector.http.data.HttpCrawlData
-
Gets the sitemap priority.
- getSitemapPriority() - Method in class com.norconex.collector.http.recrawl.PreviousCrawlData
-
- getSitemapResolver() - Method in class com.norconex.collector.http.crawler.HttpCrawler
-
- getSitemapResolver() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
-
- getSitemapResolver() - Method in class com.norconex.collector.http.pipeline.queue.HttpQueuePipelineContext
-
- getSitemapResolverFactory() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- getSitemapSupport() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
-
Gets the sitemap support strategy.
- getSitemapSupport(String) - Static method in enum com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.SitemapSupport
-
- getSocketTimeout() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the maximum period of inactivity between two consecutive data
packets, in milliseconds.
- getSSLProtocols() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets the supported SSL/TLS protocols.
- getStart() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor.RegexPair
-
- getStartSitemapURLs() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Gets sitemap URLs to be used as starting points for crawling.
- getStartURLs() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- getStartURLsFiles() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Gets the file paths of seed files containing URLs to be used as
"start URLs".
- getStartURLsProviders() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Gets the providers of URLs used as starting points for crawling.
- getStatusCode() - Method in class com.norconex.collector.http.fetch.HttpFetchResponse
-
- getStatusCodes() - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
-
Gets the status codes to listen for.
- getStorage() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- getStorageDiskDir() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- getStorageDiskField() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- getStorageDiskStructure() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- getStorageInlineField() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- getStorageUrlField() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- getTag() - Method in class com.norconex.collector.http.url.Link
-
- getTempDir() - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
-
Gets the directory where temporary sitemap files are written.
- getTempDir() - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
-
Gets the directory where sitemap files are temporarily stored
before they are parsed.
- getText() - Method in class com.norconex.collector.http.url.Link
-
- getTimeRange() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
-
- getTitle() - Method in class com.norconex.collector.http.url.Link
-
- getUrl() - Method in class com.norconex.collector.http.processor.impl.ScaledImage
-
- getUrl() - Method in class com.norconex.collector.http.url.Link
-
- getURLCrawlScopeStrategy() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Gets the strategy to use to determine if a URL is in scope.
- getUrlNormalizer() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- getUrlRoot() - Method in class com.norconex.collector.http.data.HttpCrawlData
-
Gets the URL root (protocol + domain, e.g.
- getUserAgent() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- getValidExitCodes() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- getValidStatusCodes() - Method in class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
-
- getValidStatusCodes() - Method in class com.norconex.collector.http.fetch.impl.GenericMetadataFetcher
-
- getValidStatusCodes() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- getValue() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
-
- GOOD_REDIRECTS - Static variable in class com.norconex.collector.http.pipeline.importer.HttpImporterPipeline
-
- ICanonicalLinkDetector - Interface in com.norconex.collector.http.url
-
Detects and returns any canonical URL found in documents, whether from
the HTTP headers (metadata) or from the page content (usually HTML).
- IDelayResolver - Interface in com.norconex.collector.http.delay
-
Resolves and creates intentional "delays" to increase document download
time intervals.
- IHttpClientFactory - Interface in com.norconex.collector.http.client
-
Creates (and initializes) an Apache HttpClient to be used for all
HTTP requests this crawler will make.
- IHttpDocumentFetcher - Interface in com.norconex.collector.http.fetch
-
Fetches the HTTP document and its metadata (HTTP Headers).
- IHttpDocumentProcessor - Interface in com.norconex.collector.http.doc
-
- IHttpDocumentProcessor - Interface in com.norconex.collector.http.processor
-
Custom processing (optional) performed on a document.
- IHttpMetadataFetcher - Interface in com.norconex.collector.http.fetch
-
Fetches the HTTP Header, typically via a HEAD request.
- ILinkExtractor - Interface in com.norconex.collector.http.url
-
Responsible for finding links in documents.
- ImageCache - Class in com.norconex.collector.http.processor.impl
-
Caches images.
- ImageCache(int, File) - Constructor for class com.norconex.collector.http.processor.impl.ImageCache
-
- initCrawlData(ICrawlData, ICrawlData, ImporterDocument) - Method in class com.norconex.collector.http.crawler.HttpCrawler
-
- IRecrawlableResolver - Interface in com.norconex.collector.http.recrawl
-
Indicates whether a document that was successfully crawled on a previous
crawling session should be recrawled or not.
- IRedirectURLProvider - Interface in com.norconex.collector.http.redirect
-
Responsible for providing a target absolute URL each time an HTTP redirect
is encountered when invoking a URL.
- IRobotsMetaProvider - Interface in com.norconex.collector.http.robot
-
Responsible for extracting robot information from a page.
- IRobotsTxtFilter - Interface in com.norconex.collector.http.robot
-
Holds a robots.txt rule.
- IRobotsTxtProvider - Interface in com.norconex.collector.http.robot
-
Given a URL, extracts any "robots.txt" rules.
- isAuthPreemptive() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Gets whether to perform preemptive authentication
(valid for "basic" authentication method).
- isCaseSensitive() - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
-
- isCaseSensitive() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor.RegexPair
-
- isCommentsEnabled() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
-
Gets whether links should be extracted from HTML/XML comments.
- isCookiesDisabled() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Whether cookie support is disabled.
- isCurrentTimeInSchedule() - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule
-
- isDetectCharset() - Method in class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
-
Gets whether character encoding is detected instead of relying on
HTTP response header.
- isDetectCharset() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- isDetectContentType() - Method in class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
-
Gets whether content type is detected instead of relying on
HTTP response header.
- isDetectContentType() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- isDisabled() - Method in class com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer
-
Whether this checksummer is disabled or not.
- isDisabled() - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
-
Whether this URL Normalizer is disabled or not.
- isDuplicate() - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
-
- isEscalateErrors() - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
-
Gets whether errors should be thrown instead of logged.
- isEscalateErrors() - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
-
Gets whether errors should be thrown instead of logged.
- isExpectContinueEnabled() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Whether 'Expect: 100-continue' handshake is enabled.
- isHttpHeadFetchEnabled() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
-
- isHttpHeadSuccessful() - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
-
Gets whether http headers were already fetched successfully.
- isIgnoreCanonicalLinks() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Whether canonical links found in HTTP headers and in the <head> section
of HTML files should be ignored or processed.
- isIgnoreNofollow() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
-
- isIgnoreNofollow() - Method in class com.norconex.collector.http.url.impl.TikaLinkExtractor
-
- isIgnoreRobotsCrawlDelay() - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
-
Gets whether to ignore crawl delays specified in a site robots.txt
file.
- isIgnoreRobotsMeta() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- isIgnoreRobotsTxt() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- isIgnoreSitemap() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Whether to ignore sitemap detection and resolving for URLs processed.
- isIncludeSubdomains() - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
-
Gets whether sub-domains are considered to be the same as a URL domain.
- isInScope(String, String) - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
-
- ISitemapResolver - Interface in com.norconex.collector.http.sitemap
-
Given a URL root, resolves the corresponding sitemap(s), if any, and
only if it has not yet been resolved for a crawling session.
- ISitemapResolverFactory - Interface in com.norconex.collector.http.sitemap
-
- isKeepDownloads() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- isKeepMaxDepthLinks() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Gets whether to keep (and extract) links on pages having reached
the configured maximum depth.
- isKeepOutOfScopeLinks() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- isKeepReferrerData() - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
-
- isKeepReferrerData() - Method in class com.norconex.collector.http.url.impl.TikaLinkExtractor
-
- isLargest() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- isLenient() - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
-
- isLenient() - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
-
- isNofollow() - Method in class com.norconex.collector.http.robot.RobotsMeta
-
- isNoindex() - Method in class com.norconex.collector.http.robot.RobotsMeta
-
- isRecrawlable(PreviousCrawlData) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
-
- isRecrawlable(PreviousCrawlData) - Method in interface com.norconex.collector.http.recrawl.IRecrawlableResolver
-
Whether a document is recrawlable or not.
- isRedirected(HttpRequest, HttpResponse, HttpContext) - Method in class com.norconex.collector.http.redirect.RedirectStrategyWrapper
-
- isResolved(String) - Method in class com.norconex.collector.http.sitemap.impl.SitemapStore
-
- isScaleStretch() - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- isScreenshotEnabled() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Gets whether to enable taking screenshots of crawled web pages.
- isScreenshotScaleStretch() - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Gets whether the screenshot should be stretched to fill all
the scale dimensions.
- isSkipMetaFetcherOnBadStatus() - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Gets whether to skip metadata fetching activities instead of
rejecting a document on bad status.
- isStaleConnectionCheckDisabled() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
- isStayOnDomain() - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
-
Gets whether the crawler should always stay on the same domain name as
the domain for each URL specified as a start URL.
- isStayOnPort() - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
-
Gets whether the crawler should always stay on the same port as
the port for each URL specified as a start URL.
- isStayOnProtocol() - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
-
Gets whether the crawler should always stay on the same protocol as
the protocol for each URL specified as a start URL.
- IStartURLsProvider - Interface in com.norconex.collector.http.crawler
-
Provides starting URLs for crawling.
- isTrustAllSSLCertificates() - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Whether to trust all SSL certificates (affects only "https" connections).
- IURLNormalizer - Interface in com.norconex.collector.http.url
-
Responsible for normalizing URLs.
- saveChecksummerToXML(EnhancedXMLStreamWriter) - Method in class com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer
-
- saveCollectorConfigToXML(Writer) - Method in class com.norconex.collector.http.HttpCollectorConfig
-
- saveCrawlerConfigToXML(Writer) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- saveDelaysToXML(EnhancedXMLStreamWriter) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
-
Saves explicit configuration of delays to XML.
- saveDelaysToXML(EnhancedXMLStreamWriter) - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver
-
- saveDelaysToXML(EnhancedXMLStreamWriter) - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver
-
- saveToXML(Writer) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
- saveToXML(Writer) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
-
- saveToXML(Writer) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
-
- saveToXML(Writer) - Method in class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
-
- saveToXML(Writer) - Method in class com.norconex.collector.http.fetch.impl.GenericMetadataFetcher
-
- saveToXML(Writer) - Method in class com.norconex.collector.http.fetch.impl.HttpClientProxyCollectorListener
-
- saveToXML(Writer) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- saveToXML(Writer) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
-
- saveToXML(Writer) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- saveToXML(Writer) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
-
- saveToXML(Writer) - Method in class com.norconex.collector.http.redirect.impl.GenericRedirectURLProvider
-
- saveToXML(Writer) - Method in class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
-
- saveToXML(Writer) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
-
- saveToXML(Writer) - Method in class com.norconex.collector.http.url.impl.GenericCanonicalLinkDetector
-
- saveToXML(Writer) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
-
- saveToXML(Writer) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
-
- saveToXML(Writer) - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
-
- saveToXML(Writer) - Method in class com.norconex.collector.http.url.impl.TikaLinkExtractor
-
- saveToXML(Writer) - Method in class com.norconex.collector.http.url.impl.XMLFeedLinkExtractor
-
- ScaledImage - Class in com.norconex.collector.http.processor.impl
-
- ScaledImage(String, Dimension, BufferedImage) - Constructor for class com.norconex.collector.http.processor.impl.ScaledImage
-
- SCOPE_CRAWLER - Static variable in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
-
- SCOPE_SITE - Static variable in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
-
- SCOPE_THREAD - Static variable in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
-
- SegmentCountURLFilter - Class in com.norconex.collector.http.filter.impl
-
Filters URLs based on the number of URL segments.
- SegmentCountURLFilter() - Constructor for class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
-
Constructor.
- SegmentCountURLFilter(int) - Constructor for class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
-
Constructor.
- SegmentCountURLFilter(int, OnMatch) - Constructor for class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
-
Constructor.
- SegmentCountURLFilter(int, OnMatch, boolean) - Constructor for class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
-
Constructor.
- setApplyTo(String) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
-
- setApplyToContentTypePattern(String) - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
-
- setApplyToContentTypePattern(String) - Method in class com.norconex.collector.http.url.impl.XMLFeedLinkExtractor
-
- setApplyToReferencePattern(String) - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
-
- setApplyToReferencePattern(String) - Method in class com.norconex.collector.http.url.impl.XMLFeedLinkExtractor
-
- setAuthDomain(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the NTLM authentication domain.
- setAuthFormCharset(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the authentication form character set for the form field values.
- setAuthFormParam(String, String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets an authentication form parameter (equivalent to "input" or other
fields in HTML forms).
- setAuthHostname(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the host name for the current authentication scope.
- setAuthMethod(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the authentication method.
- setAuthPassword(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the authentication password.
- setAuthPasswordField(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the name of the HTML field where the password is set.
- setAuthPasswordKey(EncryptionKey) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the authentication password encryption key.
- setAuthPort(int) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the port for the current authentication scope.
- setAuthPreemptive(boolean) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets whether to perform preemptive authentication
(valid for "basic" authentication method).
- setAuthRealm(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the realm name for the current authentication scope.
- setAuthURL(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the URL for "form" authentication.
- setAuthUsername(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the username.
- setAuthUsernameField(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the name of the HTML field where the username is set.
- setAuthWorkstation(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the NTLM authentication workstation name.
- setCanonicalLinkDetector(ICanonicalLinkDetector) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets the canonical link detector.
- setCaseSensitive(boolean) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
-
- setCharset(String) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
-
Sets the character set of pages on which link extraction is performed.
- setCharset(String) - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
-
Sets the character set of pages on which link extraction is performed.
- setCommentsEnabled(boolean) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
-
Sets whether links should be extracted from HTML/XML comments.
- setConnectionCharset(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the connection character set.
- setConnectionRequestTimeout(int) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the timeout when requesting a connection, in milliseconds.
- setConnectionTimeout(int) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the connection timeout until a connection is established,
in milliseconds.
- setContentType(ContentType) - Method in class com.norconex.collector.http.recrawl.PreviousCrawlData
-
- setContentTypePattern(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- setContentTypes(ContentType...) - Method in class com.norconex.collector.http.url.impl.GenericCanonicalLinkDetector
-
- setContentTypes(ContentType...) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
-
- setContentTypes(ContentType...) - Method in class com.norconex.collector.http.url.impl.TikaLinkExtractor
-
- setCookiesDisabled(boolean) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets whether cookie support is disabled.
- setCookieSpec(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
- setCount(int) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
-
- setCrawlDate(Date) - Method in class com.norconex.collector.http.recrawl.PreviousCrawlData
-
- setDefaultDelay(long) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
-
Sets the default delay in milliseconds.
- setDelayReferencePatterns(List<ReferenceDelayResolver.DelayReferencePattern>) - Method in class com.norconex.collector.http.delay.impl.ReferenceDelayResolver
-
- setDelayResolver(IDelayResolver) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- setDepth(int) - Method in class com.norconex.collector.http.data.HttpCrawlData
-
Sets the URL depth.
- setDetectCharset(boolean) - Method in class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
-
Sets whether character encoding is detected instead of relying on
HTTP response header.
- setDetectCharset(boolean) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- setDetectContentType(boolean) - Method in class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
-
Sets whether content type is detected instead of relying on
HTTP response header.
- setDetectContentType(boolean) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- setDisabled(boolean) - Method in class com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer
-
Sets whether this checksummer is disabled or not.
- setDisabled(boolean) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
-
Sets whether this URL Normalizer is disabled or not.
- setDocumentFetcher(IHttpDocumentFetcher) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- setDomSelector(String) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- setDuplicate(boolean) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
-
- setEscalateErrors(boolean) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
-
Sets whether errors should be thrown instead of logged.
- setEscalateErrors(boolean) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
-
Sets whether errors should be thrown instead of logged.
- setExePath(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- setExpectContinueEnabled(boolean) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets whether 'Expect: 100-continue' handshake is enabled.
- setExtractBetweens(GenericLinkExtractor.RegexPair...) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
-
Sets the patterns delimiting the portions of a document to be considered
for link extraction.
- setExtractSelectors(String...) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
-
Sets the selectors matching the portions of a document to be considered
for link extraction.
- setFallbackCharset(String) - Method in class com.norconex.collector.http.redirect.impl.GenericRedirectURLProvider
-
- setFileNamePrefix(String) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
-
Sets the generated report file name prefix.
- setFromDate(long) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
-
Sets the minimum EPOCH date (in milliseconds) a sitemap entry
should have to be considered.
- setFromDate(long) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
-
Sets the minimum EPOCH date (in milliseconds) a sitemap entry
should have to be considered.
- setHeadersPrefix(String) - Method in class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
-
- setHeadersPrefix(String) - Method in class com.norconex.collector.http.fetch.impl.GenericMetadataFetcher
-
- setHeadersPrefix(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- setHeadersPrefix(String) - Method in class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
-
- setHttpClientFactory(IHttpClientFactory) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- setHttpHeadSuccessful(boolean) - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
-
Sets whether http headers were already fetched successfully.
- setIgnoreCanonicalLinks(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets whether canonical links found in HTTP headers and in the <head>
section of HTML files should be ignored or processed.
- setIgnoreNofollow(boolean) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
-
- setIgnoreNofollow(boolean) - Method in class com.norconex.collector.http.url.impl.TikaLinkExtractor
-
- setIgnoreRobotsCrawlDelay(boolean) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
-
Sets whether to ignore crawl delays specified in a site robots.txt
file.
- setIgnoreRobotsMeta(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- setIgnoreRobotsTxt(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- setIgnoreSitemap(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets whether to ignore sitemap detection and resolving for URLs
processed.
- setImage(ScaledImage) - Method in class com.norconex.collector.http.processor.impl.ImageCache
-
- setImageCacheDir(String) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- setImageCacheSize(int) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- setImageFormat(String) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- setIncludeSubdomains(boolean) - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
-
Sets whether sub-domains are considered to be the same as a URL domain.
- setKeepDownloads(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- setKeepMaxDepthLinks(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets whether to keep (and extract) links on pages having reached
the configured maximum depth.
- setKeepOutOfScopeLinks(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- setKeepReferrerData(boolean) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
-
- setKeepReferrerData(boolean) - Method in class com.norconex.collector.http.url.impl.TikaLinkExtractor
-
- setLargest(boolean) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- setLenient(boolean) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
-
- setLenient(boolean) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
-
- setLinkExtractors(ILinkExtractor...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- setLocalAddress(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the local address, which may be useful when working with multiple
network interfaces.
- setMaxConnectionIdleTime(int) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the period of time in milliseconds after which to evict idle
connections from the connection pool.
- setMaxConnectionInactiveTime(int) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the period of time in milliseconds a connection must be inactive
to be checked in case it became stalled.
- setMaxConnections(int) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the maximum number of connections that can be created.
- setMaxConnectionsPerRoute(int) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the maximum number of connections that can be used per route.
- setMaxDepth(int) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- setMaxRedirects(int) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the maximum number of redirects to be followed.
- setMaxURLLength(int) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
-
Sets the maximum supported URL length.
- setMaxURLLength(int) - Method in class com.norconex.collector.http.url.impl.RegexLinkExtractor
-
Sets the maximum supported URL length.
- setMetadataChecksummer(IMetadataChecksummer) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- setMetadataFetcher(IHttpMetadataFetcher) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- setMinDimensions(int, int) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- setMinDimensions(Dimension) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- setMinFrequencies(GenericRecrawlableResolver.MinFrequency...) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
-
- setNoExtractBetweens(GenericLinkExtractor.RegexPair...) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
-
Sets the patterns delimiting the portions of a document to be excluded
from link extraction.
- setNoExtractSelectors(String...) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
-
Sets the selectors matching the portions of a document to be excluded
from link extraction.
- setNofollowPatterns(List<String>) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
-
Sets the patterns of references for which link extraction is disabled.
- setNormalizations(GenericURLNormalizer.Normalization...) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
-
- setNotFoundStatusCodes(int...) - Method in class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
-
Sets HTTP status codes to be considered as "Not found" state.
- setNotFoundStatusCodes(int...) - Method in class com.norconex.collector.http.fetch.impl.GenericMetadataFetcher
-
Sets HTTP status codes to be considered as "Not found" state.
- setNotFoundStatusCodes(int...) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- setOptions(String...) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- setOriginalReference(String) - Method in class com.norconex.collector.http.data.HttpCrawlData
-
- setOutputDir(String) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
-
Sets the local directory where this listener report will be written.
- setPageContentTypePattern(String) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- setPattern(String) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
-
- setPort(int) - Method in class com.norconex.collector.http.fetch.impl.HttpClientProxyCollectorListener
-
- setPostImportProcessors(IHttpDocumentProcessor...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- setPreImportProcessors(IHttpDocumentProcessor...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- setProxyHost(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the proxy host.
- setProxyPassword(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the proxy password.
- setProxyPasswordKey(EncryptionKey) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the proxy password encryption key.
- setProxyPort(int) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the proxy port.
- setProxyRealm(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the proxy realm.
- setProxyScheme(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the proxy scheme.
- setProxyUsername(String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the proxy username.
- setRecrawlableResolver(IRecrawlableResolver) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets the recrawlable resolver.
- setRedirectTrail(String...) - Method in class com.norconex.collector.http.data.HttpCrawlData
-
Sets the trail of URLs that were redirected up to this one.
- setRedirectURL(String) - Static method in class com.norconex.collector.http.redirect.RedirectStrategyWrapper
-
Sets the redirect URL.
- setRedirectURLProvider(IRedirectURLProvider) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets the redirect URL provider.
- setReference(String) - Method in class com.norconex.collector.http.data.HttpCrawlData
-
- setReference(String) - Method in class com.norconex.collector.http.recrawl.PreviousCrawlData
-
- setReferencedUrls(String...) - Method in class com.norconex.collector.http.data.HttpCrawlData
-
Sets URLs referenced by this one.
- setReferencePattern(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- setReferrer(String) - Method in class com.norconex.collector.http.url.Link
-
- setReferrerLinkTag(String) - Method in class com.norconex.collector.http.data.HttpCrawlData
-
- setReferrerLinkText(String) - Method in class com.norconex.collector.http.data.HttpCrawlData
-
- setReferrerLinkTitle(String) - Method in class com.norconex.collector.http.data.HttpCrawlData
-
- setReferrerReference(String) - Method in class com.norconex.collector.http.data.HttpCrawlData
-
- setRenderWaitTime(int) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- setReplaces(GenericURLNormalizer.Replace...) - Method in class com.norconex.collector.http.url.impl.GenericURLNormalizer
-
- setRequestHeader(String, String) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets a default HTTP request header every HTTP connection should have.
- setResourceTimeout(int) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Sets the timeout, in milliseconds, after which a requested resource is
abandoned and processing proceeds with other parts of the page.
- setRobotsMeta(RobotsMeta) - Method in class com.norconex.collector.http.pipeline.importer.HttpImporterPipelineContext
-
- setRobotsMetaProvider(IRobotsMetaProvider) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- setRobotsTxtProvider(IRobotsTxtProvider) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- setScaleDimensions(int, int) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- setScaleDimensions(Dimension) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- setScaleQuality(FeaturedImageProcessor.Quality) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- setScaleStretch(boolean) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- setSchedules(List<GenericDelayResolver.DelaySchedule>) - Method in class com.norconex.collector.http.delay.impl.GenericDelayResolver
-
- setSchemes(String...) - Method in class com.norconex.collector.http.url.impl.GenericLinkExtractor
-
Sets the schemes to be extracted.
- setScope(String) - Method in class com.norconex.collector.http.delay.impl.AbstractDelayResolver
-
Sets the delay scope.
- setScreenshotDimensions(int, int) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- setScreenshotDimensions(Dimension) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- setScreenshotDir(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- setScreenshotEnabled(boolean) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Sets whether to enable taking screenshots of crawled web pages.
- setScreenshotImageFormat(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Sets the screenshot image format (jpg, png, gif, bmp, etc.).
- setScreenshotScaleDimensions(Dimension) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Sets the pixel dimensions we want the stored screenshot to have.
- setScreenshotScaleDimensions(int, int) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Sets the pixel dimensions we want the stored screenshot to have.
- setScreenshotScaleQuality(PhantomJSDocumentFetcher.Quality) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Sets the screenshot scaling quality to use when storage
is "disk" or "inline".
- setScreenshotScaleStretch(boolean) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Sets whether the screenshot should be stretched to fill all
the scale dimensions.
- setScreenshotStorage(PhantomJSDocumentFetcher.Storage...) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Sets the screenshot storage mechanisms.
- setScreenshotStorageDiskDir(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Sets the directory where screenshots are saved when storage is "disk".
- setScreenshotStorageDiskField(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Sets the target document metadata field in which to store the path
to the screenshot image file when storage is "disk".
- setScreenshotStorageDiskStructure(PhantomJSDocumentFetcher.StorageDiskStructure) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Sets the screenshot directory structure to create when storage
is "disk".
- setScreenshotStorageInlineField(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
Sets the target document metadata field in which to store the inline
(Base64) screenshot image when storage is "inline".
- setScreenshotZoomFactor(float) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- setScriptPath(String) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- setSeparator(String) - Method in class com.norconex.collector.http.filter.impl.SegmentCountURLFilter
-
- setSitemapChangeFreq(String) - Method in class com.norconex.collector.http.data.HttpCrawlData
-
Sets the sitemap change frequency.
- setSitemapChangeFreq(String) - Method in class com.norconex.collector.http.recrawl.PreviousCrawlData
-
- setSitemapLastMod(Long) - Method in class com.norconex.collector.http.data.HttpCrawlData
-
Sets the sitemap last modified date, in milliseconds since the epoch.
- setSitemapLastMod(Long) - Method in class com.norconex.collector.http.recrawl.PreviousCrawlData
-
- setSitemapLocations(String...) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
-
- setSitemapLocations(String...) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
-
- setSitemapPaths(String...) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
-
Sets the URL paths, relative to the URL root, from which to try to
locate and resolve sitemaps.
- setSitemapPaths(String...) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
-
Sets the URL paths, relative to the URL root, from which to try to
locate and resolve sitemaps.
- setSitemapPriority(Float) - Method in class com.norconex.collector.http.data.HttpCrawlData
-
Sets the sitemap priority.
- setSitemapPriority(Float) - Method in class com.norconex.collector.http.recrawl.PreviousCrawlData
-
- setSitemapResolverFactory(ISitemapResolverFactory) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- setSitemapSupport(GenericRecrawlableResolver.SitemapSupport) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver
-
Sets the sitemap support strategy.
- setSkipMetaFetcherOnBadStatus(boolean) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets whether to skip metadata fetching activities instead of
rejecting a document on bad status.
- setSocketTimeout(int) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the maximum period of inactivity between two consecutive data
packets, in milliseconds.
- setSSLProtocols(String...) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets the supported SSL/TLS protocols, such as SSLv3, TLSv1, TLSv1.1,
and TLSv1.2.
- setStaleConnectionCheckDisabled(boolean) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
- setStartSitemapURLs(String...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets the sitemap URLs used as starting points for crawling.
- setStartURLs(String...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- setStartURLsFiles(String...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets the file paths of seed files containing URLs to be used as
"start URLs".
- setStartURLsProviders(IStartURLsProvider...) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets the providers of URLs used as starting points for crawling.
- setStatusCodes(String) - Method in class com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener
-
Sets a comma-separated list of status codes to listen to.
- setStayOnDomain(boolean) - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
-
Sets whether the crawler should always stay on the same domain name as
the domain for each URL specified as a start URL.
- setStayOnPort(boolean) - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
-
Sets whether the crawler should always stay on the same port as
the port for each URL specified as a start URL.
- setStayOnProtocol(boolean) - Method in class com.norconex.collector.http.crawler.URLCrawlScopeStrategy
-
Sets whether the crawler should always stay on the same protocol as
the protocol for each URL specified as a start URL.
- setStorage(FeaturedImageProcessor.Storage...) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- setStorageDiskDir(String) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- setStorageDiskField(String) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- setStorageDiskStructure(FeaturedImageProcessor.StorageDiskStructure) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- setStorageInlineField(String) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- setStorageUrlField(String) - Method in class com.norconex.collector.http.processor.impl.FeaturedImageProcessor
-
- setTag(String) - Method in class com.norconex.collector.http.url.Link
-
- setTempDir(File) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
-
Sets the directory where temporary sitemap files are written.
- setTempDir(File) - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
-
Sets the temporary directory where sitemap files are temporarily stored
before they are parsed.
- setText(String) - Method in class com.norconex.collector.http.url.Link
-
- setTitle(String) - Method in class com.norconex.collector.http.url.Link
-
- setTrustAllSSLCertificates(boolean) - Method in class com.norconex.collector.http.client.impl.GenericHttpClientFactory
-
Sets whether to trust all SSL certificates.
- setUrlCrawlScopeStrategy(URLCrawlScopeStrategy) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
Sets the strategy to use to determine if a URL is in scope.
- setUrlNormalizer(IURLNormalizer) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- setUserAgent(String) - Method in class com.norconex.collector.http.crawler.HttpCrawlerConfig
-
- setValidExitCodes(int...) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- setValidStatusCodes(int...) - Method in class com.norconex.collector.http.fetch.impl.GenericDocumentFetcher
-
- setValidStatusCodes(int...) - Method in class com.norconex.collector.http.fetch.impl.GenericMetadataFetcher
-
- setValidStatusCodes(int...) - Method in class com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- setValue(String) - Method in class com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.MinFrequency
-
- SiteDelay - Class in com.norconex.collector.http.delay.impl
-
- SiteDelay() - Constructor for class com.norconex.collector.http.delay.impl.SiteDelay
-
- SitemapChangeFrequency - Enum in com.norconex.collector.http.sitemap
-
- SitemapStore - Class in com.norconex.collector.http.sitemap.impl
-
- SitemapStore(HttpCrawlerConfig, boolean) - Constructor for class com.norconex.collector.http.sitemap.impl.SitemapStore
-
- SitemapURLAdder - Class in com.norconex.collector.http.sitemap
-
Represents a queue of sitemap URLs.
- SitemapURLAdder() - Constructor for class com.norconex.collector.http.sitemap.SitemapURLAdder
-
- StandardRobotsMetaProvider - Class in com.norconex.collector.http.robot.impl
-
- StandardRobotsMetaProvider() - Constructor for class com.norconex.collector.http.robot.impl.StandardRobotsMetaProvider
-
- StandardRobotsTxtProvider - Class in com.norconex.collector.http.robot.impl
-
- StandardRobotsTxtProvider() - Constructor for class com.norconex.collector.http.robot.impl.StandardRobotsTxtProvider
-
- StandardSitemapResolver - Class in com.norconex.collector.http.sitemap.impl
-
- StandardSitemapResolver(File, SitemapStore) - Constructor for class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
-
- StandardSitemapResolverFactory - Class in com.norconex.collector.http.sitemap.impl
-
- StandardSitemapResolverFactory() - Constructor for class com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory
-
- stop(IJobStatus, JobSuite) - Method in class com.norconex.collector.http.crawler.HttpCrawler
-
- stop() - Method in class com.norconex.collector.http.sitemap.impl.StandardSitemapResolver
-
- stop() - Method in interface com.norconex.collector.http.sitemap.ISitemapResolver
-
Stops any ongoing sitemap resolution.
- valueOf(String) - Static method in enum com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule.DOW
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.Quality
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.Storage
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.StorageDiskStructure
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.norconex.collector.http.processor.impl.FeaturedImageProcessor.Quality
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.norconex.collector.http.processor.impl.FeaturedImageProcessor.Storage
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.norconex.collector.http.processor.impl.FeaturedImageProcessor.StorageDiskStructure
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.SitemapSupport
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.norconex.collector.http.sitemap.SitemapChangeFrequency
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
-
Returns the enum constant of this type with the specified name.
- values() - Static method in enum com.norconex.collector.http.delay.impl.GenericDelayResolver.DelaySchedule.DOW
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.Quality
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.Storage
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.StorageDiskStructure
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum com.norconex.collector.http.processor.impl.FeaturedImageProcessor.Quality
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum com.norconex.collector.http.processor.impl.FeaturedImageProcessor.Storage
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum com.norconex.collector.http.processor.impl.FeaturedImageProcessor.StorageDiskStructure
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum com.norconex.collector.http.recrawl.impl.GenericRecrawlableResolver.SitemapSupport
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum com.norconex.collector.http.sitemap.SitemapChangeFrequency
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum com.norconex.collector.http.url.impl.GenericURLNormalizer.Normalization
-
Returns an array containing the constants of this enum type, in
the order they are declared.
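
The valueOf(String) and values() entries above are the static methods the Java compiler generates for every enum type. The sketch below illustrates their standard behavior using a hypothetical stand-in enum, SampleFrequency, and a hypothetical helper, parse; neither is part of the indexed library, and the fallback-on-unknown-name handling is an assumption made for illustration only.

```java
import java.util.Arrays;
import java.util.Locale;

public class EnumLookupSketch {

    // Hypothetical stand-in enum; not one of the library's own types.
    enum SampleFrequency { HOURLY, DAILY, WEEKLY }

    // Resolves a constant by name. valueOf(String) requires an exact match
    // and throws IllegalArgumentException otherwise, so this helper
    // normalizes case and returns a caller-supplied fallback instead.
    static SampleFrequency parse(String name, SampleFrequency fallback) {
        if (name == null) {
            return fallback; // valueOf(null) would throw NullPointerException
        }
        try {
            return SampleFrequency.valueOf(name.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // values() returns the constants in declaration order.
        System.out.println(Arrays.toString(SampleFrequency.values()));
        System.out.println(parse("daily", SampleFrequency.WEEKLY));
        System.out.println(parse("fortnightly", SampleFrequency.WEEKLY));
    }
}
```

values() returns a fresh array on each call, so modifying the returned array does not affect the enum itself.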