public interface ICanonicalLinkDetector
Detects and returns any canonical URL found in documents, whether from the HTTP headers (metadata) or from the page content (usually HTML). Documents that contain a canonical URL reference are rejected in favor of the document represented by the canonical URL.
When an IHttpMetadataFetcher is used, a page won't be downloaded if a canonical link is found in the HTTP headers (saving bandwidth and processing). If no fetcher is used, or if it found no canonical link, detection is attempted against the HTTP headers obtained (if any) just after fetching the document. If no canonical link is found there either, the content is evaluated.
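The lookup order described above can be sketched as follows. This is only an illustration, not the crawler's actual code: the class `CanonicalLookupOrder`, the method `resolve`, and its three supplier parameters are hypothetical names introduced here. Each supplier stands for one detection step and returns a canonical URL or `null`.

```java
import java.util.function.Supplier;

/** Hypothetical illustration of the lookup order; not part of the Norconex API. */
final class CanonicalLookupOrder {

    /**
     * Runs the detection steps in order and stops at the first hit.
     *
     * @param fetcherHeaders  headers from a metadata fetcher, or null when none is used
     * @param downloadHeaders headers obtained just after fetching the document
     * @param content         detection against the document content
     * @return the first canonical URL found, or null if no step finds one
     */
    static String resolve(
            Supplier<String> fetcherHeaders,
            Supplier<String> downloadHeaders,
            Supplier<String> content) {
        if (fetcherHeaders != null) {
            String fromFetcher = fetcherHeaders.get();
            if (fromFetcher != null) {
                // Found before any download: later steps never run.
                return fromFetcher;
            }
        }
        String fromDownload = downloadHeaders.get();
        if (fromDownload != null) {
            return fromDownload;
        }
        // Last resort: evaluate the content itself.
        return content.get();
    }
}
```

Because the steps are passed as suppliers, the later (more expensive) ones are evaluated only when the earlier ones return `null`, which mirrors the bandwidth saving mentioned above.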
A canonical link found to be the same as the current page reference is ignored.
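A minimal sketch of that rule, assuming a plain string comparison (an actual implementation may well normalize both URLs before comparing them). The class and method names are hypothetical.

```java
/** Hypothetical helper; illustrates ignoring a canonical link that points to the page itself. */
final class SelfCanonicalCheck {

    /**
     * @return the canonical URL when it designates a different document,
     *         or null when it is missing, blank, or equal to the reference
     */
    static String effectiveCanonical(String reference, String canonicalUrl) {
        if (canonicalUrl == null) {
            return null;
        }
        String trimmed = canonicalUrl.trim();
        if (trimmed.isEmpty() || trimmed.equals(reference)) {
            // Assumption: exact string equality is enough for this sketch.
            return null;
        }
        return trimmed;
    }
}
```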
Modifier and Type | Method and Description |
---|---|
String | detectFromContent(String reference, InputStream is, ContentType contentType): Detects the presence of a canonical URL in a document's content. |
String | detectFromMetadata(String reference, HttpMetadata metadata): Detects a canonical URL from the metadata gathered so far which, at the time this method is invoked, is normally the HTTP header values. |
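To make the two methods more concrete, here is a simplified, regex-based sketch of what each detection could look like. It is not the library's implementation. The class `RegexCanonicalSketch` is hypothetical, a `Map<String, List<String>>` stands in for `HttpMetadata`, and a MIME type string stands in for `ContentType`, so that the example compiles on its own. It assumes the canonical link is expressed either as a `Link` HTTP header with `rel="canonical"` or as an HTML `<link rel="canonical" href="...">` element, and that the content is UTF-8.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, simplified detector; parameter types are stand-ins, not Norconex classes. */
final class RegexCanonicalSketch {

    // One header link value, e.g.: <http://example.com/page>; rel="canonical"
    private static final Pattern HEADER_LINK = Pattern.compile(
            "<([^>]+)>\\s*;[^,]*\\brel\\s*=\\s*\"?canonical\\b",
            Pattern.CASE_INSENSITIVE);

    // An HTML <link ...> tag, capturing its attribute text.
    private static final Pattern LINK_TAG = Pattern.compile(
            "<link\\b([^>]*)>", Pattern.CASE_INSENSITIVE);
    private static final Pattern REL_CANONICAL = Pattern.compile(
            "\\brel\\s*=\\s*[\"']?canonical\\b", Pattern.CASE_INSENSITIVE);
    private static final Pattern HREF = Pattern.compile(
            "\\bhref\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Looks for a canonical URL in "Link" headers; returns null if none is found. */
    static String detectFromMetadata(
            String reference, Map<String, List<String>> headers) {
        // "reference" is unused in this sketch; it is kept to mirror the interface.
        for (Map.Entry<String, List<String>> entry : headers.entrySet()) {
            if (!"Link".equalsIgnoreCase(entry.getKey())) {
                continue;
            }
            for (String value : entry.getValue()) {
                Matcher m = HEADER_LINK.matcher(value);
                if (m.find()) {
                    return m.group(1).trim();
                }
            }
        }
        return null;
    }

    /** Looks for a canonical link tag in HTML content; returns null if none is found. */
    static String detectFromContent(
            String reference, InputStream is, String contentType) throws IOException {
        // Only HTML-like content is inspected in this sketch.
        if (contentType == null
                || !contentType.toLowerCase(Locale.ROOT).contains("html")) {
            return null;
        }
        // Assumption: UTF-8 content small enough to read fully into memory.
        String html = new String(is.readAllBytes(), StandardCharsets.UTF_8);
        Matcher tag = LINK_TAG.matcher(html);
        while (tag.find()) {
            String attributes = tag.group(1);
            if (REL_CANONICAL.matcher(attributes).find()) {
                Matcher href = HREF.matcher(attributes);
                if (href.find()) {
                    return href.group(1).trim();
                }
            }
        }
        return null;
    }
}
```

A regular expression keeps the sketch short, but it is only an approximation: it does not handle unquoted `href` values, multiple `rel` tokens, or markup inside comments. A production detector would more likely limit reading to the document head and use a proper parser.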
String detectFromMetadata(String reference, HttpMetadata metadata)
Parameters:
reference - document reference
metadata - metadata object containing HTTP headers
Returns:
the canonical URL, or null if none is found.

String detectFromContent(String reference, InputStream is, ContentType contentType) throws IOException
Parameters:
reference - document reference
is - the document content input stream
contentType - the document content type
Returns:
the canonical URL, or null if none is found.
Throws:
IOException - problem reading content

Copyright © 2009–2021 Norconex Inc. All rights reserved.