Package com.norconex.collector.http.doc
Class HttpDocInfo
java.lang.Object
com.norconex.importer.doc.DocInfo
com.norconex.collector.core.doc.CrawlDocInfo
com.norconex.collector.http.doc.HttpDocInfo
- All Implemented Interfaces:
Serializable
A URL being crawled holding relevant crawl information.
- Author:
- Pascal Essiembre
-
Nested Class Summary
Nested classes/interfaces inherited from class com.norconex.collector.core.doc.CrawlDocInfo
CrawlDocInfo.Stage -
Constructor Summary
Constructors
Constructor: Description
- HttpDocInfo()
- HttpDocInfo(DocInfo docDetails): Copy constructor.
- HttpDocInfo(String reference)
- HttpDocInfo(String url, int depth): Constructor.
-
Method Summary
Modifier and Type, Method: Description
- void addRedirectToTrail(String url): Adds a redirect URL to the trail of URLs that were redirected so far.
- boolean equals(Object)
- int getDepth(): Gets the URL depth.
- String getEtag(): Gets the HTTP ETag.
- String getOriginalReference()
- String getRedirectTarget(): Gets the immediate target of a redirect.
- List<String> getRedirectTrail(): Gets the trail of URLs that were redirected up to this one.
- List<String> getReferencedUrls(): Gets URLs referenced by this one.
- String getReferrerLinkMetadata()
- String getReferrerReference()
- String getSitemapChangeFreq(): Gets the sitemap change frequency.
- ZonedDateTime getSitemapLastMod(): Gets the sitemap last modified date.
- Float getSitemapPriority(): Gets the sitemap priority.
- String getUrlRoot(): Gets the URL root (protocol + domain, e.g. http://www.host.com).
- int hashCode()
- final void setDepth(int depth): Sets the URL depth.
- void setEtag(String etag): Sets the HTTP ETag.
- void setOriginalReference(String originalReference)
- void setRedirectTarget(String redirectTarget): Sets the immediate target of a redirect.
- void setRedirectTrail(List<String> redirectTrail): Sets the trail of URLs that were redirected up to this one.
- final void setReference(String url)
- void setReferencedUrls(List<String> referencedUrls): Sets URLs referenced by this one.
- void setReferrerLinkMetadata(String referrerLinkMetadata)
- void setReferrerReference(String referrerReference)
- void setSitemapChangeFreq(String sitemapChangeFreq): Sets the sitemap change frequency.
- void setSitemapLastMod(ZonedDateTime sitemapLastMod): Sets the sitemap last modified date.
- void setSitemapPriority(Float sitemapPriority): Sets the sitemap priority.
- String toString()
Methods inherited from class com.norconex.collector.core.doc.CrawlDocInfo
getContentChecksum, getCrawlDate, getMetaChecksum, getParentRootReference, getState, setContentChecksum, setCrawlDate, setMetaChecksum, setParentRootReference, setState
Methods inherited from class com.norconex.importer.doc.DocInfo
addEmbeddedParentReference, copyFrom, copyTo, getContentEncoding, getContentType, getEmbeddedParentReferences, getReference, setContentEncoding, setContentType, setEmbeddedParentReferences
-
Constructor Details
-
HttpDocInfo
public HttpDocInfo()
-
HttpDocInfo
public HttpDocInfo(String reference)
-
HttpDocInfo
public HttpDocInfo(String url, int depth)
Constructor.
- Parameters:
- url - URL being crawled
- depth - URL depth
-
HttpDocInfo
public HttpDocInfo(DocInfo docDetails)
Copy constructor.
- Parameters:
- docDetails - document details to copy
-
-
Method Details
-
getEtag
Gets the HTTP ETag.
- Returns:
- etag
- Since:
- 3.0.0
-
setEtag
Sets the HTTP ETag.
- Parameters:
- etag - the ETag
- Since:
- 3.0.0
-
getOriginalReference
-
setOriginalReference
-
getDepth
public int getDepth()
Gets the URL depth.
- Returns:
- URL depth
-
getSitemapLastMod
Gets the sitemap last modified date.
- Returns:
- last modified date
-
setSitemapLastMod
Sets the sitemap last modified date.
- Parameters:
- sitemapLastMod - last modified date
-
getSitemapChangeFreq
Gets the sitemap change frequency.
- Returns:
- sitemap change frequency
-
setSitemapChangeFreq
Sets the sitemap change frequency.
- Parameters:
- sitemapChangeFreq - sitemap change frequency
-
getSitemapPriority
Gets the sitemap priority.
- Returns:
- sitemap priority
-
setSitemapPriority
Sets the sitemap priority.
- Parameters:
- sitemapPriority - sitemap priority
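The three sitemap properties above take a ZonedDateTime, a String, and a Float. As a minimal sketch (not the collector's own parsing code), the following shows one way raw sitemap values could be converted into those types before being passed to the setters. The class SitemapFieldsSketch and its methods are hypothetical, and treating a date-only lastmod value as midnight UTC is an assumption.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of the Norconex API.
final class SitemapFieldsSketch {

    // Converts a sitemap <lastmod> value to a ZonedDateTime.
    // Assumption: a date-only value (yyyy-MM-dd) means midnight UTC.
    public static ZonedDateTime parseLastMod(String lastmod) {
        String value = lastmod.trim();
        if (value.length() == 10) {
            return LocalDate.parse(value).atStartOfDay(ZoneOffset.UTC);
        }
        return ZonedDateTime.parse(value, DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }

    // Converts a sitemap <priority> value to a Float, or null when absent.
    public static Float parsePriority(String priority) {
        if (priority == null || priority.trim().isEmpty()) {
            return null;
        }
        return Float.valueOf(priority.trim());
    }

    public static void main(String[] args) {
        System.out.println(parseLastMod("2004-12-23T18:00:15+00:00"));
        System.out.println(parseLastMod("2005-01-01"));
        System.out.println(parsePriority("0.8"));
    }
}
```

With the real class, the results would be handed to setSitemapLastMod and setSitemapPriority; the change frequency is a plain String and needs no conversion.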
-
setDepth
public final void setDepth(int depth)
Sets the URL depth.
- Parameters:
- depth - URL depth
-
getReferrerReference
-
setReferrerReference
-
getReferrerLinkMetadata
-
setReferrerLinkMetadata
-
setReference
- Overrides:
- setReference in class DocInfo
-
getUrlRoot
Gets the URL root (protocol + domain, e.g. http://www.host.com).
- Returns:
- URL root
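To illustrate what "protocol + domain" means here, the sketch below derives a root from a URL using java.net.URI. It is an assumption-laden illustration, not the implementation behind getUrlRoot(): the class UrlRootSketch is hypothetical, and keeping the port (and any user info) as part of the root is a choice made for this example only.

```java
import java.net.URI;

// Hypothetical helper, not part of the Norconex API.
final class UrlRootSketch {

    // Returns scheme + "://" + authority, or null when the URL is null
    // or has no scheme or authority (for example a relative path).
    // URI.create throws IllegalArgumentException on malformed input.
    public static String urlRoot(String url) {
        if (url == null) {
            return null;
        }
        URI uri = URI.create(url.trim());
        if (uri.getScheme() == null || uri.getRawAuthority() == null) {
            return null;
        }
        return uri.getScheme() + "://" + uri.getRawAuthority();
    }

    public static void main(String[] args) {
        // prints "http://www.host.com"
        System.out.println(urlRoot("http://www.host.com/path/page.html?q=1"));
    }
}
```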
-
getReferencedUrls
Gets URLs referenced by this one.
- Returns:
- URLs referenced by this one (never null).
- Since:
- 2.6.0
-
setReferencedUrls
Sets URLs referenced by this one.
- Parameters:
- referencedUrls - referenced URLs
- Since:
- 3.0.0
-
getRedirectTrail
Gets the trail of URLs that were redirected up to this one.
- Returns:
- URL redirection trail to this one (never null).
- Since:
- 2.8.0
-
setRedirectTrail
Sets the trail of URLs that were redirected up to this one.
- Parameters:
- redirectTrail - URL redirection trail to this one
- Since:
- 3.0.0
-
addRedirectToTrail
Adds a redirect URL to the trail of URLs that were redirected so far.
- Parameters:
- url - URL to add
- Since:
- 3.0.0
-
getRedirectTarget
Gets the immediate target of a redirect.
- Returns:
- redirect target or null
- Since:
- 3.1.0
-
setRedirectTarget
Sets the immediate target of a redirect.
- Parameters:
- redirectTarget - redirect target
- Since:
- 3.1.0
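The redirect trail and redirect target describe two sides of the same event: the redirecting URL records where it points, and the URL reached records how it was reached. The following self-contained sketch models that relationship with a stand-in class. RedirectSketch, Entry, and follow are hypothetical names, and the bookkeeping shown is one reading of the descriptions above, not the crawler's actual logic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not part of the Norconex API.
final class RedirectSketch {

    // Minimal model of the redirect-related state of one crawled URL.
    static final class Entry {
        final String url;
        final List<String> redirectTrail = new ArrayList<>(); // never null
        String redirectTarget; // null unless this URL redirected

        Entry(String url) {
            this.url = url;
        }
    }

    // Records that source redirected to targetUrl and returns the entry for
    // the target, whose trail is the source's trail plus the source URL.
    public static Entry follow(Entry source, String targetUrl) {
        source.redirectTarget = targetUrl;
        Entry target = new Entry(targetUrl);
        target.redirectTrail.addAll(source.redirectTrail);
        target.redirectTrail.add(source.url);
        return target;
    }

    public static void main(String[] args) {
        Entry a = new Entry("http://www.host.com/old");
        Entry b = follow(a, "http://www.host.com/newer");
        Entry c = follow(b, "http://www.host.com/newest");
        // prints "[http://www.host.com/old, http://www.host.com/newer]"
        System.out.println(c.redirectTrail);
    }
}
```

In this model only the immediate hop is stored as the target, while the trail accumulates every earlier URL in order, which matches the "immediate target" and "up to this one" wording of the method descriptions.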
-
equals
- Overrides:
- equals in class CrawlDocInfo
-
hashCode
public int hashCode()
- Overrides:
- hashCode in class CrawlDocInfo
-
toString
- Overrides:
- toString in class CrawlDocInfo
-