public class StandardSitemapResolverFactory extends Object implements ISitemapResolverFactory, IXMLConfigurable
Factory used to create StandardSitemapResolver instances.
Refer to StandardSitemapResolver for resolution logic.
<sitemapResolverFactory
ignore="[false|true]"
lenient="[false|true]"
escalateErrors="[false|true]"
fromDate="(Optional EPOCH date, in milliseconds)"
class="com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory">
<tempDir>(where to store temp files)</tempDir>
<path>
(Optional path relative to URL root for a sitemap. Use a single empty
"path" tag to rely instead on any sitemaps specified as start URLs or
defined in robots.txt, if enabled. Not specifying any path tags
falls back to trying to locate sitemaps using default paths.)
</path>
(... repeat path tag as needed ...)
</sitemapResolverFactory>
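To make the three `path` cases above concrete, here is a minimal sketch that reads this XML with the standard JDK DOM parser. It is not the library's own `loadFromXML(Reader)` logic. The class `SitemapConfigSketch` and its `Settings` holder are hypothetical. Absent boolean attributes are assumed to mean `false`. `paths == null` stands for "no path tags, fall back to default paths", and an empty list stands for "a single empty path tag, rely on start URLs or robots.txt".

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SitemapConfigSketch {

    /** Hypothetical holder for the values declared in the XML configuration. */
    public static final class Settings {
        public boolean ignore;
        public boolean lenient;
        public boolean escalateErrors;
        public Long fromDate;       // null when the attribute is absent
        public String tempDir;      // null when <tempDir> is absent
        public List<String> paths;  // null when no <path> tag is present
    }

    public static Settings parse(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid XML: " + e.getMessage(), e);
        }
        Settings s = new Settings();
        // getAttribute() returns "" for a missing attribute, which
        // parseBoolean() turns into false (assumed default).
        s.ignore = Boolean.parseBoolean(root.getAttribute("ignore"));
        s.lenient = Boolean.parseBoolean(root.getAttribute("lenient"));
        s.escalateErrors = Boolean.parseBoolean(root.getAttribute("escalateErrors"));
        String from = root.getAttribute("fromDate").trim();
        s.fromDate = from.isEmpty() ? null : Long.valueOf(from);

        NodeList tmp = root.getElementsByTagName("tempDir");
        if (tmp.getLength() > 0) {
            s.tempDir = tmp.item(0).getTextContent().trim();
        }

        NodeList pathNodes = root.getElementsByTagName("path");
        if (pathNodes.getLength() > 0) {
            // At least one <path> tag: keep only the non-empty ones, so a
            // single empty <path/> yields an empty (but non-null) list.
            s.paths = new ArrayList<>();
            for (int i = 0; i < pathNodes.getLength(); i++) {
                String p = pathNodes.item(i).getTextContent().trim();
                if (!p.isEmpty()) {
                    s.paths.add(p);
                }
            }
        }
        return s;
    }

    public static void main(String[] args) {
        Settings s = parse("<sitemapResolverFactory lenient=\"true\">"
                + "<path>/sitemap.xml</path></sitemapResolverFactory>");
        System.out.println(s.lenient + " " + s.paths); // prints "true [/sitemap.xml]"
    }
}
```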
The following example ignores sitemap files present on websites.
<sitemapResolverFactory ignore="true"/>
See also: StandardSitemapResolver

| Constructor and Description |
|---|
| StandardSitemapResolverFactory() |

| Modifier and Type | Method and Description |
|---|---|
| ISitemapResolver | createSitemapResolver(HttpCrawlerConfig config, boolean resume) |
| boolean | equals(Object other) |
| long | getFromDate(): Gets the minimum EPOCH date (in milliseconds) a sitemap entry must have to be considered. |
| String[] | getSitemapLocations(): Deprecated. Since 2.3.0, use HttpCrawlerConfig.getStartSitemapURLs() |
| String[] | getSitemapPaths(): Gets the URL paths, relative to the URL root, from which to try to locate and resolve sitemaps. |
| File | getTempDir(): Gets the directory where sitemap files are temporarily stored before they are parsed. |
| int | hashCode() |
| boolean | isEscalateErrors(): Gets whether errors should be thrown instead of logged. |
| boolean | isLenient() |
| void | loadFromXML(Reader in) |
| void | saveToXML(Writer out) |
| void | setEscalateErrors(boolean escalateErrors): Sets whether errors should be thrown instead of logged. |
| void | setFromDate(long fromDate): Sets the minimum EPOCH date (in milliseconds) a sitemap entry must have to be considered. |
| void | setLenient(boolean lenient) |
| void | setSitemapLocations(String... sitemapLocations): Deprecated. Since 2.3.0, use HttpCrawlerConfig.setStartSitemapURLs(String[]) |
| void | setSitemapPaths(String... sitemapPaths): Sets the URL paths, relative to the URL root, from which to try to locate and resolve sitemaps. |
| void | setTempDir(File tempDir): Sets the directory where sitemap files are temporarily stored before they are parsed. |
| String | toString() |

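The getTempDir() details describe a three-step fallback for where temporary sitemap files go: the configured temporary directory, then the crawler working directory, then the system temporary directory. A minimal plain-Java sketch of that order follows. The class `TempDirFallbackSketch` and its two `File` parameters are hypothetical stand-ins for the real configuration values. The last step uses `java.io.tmpdir`, which is the location Commons IO `FileUtils.getTempDirectory()` reports.

```java
import java.io.File;

public class TempDirFallbackSketch {

    /**
     * Picks the directory for temporary sitemap files.
     *
     * @param tempDir the configured temp dir (may be null, the default)
     * @param workDir the crawler working directory (may be null)
     * @return the first non-null choice, else the system temp directory
     */
    public static File resolveTempDir(File tempDir, File workDir) {
        if (tempDir != null) {
            return tempDir;               // explicit setting wins
        }
        if (workDir != null) {
            return workDir;               // files go directly under the work dir
        }
        // Same path as FileUtils.getTempDirectory() in Commons IO.
        return new File(System.getProperty("java.io.tmpdir"));
    }

    public static void main(String[] args) {
        System.out.println(resolveTempDir(null, new File("work")));
    }
}
```

Keeping the fallback in one small method makes the precedence explicit and easy to test in isolation.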
public ISitemapResolver createSitemapResolver(HttpCrawlerConfig config, boolean resume)
Specified by: createSitemapResolver in interface ISitemapResolverFactory
public String[] getSitemapPaths()
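Sitemap paths are relative to the URL root. A minimal sketch of turning a page URL plus configured paths into candidate sitemap URLs is shown below. It assumes that "URL root" means scheme plus authority (host and optional port) and that empty paths contribute no candidate. The class `SitemapPathSketch` and the method `candidateSitemapUrls` are hypothetical, not part of the collector's API.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class SitemapPathSketch {

    /** Returns scheme + authority, e.g. "https://example.com:8080". */
    static String urlRoot(String url) {
        URI uri = URI.create(url);
        return uri.getScheme() + "://" + uri.getRawAuthority();
    }

    /** Builds one absolute candidate URL per non-empty sitemap path. */
    public static List<String> candidateSitemapUrls(String pageUrl, String... sitemapPaths) {
        String root = urlRoot(pageUrl);
        List<String> urls = new ArrayList<>();
        for (String path : sitemapPaths) {
            if (path == null || path.trim().isEmpty()) {
                continue; // an empty path gives nothing to probe
            }
            String p = path.trim();
            // Paths are relative to the root, so ensure a single leading slash.
            urls.add(root + (p.startsWith("/") ? p : "/" + p));
        }
        return urls;
    }

    public static void main(String[] args) {
        System.out.println(candidateSitemapUrls(
                "https://example.com/docs/page.html", "/sitemap.xml", "news/sitemap.xml"));
        // prints [https://example.com/sitemap.xml, https://example.com/news/sitemap.xml]
    }
}
```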
public void setSitemapPaths(String... sitemapPaths)
Parameters: sitemapPaths - sitemap paths.
@Deprecated public String[] getSitemapLocations()
Deprecated. Since 2.3.0, use HttpCrawlerConfig.getStartSitemapURLs()
@Deprecated public void setSitemapLocations(String... sitemapLocations)
Deprecated. Since 2.3.0, use HttpCrawlerConfig.setStartSitemapURLs(String[])
Parameters: sitemapLocations - sitemap locations
public boolean isLenient()
public void setLenient(boolean lenient)
public long getFromDate()
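The fromDate setting is a minimum EPOCH date in milliseconds that a sitemap entry must have to be considered. The sketch below shows that filter in plain Java and converts a calendar date to epoch milliseconds with `java.time.Instant`. The `Entry` type and `acceptedUrls` method are hypothetical. Keeping entries that carry no last-modified date is an assumption, not documented behavior.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FromDateFilterSketch {

    /** Hypothetical sitemap entry: a URL and its optional lastmod in epoch ms. */
    public static final class Entry {
        public final String url;
        public final Long lastModified; // null when the entry has no date

        public Entry(String url, Long lastModified) {
            this.url = url;
            this.lastModified = lastModified;
        }
    }

    /** Keeps URLs whose date is at or after fromDate (undated ones are kept: assumption). */
    public static List<String> acceptedUrls(List<Entry> entries, long fromDate) {
        List<String> kept = new ArrayList<>();
        for (Entry e : entries) {
            if (e.lastModified == null || e.lastModified >= fromDate) {
                kept.add(e.url);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // 2021-01-01T00:00:00Z expressed as EPOCH milliseconds.
        long from = Instant.parse("2021-01-01T00:00:00Z").toEpochMilli();
        List<Entry> entries = Arrays.asList(
                new Entry("https://example.com/old", from - 1),
                new Entry("https://example.com/new", from));
        System.out.println(acceptedUrls(entries, from)); // prints [https://example.com/new]
    }
}
```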
public void setFromDate(long fromDate)
Parameters: fromDate - from date
public boolean isEscalateErrors()
Returns: true if throwing errors
public void setEscalateErrors(boolean escalateErrors)
Parameters: escalateErrors - true if throwing errors
public File getTempDir()
When this directory is null (default), temporary files are created directly under AbstractCrawlerConfig.getWorkDir(). If the crawler working directory is also undefined, the system temporary directory is used, as returned by FileUtils.getTempDirectory().
public void setTempDir(File tempDir)
Parameters: tempDir - directory where temporary files are written
public void loadFromXML(Reader in) throws IOException
Specified by: loadFromXML in interface IXMLConfigurable
Throws: IOException
public void saveToXML(Writer out) throws IOException
Specified by: saveToXML in interface IXMLConfigurable
Throws: IOException
Copyright © 2009–2021 Norconex Inc. All rights reserved.