public class StandardSitemapResolverFactory extends Object implements ISitemapResolverFactory, IXMLConfigurable
Factory used to create StandardSitemapResolver
instances.
Refer to StandardSitemapResolver
for resolution logic.
<sitemapResolverFactory ignore="[false|true]" lenient="[false|true]"
    escalateErrors="[false|true]"
    fromDate="(Optional EPOCH date, in milliseconds)"
    class="com.norconex.collector.http.sitemap.impl.StandardSitemapResolverFactory">
  <tempDir>(where to store temp files)</tempDir>
  <path>
    (Optional path relative to URL root for a sitemap. Use a single empty
     "path" tag to rely instead on any sitemaps specified as start URLs or
     defined in robots.txt, if enabled. Not specifying any path tags
     falls back to trying to locate sitemaps using default paths.)
  </path>
  (... repeat path tag as needed ...)
</sitemapResolverFactory>
The following ignores sitemap files present on web sites.
<sitemapResolverFactory ignore="true"/>
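The fromDate attribute expects an EPOCH date in milliseconds. As a minimal sketch, the value can be computed from a calendar date with java.time. The class name FromDateExample, the helper toEpochMillis, and the choice of midnight UTC are assumptions made for illustration and are not part of the Norconex API.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper class; not part of the Norconex API.
final class FromDateExample {

    // Converts a calendar date, taken at midnight UTC (an assumption),
    // to EPOCH milliseconds.
    static long toEpochMillis(int year, int month, int day) {
        return LocalDate.of(year, month, day)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();
    }

    public static void main(String[] args) {
        long fromDate = toEpochMillis(2021, 1, 1);
        // Print the attribute as it could be pasted into the XML configuration.
        System.out.println("fromDate=\"" + fromDate + "\"");
    }
}
```

The resulting long can also be passed to setFromDate(long) when configuring the factory in Java.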
StandardSitemapResolver
Constructor and Description |
---|
StandardSitemapResolverFactory() |
Modifier and Type | Method and Description |
---|---|
ISitemapResolver |
createSitemapResolver(HttpCrawlerConfig config,
boolean resume) |
boolean |
equals(Object other) |
long |
getFromDate()
Gets the minimum EPOCH date (in milliseconds) a sitemap entry
should have to be considered.
|
String[] |
getSitemapLocations()
Deprecated.
Since 2.3.0, use
HttpCrawlerConfig.getStartSitemapURLs() |
String[] |
getSitemapPaths()
Gets the URL paths, relative to the URL root, from which to try
to locate and resolve sitemaps.
|
File |
getTempDir()
Gets the directory where sitemap files are temporarily stored
before they are parsed.
|
int |
hashCode() |
boolean |
isEscalateErrors()
Gets whether errors should be thrown instead of logged.
|
boolean |
isLenient() |
void |
loadFromXML(Reader in) |
void |
saveToXML(Writer out) |
void |
setEscalateErrors(boolean escalateErrors)
Sets whether errors should be thrown instead of logged.
|
void |
setFromDate(long fromDate)
Sets the minimum EPOCH date (in milliseconds) a sitemap entry
should have to be considered.
|
void |
setLenient(boolean lenient) |
void |
setSitemapLocations(String... sitemapLocations)
Deprecated.
Since 2.3.0, use
HttpCrawlerConfig.setStartSitemapURLs(String[]) |
void |
setSitemapPaths(String... sitemapPaths)
Sets the URL paths, relative to the URL root, from which to try
to locate and resolve sitemaps.
|
void |
setTempDir(File tempDir)
Sets the directory where sitemap files are temporarily stored
before they are parsed.
|
String |
toString() |
public ISitemapResolver createSitemapResolver(HttpCrawlerConfig config, boolean resume)
createSitemapResolver
in interface ISitemapResolverFactory
public String[] getSitemapPaths()
public void setSitemapPaths(String... sitemapPaths)
sitemapPaths
- sitemap paths
@Deprecated public String[] getSitemapLocations()
HttpCrawlerConfig.getStartSitemapURLs()
@Deprecated public void setSitemapLocations(String... sitemapLocations)
HttpCrawlerConfig.setStartSitemapURLs(String[])
sitemapLocations
- sitemap locations
public boolean isLenient()
public void setLenient(boolean lenient)
public long getFromDate()
public void setFromDate(long fromDate)
fromDate
- from date
public boolean isEscalateErrors()
true
if throwing errors
public void setEscalateErrors(boolean escalateErrors)
escalateErrors
- true
if throwing errors
public File getTempDir()
When null (default), temporary files are created directly under AbstractCrawlerConfig.getWorkDir(). If the crawler working directory is also undefined, the system temporary directory is used, as returned by FileUtils.getTempDirectory().
public void setTempDir(File tempDir)
tempDir
- directory where temporary files are written
public void loadFromXML(Reader in) throws IOException
loadFromXML
in interface IXMLConfigurable
IOException
public void saveToXML(Writer out) throws IOException
saveToXML
in interface IXMLConfigurable
IOException
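The fallback order documented for getTempDir() (explicit temporary directory, then the crawler working directory, then the system temporary directory) can be sketched as follows. This is an illustration under stated assumptions, not the library's actual implementation: the class TempDirFallbackExample and the helper resolveTempDir are hypothetical, and the java.io.tmpdir system property stands in for FileUtils.getTempDirectory().

```java
import java.io.File;

// Hypothetical sketch; not the library's actual implementation.
final class TempDirFallbackExample {

    // Returns the first defined directory, in the documented order:
    // explicit tempDir, then the crawler workDir, then the system temp dir.
    static File resolveTempDir(File tempDir, File workDir) {
        if (tempDir != null) {
            return tempDir;
        }
        if (workDir != null) {
            return workDir;
        }
        // Assumption: stands in for FileUtils.getTempDirectory().
        return new File(System.getProperty("java.io.tmpdir"));
    }

    public static void main(String[] args) {
        // Neither directory is set, so the system temp dir is printed.
        System.out.println(resolveTempDir(null, null));
    }
}
```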
Copyright © 2009–2021 Norconex Inc. All rights reserved.