public abstract class AbstractDelayResolver extends Object implements IDelayResolver, IXMLConfigurable
Base implementation for creating voluntary delays between URL downloads. This base class offers a few ways the actual delay value can be defined (in order):
One of the following scopes dictates how the delay is applied, listed in order from the best behaved to the least.
The following XML configuration is shared across concrete implementations (which can add more configurable attributes and tags):
<delay class="(implementing class)" default="(milliseconds)" ignoreRobotsCrawlDelay="[false|true]" scope="[crawler|site|thread]" > </delay>
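To make the three shared attributes concrete, here is a minimal sketch that reads them with the JDK's DOM parser. It is not how the library loads its configuration: `DelayConfigSketch`, `DelaySettings`, and the class name `com.example.MyDelayResolver` are hypothetical, and falling back to the `crawler` scope when the attribute is absent is an assumption. The 3000 ms fallback mirrors the documented `DEFAULT_DELAY` of 3 seconds.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical reader for the shared <delay> attributes (illustration only). */
final class DelayConfigSketch {

    // 3 seconds, matching the documented DEFAULT_DELAY description.
    static final long FALLBACK_DELAY_MS = 3000L;

    static DelaySettings parse(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();

            // Element.getAttribute returns "" when the attribute is absent.
            String def = root.getAttribute("default");
            long delay = def.isEmpty() ? FALLBACK_DELAY_MS : Long.parseLong(def);

            boolean ignore = Boolean.parseBoolean(
                    root.getAttribute("ignoreRobotsCrawlDelay"));

            String scope = root.getAttribute("scope");
            if (scope.isEmpty()) {
                scope = "crawler"; // assumption: crawler is the fallback scope
            }
            return new DelaySettings(delay, ignore, scope);
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid <delay> configuration", e);
        }
    }

    public static void main(String[] args) {
        DelaySettings s = parse("<delay class=\"com.example.MyDelayResolver\""
                + " default=\"1500\" ignoreRobotsCrawlDelay=\"true\""
                + " scope=\"site\"></delay>");
        System.out.println(s.defaultDelay + " " + s.ignoreRobotsCrawlDelay
                + " " + s.scope);
    }
}

/** Hypothetical value holder; not a type from the documented library. */
final class DelaySettings {
    final long defaultDelay;
    final boolean ignoreRobotsCrawlDelay;
    final String scope;

    DelaySettings(long defaultDelay, boolean ignoreRobotsCrawlDelay, String scope) {
        this.defaultDelay = defaultDelay;
        this.ignoreRobotsCrawlDelay = ignoreRobotsCrawlDelay;
        this.scope = scope;
    }
}
```

In a real crawler configuration the `class` attribute would name a concrete implementation of this base class, and that implementation may accept further attributes and tags.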
Modifier and Type | Field and Description |
---|---|
static long | DEFAULT_DELAY Default delay is 3 seconds. |
static String | SCOPE_CRAWLER |
static String | SCOPE_SITE |
static String | SCOPE_THREAD |
Constructor and Description |
---|
AbstractDelayResolver() |
Modifier and Type | Method and Description |
---|---|
void | delay(RobotsTxt robotsTxt, String url) Delay crawling activities (if applicable). |
boolean | equals(Object other) |
long | getDefaultDelay() Gets the default delay in milliseconds. |
String | getScope() Gets the delay scope. |
int | hashCode() |
boolean | isIgnoreRobotsCrawlDelay() Gets whether to ignore crawl delays specified in a site robots.txt file. |
protected void | loadDelaysFromXML(XMLConfiguration xml) Loads explicit configuration of delays from XML. |
void | loadFromXML(Reader in) |
protected abstract long | resolveExplicitDelay(String url) Resolves explicitly specified delay, in milliseconds. |
protected void | saveDelaysToXML(EnhancedXMLStreamWriter writer) Saves explicit configuration of delays to XML. |
void | saveToXML(Writer out) |
void | setDefaultDelay(long defaultDelay) Sets the default delay in milliseconds. |
void | setIgnoreRobotsCrawlDelay(boolean ignoreRobotsCrawlDelay) Sets whether to ignore crawl delays specified in a site robots.txt file. |
void | setScope(String scope) Sets the delay scope. |
String | toString() |
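The method summary suggests that a concrete resolver mainly supplies `resolveExplicitDelay`, while the base class decides which delay value wins. The sketch below illustrates that division of labour with stand-in types rather than the real classes. Several points are assumptions: the precedence (robots.txt crawl delay unless ignored, then the explicit delay, then the default) is inferred from the method names, a negative return value is used here to mean "no delay specified", and `resolveDelay`, `DelayResolverSketch`, and `HostDelayResolverSketch` are hypothetical names. The real `delay` method receives a `RobotsTxt` instance; the sketch passes a plain number instead.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Small demo of the stand-in classes defined below. */
final class DelayResolverDemo {
    public static void main(String[] args) {
        HostDelayResolverSketch resolver = new HostDelayResolverSketch();
        resolver.putDelay("slow.example.com", 10_000L);
        // No robots.txt crawl delay (-1): the explicit per-host delay applies.
        System.out.println(resolver.resolveDelay(-1L, "https://slow.example.com/page"));
        // No robots.txt delay and no explicit delay: the default applies.
        System.out.println(resolver.resolveDelay(-1L, "https://other.example.com/"));
    }
}

/** Stand-in for the documented base class; NOT the library implementation. */
abstract class DelayResolverSketch {
    static final long DEFAULT_DELAY = 3000L; // documented as 3 seconds

    private long defaultDelay = DEFAULT_DELAY;
    private boolean ignoreRobotsCrawlDelay;

    long getDefaultDelay() { return defaultDelay; }
    void setDefaultDelay(long defaultDelay) { this.defaultDelay = defaultDelay; }
    boolean isIgnoreRobotsCrawlDelay() { return ignoreRobotsCrawlDelay; }
    void setIgnoreRobotsCrawlDelay(boolean ignore) { this.ignoreRobotsCrawlDelay = ignore; }

    /** Assumed convention: a negative value means "no explicit delay". */
    protected abstract long resolveExplicitDelay(String url);

    /**
     * Hypothetical helper showing the assumed precedence.
     * robotsDelayMs is negative when robots.txt specifies no crawl delay.
     */
    long resolveDelay(long robotsDelayMs, String url) {
        if (!ignoreRobotsCrawlDelay && robotsDelayMs >= 0) {
            return robotsDelayMs;
        }
        long explicit = resolveExplicitDelay(url);
        if (explicit >= 0) {
            return explicit;
        }
        return defaultDelay;
    }
}

/** Hypothetical subclass that keeps one explicit delay per host name. */
final class HostDelayResolverSketch extends DelayResolverSketch {
    private final Map<String, Long> delaysByHost = new HashMap<>();

    void putDelay(String host, long millis) {
        delaysByHost.put(host, millis);
    }

    @Override
    protected long resolveExplicitDelay(String url) {
        Long millis = delaysByHost.get(URI.create(url).getHost());
        return millis == null ? -1L : millis;
    }
}
```

Keeping the precedence logic in the base class means every subclass treats robots.txt and the default delay the same way, and only the source of explicit delays varies.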
public static final String SCOPE_CRAWLER
public static final String SCOPE_SITE
public static final String SCOPE_THREAD
public static final long DEFAULT_DELAY
public void delay(RobotsTxt robotsTxt, String url)
Specified by: delay in interface IDelayResolver
Parameters:
robotsTxt - robots.txt instance (if applicable)
url - the URL being crawled
public long getDefaultDelay()
public void setDefaultDelay(long defaultDelay)
Parameters:
defaultDelay - default delay
public boolean isIgnoreRobotsCrawlDelay()
Returns: true if ignoring robots.txt crawl delay
public void setIgnoreRobotsCrawlDelay(boolean ignoreRobotsCrawlDelay)
Parameters:
ignoreRobotsCrawlDelay - true if ignoring robots.txt crawl delay
public String getScope()
public void setScope(String scope)
Parameters:
scope - one of "crawler", "site", or "thread".
protected abstract long resolveExplicitDelay(String url)
Parameters:
url - URL for which to resolve delay
public final void loadFromXML(Reader in) throws IOException
Specified by: loadFromXML in interface IXMLConfigurable
Throws:
IOException
protected void loadDelaysFromXML(XMLConfiguration xml) throws IOException
Parameters:
xml - configuration
Throws:
IOException - problem loading delays
public final void saveToXML(Writer out) throws IOException
Specified by: saveToXML in interface IXMLConfigurable
Throws:
IOException
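The `final` public `loadFromXML` and `saveToXML` methods, paired with the protected `loadDelaysFromXML` and `saveDelaysToXML` hooks, suggest a template-method arrangement: the base class handles the shared attributes and subclasses contribute only their own delay settings. The following is a minimal sketch of the saving half under that assumption, using the JDK's `XMLStreamWriter` in place of `EnhancedXMLStreamWriter`. The class names and the `hostDelay` element are hypothetical, and the sketch wraps the checked exception instead of declaring `IOException` as the documented methods do.

```java
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical subclass that adds one element of its own. */
final class HostDelaySavingSketch extends XmlSavingSketch {
    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        new HostDelaySavingSketch().saveToXML(out);
        System.out.println(out);
    }

    @Override
    protected void saveDelaysToXML(XMLStreamWriter writer) throws XMLStreamException {
        writer.writeStartElement("hostDelay"); // hypothetical element name
        writer.writeAttribute("host", "example.com");
        writer.writeCharacters("5000");
        writer.writeEndElement();
    }
}

/** Stand-in base class; NOT the library implementation. */
abstract class XmlSavingSketch {
    long defaultDelay = 3000L;
    boolean ignoreRobotsCrawlDelay = false;
    String scope = "crawler";

    /** Final, so subclasses cannot change how the shared attributes are written. */
    final void saveToXML(Writer out) {
        try {
            XMLStreamWriter writer =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement("delay");
            writer.writeAttribute("class", getClass().getName());
            writer.writeAttribute("default", Long.toString(defaultDelay));
            writer.writeAttribute("ignoreRobotsCrawlDelay",
                    Boolean.toString(ignoreRobotsCrawlDelay));
            writer.writeAttribute("scope", scope);
            saveDelaysToXML(writer); // hook for subclass-specific content
            writer.writeEndElement();
            writer.flush();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not save delay configuration", e);
        }
    }

    /** Hook with an empty default; subclasses override it to add their own tags. */
    protected void saveDelaysToXML(XMLStreamWriter writer) throws XMLStreamException {
        // nothing to add by default
    }
}
```

Loading would follow the same shape in reverse: the final method reads the shared attributes, then hands the parsed configuration to the subclass hook.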
protected void saveDelaysToXML(EnhancedXMLStreamWriter writer) throws IOException
Parameters:
writer - a writer
Throws:
IOException - problem saving delays
Copyright © 2009–2021 Norconex Inc. All rights reserved.