public class GenericDelayResolver extends AbstractDelayResolver
Default implementation for creating voluntary delays between URL downloads. There are a few ways the actual delay value can be defined (in order):
In a delay schedule, the days of the week are spelled out (in English): Monday, Tuesday, etc. Time ranges use the 24-hour format.
One of the following scopes dictates how the delay is applied, listed in order from the best behaved to the least.
As of 2.7.0, XML configuration entries expecting millisecond durations can be provided in human-readable format (English only), as per DurationParser (e.g., "5 minutes and 30 seconds" or "5m30s").
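To make the compact form concrete, here is a minimal sketch of how a value such as "5m30s" could be turned into milliseconds. It is not Norconex's DurationParser: the class name CompactDurationSketch, the supported unit letters, and the treatment of a bare number as milliseconds are all assumptions made for illustration, and the spelled-out form ("5 minutes and 30 seconds") is deliberately not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, NOT the library's DurationParser: compact forms only. */
final class CompactDurationSketch {

    // A number followed by a unit; "ms" is listed first so it is tried before "m".
    private static final Pattern TOKEN = Pattern.compile("(\\d+)(ms|d|h|m|s)");

    /** Converts e.g. "5m30s" or "5000" to milliseconds (assumed unit letters). */
    static long toMillis(String text) {
        String s = text.trim().toLowerCase();
        // Assumption: a bare number is already a millisecond value.
        if (s.matches("\\d+")) {
            return Long.parseLong(s);
        }
        Matcher m = TOKEN.matcher(s);
        long total = 0;
        int end = 0;
        while (m.find()) {
            // Tokens must be contiguous: no spaces or unknown characters between them.
            if (m.start() != end) {
                throw new IllegalArgumentException("Unsupported duration: " + text);
            }
            total += Long.parseLong(m.group(1)) * unitMillis(m.group(2));
            end = m.end();
        }
        // Reject empty input and trailing characters that matched no token.
        if (end == 0 || end != s.length()) {
            throw new IllegalArgumentException("Unsupported duration: " + text);
        }
        return total;
    }

    private static long unitMillis(String unit) {
        switch (unit) {
            case "ms": return 1L;
            case "s":  return 1_000L;
            case "m":  return 60_000L;
            case "h":  return 3_600_000L;
            default:   return 86_400_000L; // "d"
        }
    }

    public static void main(String[] args) {
        System.out.println(toMillis("5m30s")); // 5 * 60000 + 30 * 1000 = 330000
    }
}
```

In a real configuration the library performs this conversion itself; the sketch only shows what the resulting millisecond value would be.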
<delay class="com.norconex.collector.http.delay.impl.GenericDelayResolver"
    default="(milliseconds)"
    ignoreRobotsCrawlDelay="[false|true]"
    scope="[crawler|site|thread]" >
  <schedule
      dayOfWeek="from (week day) to (week day)"
      dayOfMonth="from [1-31] to [1-31]"
      time="from (HH:mm) to (HH:mm)">
    (delay in milliseconds)
  </schedule>
  (... repeat schedule tag as needed ...)
</delay>
The following sets the minimum delay between each document download on a given site to 5 seconds, no matter what the site's robots.txt may say, except on weekends, when it is more aggressive (1 second).
<delay class="com.norconex.collector.http.delay.impl.GenericDelayResolver"
    default="5 seconds"
    ignoreRobotsCrawlDelay="true"
    scope="site" >
  <schedule dayOfWeek="from Saturday to Sunday">1 second</schedule>
</delay>
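The effect of the example above can be sketched in plain Java. This is a hypothetical model, not the library's implementation: the class WeekendDelaySketch and its methods are invented for illustration, and treating a day range as inclusive (and allowing it to wrap past Sunday) is an assumption. The robots.txt crawl delay is left out because the example sets ignoreRobotsCrawlDelay to true.

```java
import java.time.DayOfWeek;
import java.time.LocalDateTime;

/** Hypothetical model of the example configuration; names are illustrative only. */
final class WeekendDelaySketch {

    static final long DEFAULT_DELAY_MS = 5_000L; // default="5 seconds"
    static final long WEEKEND_DELAY_MS = 1_000L; // schedule body: "1 second"

    /**
     * True when 'day' lies in the range from..to, both ends included.
     * Assumption: a range such as "from Friday to Monday" wraps past Sunday.
     */
    static boolean inDayRange(DayOfWeek day, DayOfWeek from, DayOfWeek to) {
        int d = day.getValue();   // Monday = 1 ... Sunday = 7
        int f = from.getValue();
        int t = to.getValue();
        return f <= t ? (d >= f && d <= t) : (d >= f || d <= t);
    }

    /** Picks the scheduled delay when the schedule matches, else the default. */
    static long delayFor(LocalDateTime when) {
        if (inDayRange(when.getDayOfWeek(), DayOfWeek.SATURDAY, DayOfWeek.SUNDAY)) {
            return WEEKEND_DELAY_MS;
        }
        return DEFAULT_DELAY_MS;
    }

    public static void main(String[] args) {
        // 2021-03-06 is a Saturday, 2021-03-08 is a Monday.
        System.out.println(delayFor(LocalDateTime.of(2021, 3, 6, 10, 0))); // 1000
        System.out.println(delayFor(LocalDateTime.of(2021, 3, 8, 10, 0))); // 5000
    }
}
```

A time-of-day or day-of-month condition would add further checks of the same shape before the scheduled delay is selected.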
Modifier and Type | Class and Description |
---|---|
static class | GenericDelayResolver.DelaySchedule |
DEFAULT_DELAY, SCOPE_CRAWLER, SCOPE_SITE, SCOPE_THREAD
Constructor and Description |
---|
GenericDelayResolver() |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object other) |
List<GenericDelayResolver.DelaySchedule> | getSchedules() |
int | hashCode() |
protected void | loadDelaysFromXML(XMLConfiguration xml) Loads explicit configuration of delays from XML. |
protected long | resolveExplicitDelay(String url) Resolves explicitly specified delay, in milliseconds. |
protected void | saveDelaysToXML(EnhancedXMLStreamWriter writer) Saves explicit configuration of delays to XML. |
void | setSchedules(List<GenericDelayResolver.DelaySchedule> schedules) |
String | toString() |
delay, getDefaultDelay, getScope, isIgnoreRobotsCrawlDelay, loadFromXML, saveToXML, setDefaultDelay, setIgnoreRobotsCrawlDelay, setScope
public List<GenericDelayResolver.DelaySchedule> getSchedules()
public void setSchedules(List<GenericDelayResolver.DelaySchedule> schedules)
protected long resolveExplicitDelay(String url)
Description copied from class: AbstractDelayResolver
resolveExplicitDelay in class AbstractDelayResolver
Parameters: url - URL for which to resolve delay
protected void loadDelaysFromXML(XMLConfiguration xml) throws IOException
Description copied from class: AbstractDelayResolver
loadDelaysFromXML in class AbstractDelayResolver
Parameters: xml - configuration
Throws: IOException - problem loading delays
protected void saveDelaysToXML(EnhancedXMLStreamWriter writer) throws IOException
Description copied from class: AbstractDelayResolver
saveDelaysToXML in class AbstractDelayResolver
Parameters: writer - a writer
Throws: IOException - problem saving delays
public boolean equals(Object other)
equals in class AbstractDelayResolver
public int hashCode()
hashCode in class AbstractDelayResolver
public String toString()
toString in class AbstractDelayResolver
Copyright © 2009–2021 Norconex Inc. All rights reserved.